- How to Read CSV Files in Python
- What Are CSV Files?
- Why Are CSV Files So Common?
- Using Python’s csv Module
- CSV Dialects and How to Deal With Them
- What If We Don’t Know the CSV Dialect?
- Now That You Know About CSV Files and Python…
- How To Read A CSV File In Python
- Read A CSV File Using Python
- 1. Using the CSV Library
- 2. Using the Pandas Library
- Possible Delimiters Issues
- Solution For Delimiters Using the CSV Library
- Solution For Delimiters Using the Pandas Library
- Up Next
How to Read CSV Files in Python
If you are working as a back-end developer or data scientist, chances are that you’ve already dealt with CSV files. It is one of the most used formats for working with and transferring data. Many Python libraries can handle CSVs, but in this article, we’ll focus on Python’s csv module.
What Are CSV Files?
A CSV file, also known as a comma-separated values file, is a text file that contains data records. Each line represents a different record and includes one or more fields. These fields represent different data values.
Let’s look at some CSV examples. Below we have a snippet of a CSV file containing student data:
firstname,lastname,class
Benjamin,Berman,2020
Sophie,Case,2018
The first line is the header, which is essentially column names. Each line will have the same number of fields as the first line has column names. We’re using commas as delimiters (i.e. to separate fields in a line).
Let’s look at a second example:
firstname|lastname|class
Benjamin|Berman|2020
Sophie|Case|2018
This snippet has the same structure as the first one. The difference is the delimiter: we’re using a vertical bar. As long as we know the general structure of the CSV file, we can deal with it.
Why Are CSV Files So Common?
In essence, CSV files are plain-text files, meaning they are as simple as it gets. This simplicity makes it easy to create, modify, and transfer them – regardless of the platform. Thus, tabular data (i.e. data structured as rows, where each row describes one item) can be moved between programs or systems that otherwise might be incompatible.
Another benefit of this simplicity is that it’s very easy to import this data into spreadsheets and databases. For spreadsheets, just opening the CSV file often automatically imports the data into the spreadsheet program.
One of the most common uses of CSV files is when part of a database’s data needs to be extracted for use by a non-technical coworker. Most modern database systems allow users to export their data into CSV files. Instead of making non-technical people struggle through the database system, we can easily give them a CSV file with the data they need. We could also easily extract a CSV file from a spreadsheet and insert that into our database. This makes interfacing between non-technical personnel and databases a lot easier.
At times, we might work on actual CSV files – e.g. when one team scrapes data and delivers it to the team that is supposed to work with it. The most common way to deliver the data would be in a CSV file. Or perhaps we need to get some data from a legacy system that we can’t interface with. The easiest solution is to acquire this data in CSV format, since textual data is easier to move from system to system.
Reading CSV files is so common that questions about it frequently appear in Python technical interviews. You can learn more about the questions you might face in a Python-focused data science job interview in this article. Even if you’re not interested in a data science role, check it out; you might run across some of these questions in other Python jobs.
Using Python’s csv Module
There are many Python modules that can read a CSV file, but there might be cases where we aren’t able to use those libraries, e.g. due to platform or development environment limitations. For that reason, we’ll focus on Python’s built-in csv module. Below we have a CSV file, Report.csv, containing two students’ grades:
Name,Class,Lecture,Grade
Benjamin,A,Mathematics,90
Benjamin,A,Chemistry,54
Benjamin,A,Physics,77
Sophie,B,Mathematics,90
Sophie,B,Chemistry,90
Sophie,B,Physics,90
This file includes six records. Each record contains a name, a class, a lecture, and a grade. Each field is separated by commas. To work with this file, we’ll use the csv.reader() function, which accepts an iterable object. In this case, we will be providing it with a file object. Here is the code to print all rows of the Report.csv file:
import csv

with open("Report.csv", "r") as handler:
    reader = csv.reader(handler, delimiter=',')
    for row in reader:
        print(row)
Let’s analyze this code line by line. First, we import the csv module, which ships with the standard Python installation. Then we open the CSV file and create a file handler called handler. Since this file handler is an iterable object that returns a string (one line of the file) whenever the __next__ method is called on it, we can pass it as an argument to the reader() function and get a CSV reader object that we call reader. Now we can iterate over reader; each element will be a list of the fields from one line of our original CSV file.
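One detail worth seeing in practice is that every field arrives as a string, so numeric columns need converting. The following is a minimal sketch, assuming the grades data shown earlier is held in memory via io.StringIO rather than read from Report.csv; the variable names (report, grades, averages) are illustrative only.

```python
import csv
import io

# The grades data from above, held in memory instead of Report.csv.
report = io.StringIO(
    "Name,Class,Lecture,Grade\n"
    "Benjamin,A,Mathematics,90\n"
    "Benjamin,A,Chemistry,54\n"
    "Benjamin,A,Physics,77\n"
    "Sophie,B,Mathematics,90\n"
    "Sophie,B,Chemistry,90\n"
    "Sophie,B,Physics,90\n"
)

reader = csv.reader(report)
header = next(reader)  # consume the first line (the column names)

# Group grades by student; each field is a string, so convert with int().
grades = {}
for name, _class_name, _lecture, grade in reader:
    grades.setdefault(name, []).append(int(grade))

averages = {name: sum(values) / len(values) for name, values in grades.items()}
print(header)
print(averages)
```

Calling next(reader) once before the loop is a common way to skip the header when working with plain csv.reader().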
Keep in mind that the CSV file can include field names as its first line. If we know that this is the case, we can use csv.DictReader() to create the reader. Instead of returning a list for each row, it will return a dictionary for each line. The keys of each dictionary are the names in the first line of the CSV file.
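As a rough illustration of csv.DictReader(), the sketch below reads a shortened copy of the grades data from memory (io.StringIO stands in for the open file; the names report and records are illustrative, not from the original article).

```python
import csv
import io

# A shortened copy of the grades data, held in memory for the example.
report = io.StringIO(
    "Name,Class,Lecture,Grade\n"
    "Benjamin,A,Mathematics,90\n"
    "Sophie,B,Chemistry,90\n"
)

reader = csv.DictReader(report)
records = list(reader)  # each record maps column name -> field value

for record in records:
    # Fields are looked up by column name instead of by position.
    print(record["Name"], record["Lecture"], record["Grade"])
```

The column names taken from the first line are also available as reader.fieldnames.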
CSV Dialects and How to Deal With Them
Even though CSV stands for “comma separated values”, there is no set standard for these files. Thus, csv allows us to specify the CSV dialect. The csv.list_dialects() function lists the csv module’s built-in dialects. For me, these are excel, excel-tab, and unix.
The excel dialect is the default setting for CSV files exported directly from Microsoft Excel; its delimiter is a comma. A variant of this is excel-tab, where the delimiter is a tab. More info on these dialects can be seen on the Python GitHub page.
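To check these dialects on your own installation, a short sketch like the following can help. It uses csv.list_dialects() and csv.get_dialect() from the standard library; the variable names are illustrative.

```python
import csv

# Names of the dialects currently registered with the csv module.
dialect_names = csv.list_dialects()
print(dialect_names)

# Look up individual dialects and inspect their settings.
excel = csv.get_dialect("excel")
excel_tab = csv.get_dialect("excel-tab")
unix = csv.get_dialect("unix")

print(repr(excel.delimiter))      # comma
print(repr(excel_tab.delimiter))  # tab character
print(repr(unix.lineterminator))  # line ending used when writing
```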
If your company or team is using a custom-styled CSV, you can create your own CSV dialect and put it into the system using the register_dialect() function. See the Python GitHub page for more details. An example would look as follows:
csv.register_dialect('myDialect', delimiter='|', skipinitialspace=True, quoting=csv.QUOTE_ALL)
You could then use the new myDialect to read a CSV file:
import csv

with open("Report.csv", "r") as handler:
    reader = csv.reader(handler, dialect="myDialect")
This works much like our previous example, but instead of supplying an argument for the delimiter, we simply give our new dialect as the argument.
In the register_dialect() call above, we create a dialect called “myDialect”. This dialect uses the vertical bar (|) as the delimiter. It also indicates that we want to skip any whitespace immediately after delimiters (skipinitialspace=True) and that all values are inside quotes (quoting=csv.QUOTE_ALL, which instructs writer objects to quote every field). There are a few more parameters that can be set; see the links above for details.
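The effect of these settings is easier to see on a small input. The sketch below is only an illustration under stated assumptions: the pipe-delimited text in raw is hypothetical sample data held in memory, and the writer part is added just to show what QUOTE_ALL does on output.

```python
import csv
import io

csv.register_dialect(
    "myDialect", delimiter="|", skipinitialspace=True, quoting=csv.QUOTE_ALL
)

# Hypothetical input: quoted fields, with a space after each delimiter.
raw = '"firstname"| "lastname"| "class"\n"Benjamin"| "Berman"| "2020"\n'

# Reading: the space after each "|" is skipped and the quotes are removed.
rows = list(csv.reader(io.StringIO(raw), dialect="myDialect"))
print(rows)

# Writing: QUOTE_ALL wraps every field in quotes, including the number.
buffer = io.StringIO()
csv.writer(buffer, dialect="myDialect").writerow(["Sophie", "Case", 2018])
written = buffer.getvalue().strip()
print(written)
```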
What If We Don’t Know the CSV Dialect?
Sometimes we won’t know what dialect the CSV file has. For times like this, we can use the csv.Sniffer class. Its methods take a sample of the file’s text as a string (for example, the first part of the file read from the handler), not a reader object. I’ve found the two methods below very useful:
sample = handler.read(1024)
header_exists = csv.Sniffer().has_header(sample)
sniffed_dialect = csv.Sniffer().sniff(sample)
The first method returns a Boolean value indicating if there appears to be a header. The second method returns the dialect as found by csv.Sniffer(). Both rely on heuristics, so treat the results as a best guess; they are still a useful starting point when we don’t know the structure of the CSV file.
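Here is a minimal end-to-end sketch of sniffing. Note that csv.Sniffer's methods take a text sample (a string). The semicolon-delimited sample is hypothetical data held in memory, and the optional delimiters argument to sniff() is used here to restrict the candidates; in practice the sample could come from reading the start of an open file.

```python
import csv
import io

# Hypothetical semicolon-delimited sample text.
sample = (
    "Name;Class;Lecture;Grade\n"
    "Benjamin;A;Mathematics;90\n"
    "Benjamin;A;Chemistry;54\n"
    "Sophie;B;Physics;90\n"
)

sniffer = csv.Sniffer()
# Restrict the guess to a few plausible delimiter characters.
sniffed_dialect = sniffer.sniff(sample, delimiters=";,|\t")
header_exists = sniffer.has_header(sample)

# Use the sniffed dialect to parse the same text.
rows = list(csv.reader(io.StringIO(sample), dialect=sniffed_dialect))
if header_exists:
    header, records = rows[0], rows[1:]
else:
    header, records = None, rows

print(repr(sniffed_dialect.delimiter))
print(header)
print(records)
```

If the sniffer cannot determine a delimiter, sniff() raises csv.Error, so real code may want to catch that and fall back to a default dialect.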
Now That You Know About CSV Files and Python…
… you need to practice! The CSV file format is one of the oldest and most common data transfer methods out there. We simply cannot hope to avoid it when working as a data scientist or machine learning engineer. Even back-end developers deal with CSV files, either when receiving data or when writing it back to the system for some other component to use.
As the csv module is already installed in Python, it’ll probably be your go-to tool for dealing with CSV files. For some hands-on practice in working with CSVs in Python, take a look at our interactive course How to Read and Write CSV Files in Python.
How To Read A CSV File In Python
I first began to work with CSV files when taking the backend portion of my software engineering bootcamp curriculum. It wasn’t until I began to dive more into the data science portion of my continued learning that I began to use them on a regular basis.
CSV stands for comma-separated values, and files containing the .csv extension contain a collection of comma-separated values used to store data.
In this tutorial we will be using the public Beach Water Quality data set stored in the bwq.csv file. You can obtain the file by downloading it from Kaggle. However, you should be able to read any CSV file by following the instructions below.
Read A CSV File Using Python
There are two common ways to read a .csv file when using Python: the first is the csv library, and the second is the pandas library.
1. Using the CSV Library
import csv

with open("./bwq.csv", 'r') as file:
    csvreader = csv.reader(file)
    for row in csvreader:
        print(row)
Here we are importing the csv library in order to use its reader() function, which helps us read the CSV file.
The with keyword allows us to both open and close the file without having to explicitly close it.
The open() function takes two string arguments here: first the file name, and second the mode. We are using r for read; this can be omitted, as r is the default mode.
We then iterate over all the rows.
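As a side note on the open() call, the csv module documentation recommends opening CSV files with newline="" so that line endings inside quoted fields are handled by the csv module itself. The sketch below illustrates this under some assumptions: it writes a small temporary file (example.csv and its contents are hypothetical, not the bwq.csv data) and assumes UTF-8 encoding.

```python
import csv
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.csv"  # hypothetical file name

    # Write a small file whose second row has a field with an embedded newline.
    with open(path, "w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(["beach", "note"])
        writer.writerow(["Ohio Street", "line one\nline two"])

    # newline="" leaves line-ending handling to the csv module.
    with open(path, "r", newline="", encoding="utf-8") as file:
        rows = list(csv.reader(file))

for row in rows:
    print(row)
```

The multi-line note comes back as a single field rather than being split into two rows.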
Running the csv.reader() loop prints each row to the terminal as a list of strings.
2. Using the Pandas Library
import pandas as pd

data = pd.read_csv("bwq.csv")
data
Here we’re importing pandas, a Python library used for data manipulation and analysis. It contains the read_csv() function we need in order to read our CSV file.
In a notebook, the final data line displays the contents as a table (a pandas DataFrame).
Possible Delimiters Issues
The majority of CSV files are separated by commas. However, some are separated by other characters, such as colons, and reading those with the default settings can produce strange results in Python.
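To make the problem concrete, here is a small sketch of what happens when colon-delimited text is read with the default comma delimiter. The data in colon_data is hypothetical and held in memory rather than read from a file.

```python
import csv
import io

# Hypothetical colon-delimited data.
colon_data = "beach:temperature\nOhio Street:18.4\n"

# With the default comma delimiter, the colons are not treated as separators,
# so every line comes back as a single field.
rows = list(csv.reader(io.StringIO(colon_data)))
print(rows)
```

Each row is a one-element list containing the whole line, which is usually the first sign that the delimiter is wrong.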
Solution For Delimiters Using the CSV Library
To change the delimiter using the csv library, simply pass the delimiter=':' argument to the reader() function like so:
import csv

with open("./fileWithColonDelimeter.csv", 'r') as file:
    csvreader = csv.reader(file, delimiter=':')
    for row in csvreader:
        print(row)
For other edge cases in reading csv files using the csv library, check out this page in the Python docs.
Solution For Delimiters Using the Pandas Library
To change the delimiter using the pandas library, simply pass the argument delimiter=':' to the read_csv() function like so:
import pandas as pd

data = pd.read_csv("fileWithColonDelimeter.csv", delimiter=':')
data
For other edge cases in reading CSV files using the pandas library, check out this page in the pandas docs.
Up Next
Better Dependency Management in Python is a great introduction to using Earthly with Python and if you want to bring your CI to the next level, check out Earthly’s open source build tool.