- How to Read a CSV File in Python
- What is a CSV file?
- CSV Sample File
- Read CSV
- Python CSV Module
- Read a CSV File
- Read a CSV as a Dict
- CSV Module Functions
- Read CSV with Pandas
- Example
- Delimiter
- Header
- pandas names
- pandas usecols
- How To Read A CSV File In Python
- Read A CSV File Using Python
- 1. Using the CSV Library
- 2. Using the Pandas Library
- Possible Delimiters Issues
- Solution For Delimiters Using the CSV Library
- Solution For Delimiters Using the Pandas Library
How to Read a CSV File in Python
A CSV (Comma Separated Values) file is a file with values separated by commas. It's often used to import and export data with databases and spreadsheets.
Values are usually separated by a comma, but sometimes another character, such as a semicolon, is used. The separating character is called the delimiter.
What is a CSV file?
A CSV file is a plain text file that contains values separated by a delimiter. The .csv extension is short for comma-separated values, because the delimiter is often a comma.
Excel can open CSV files, and many web apps can export data as a CSV file.
A CSV file has rows and columns because it represents tabular data. You can think of every value as a cell and every line as a row.
CSV Sample File
You can represent a table in a CSV (comma separated values) file. The text is the tabular data. Each line of the CSV file is a row of the table, and the values within a line are separated by a delimiter (a comma, a semicolon, or another character).
Because CSV files are plain text data, almost all programming languages support it. You can easily parse it in Python.
You could have this table:
Name | Age | Salary |
---|---|---|
Chris | 20 | $3600 |
Harry | 25 | $3200 |
Barry | 30 | $3000 |
And represent the same data as a .csv file.
Chris,20,3600
Harry,25,3200
Barry,30,3000
Here each row in the file matches a row in the table, and each value is a cell in the table.
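As a small illustration of that row-to-row mapping, the sketch below parses the three sample lines from an in-memory string. The `io.StringIO` object is only a stand-in for a real file, and the variable names are chosen for this example.

```python
import csv
import io

# The sample rows from the table above, as plain CSV text.
sample = "Chris,20,3600\nHarry,25,3200\nBarry,30,3000\n"

# csv.reader accepts any iterable of lines; io.StringIO stands in for a file here.
rows = list(csv.reader(io.StringIO(sample)))

# Each line of text becomes one list, and each value becomes one cell in it.
for row in rows:
    print(row)
```

Each printed list corresponds to one row of the table, with the cells in the same order as the columns.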
Read CSV
In Python, there are two common ways to read csv files:
- read csv with the csv module
- read csv with the pandas module (see bottom)
Python CSV Module
Python comes with a module to parse csv files, the csv module. You can use this module to read and write data, without having to do string operations and the like.
Read a CSV File
Let's get into how to read a CSV file with the csv module. The module is part of the standard library, so there is nothing to install; just import it with import csv.
Then open the CSV file. Passing newline='' is what the csv documentation recommends, so that the module handles line endings itself:
with open('office.csv', newline='') as csvfile:
Then create a reader object with csv.reader(), passing the open file object and the delimiter.
This sounds hard, but it is as simple as:
csvReader = csv.reader(csvfile, delimiter=',')
Then you can loop over the rows and parse them or display them.
import csv

# Open the file and print each row as a list of strings.
with open('office.csv', newline='') as csvfile:
    csvReader = csv.reader(csvfile, delimiter=',')
    for row in csvReader:
        print(row)
When you run the program, it will show you every row as a list:
➜ ~ python3 program.py
['Chris', '20', '3600']
['Harry', '25', '3200']
['Barry', '30', '3000']
Because it is a list, you can access cells using square brackets.
The first cell is row[0], the second cell is row[1], and so on.
for row in csvReader:
    print(row[0])
    print(row[1])
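One detail worth keeping in mind when indexing cells: the csv module returns every value as a string, so numbers need to be converted before doing arithmetic. The sketch below assumes the three-column sample data from earlier and uses an in-memory string instead of a file; the names `people` and `total_salary` are illustrative.

```python
import csv
import io

sample = "Chris,20,3600\nHarry,25,3200\nBarry,30,3000\n"

people = []
for row in csv.reader(io.StringIO(sample)):
    # Unpack the three cells, then convert the numeric ones from str to int.
    name, age, salary = row
    people.append((name, int(age), int(salary)))

# With real integers, the values can be summed or compared.
total_salary = sum(salary for _, _, salary in people)
print(total_salary)
```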
Read a CSV as a Dict
If you want to read the data into a dictionary instead of a list, you can do that.
The csv module comes with a DictReader. This lets you read each row of a CSV file as a dictionary.
If you want to read it as a dictionary, make sure the file includes a header row, because the header values become the keys of each dictionary.
name,age,salary
Chris,20,3600
Harry,25,3200
Barry,30,3000
Then your program can read the file by passing the open file object to csv.DictReader:
import csv

# The header row supplies the dictionary keys.
with open('students.csv', newline='') as csvfile:
    csvReader = csv.DictReader(csvfile)
    for row in csvReader:
        print(row)
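The program then outputs dictionaries:
OrderedDict([('name', 'Chris'), ('age', '20'), ('salary', '3600')])
OrderedDict([('name', 'Harry'), ('age', '25'), ('salary', '3200')])
OrderedDict([('name', 'Barry'), ('age', '30'), ('salary', '3000')])
An OrderedDict works the same way as a normal dict for lookups such as row['name']. On Python 3.8 and later, DictReader returns plain dict objects instead, so each row prints as {'name': 'Chris', 'age': '20', 'salary': '3600'}.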
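To make the key-value mapping concrete, here is a minimal sketch that looks up cells by column name. It also shows the `fieldnames` parameter of `csv.DictReader`, which can supply the keys when a file has no header row. The data is the sample from this section held in memory, and the variable names are assumptions made for the example.

```python
import csv
import io

with_header = "name,age,salary\nChris,20,3600\nHarry,25,3200\nBarry,30,3000\n"

# Keys come from the header row, so cells are looked up by column name.
reader = csv.DictReader(io.StringIO(with_header))
ages = {row["name"]: int(row["age"]) for row in reader}
print(ages)

# Without a header row, the keys can be passed in through fieldnames.
no_header = "Chris,20,3600\nHarry,25,3200\n"
headerless_reader = csv.DictReader(
    io.StringIO(no_header), fieldnames=["name", "age", "salary"]
)
first = next(headerless_reader)
print(first["salary"])
```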
CSV Module Functions
The csv module comes with many different functions and constants:
- csv.field_size_limit – return the maximum field size
- csv.get_dialect – get the dialect associated with a name
- csv.list_dialects – show all registered dialects
- csv.reader – read data from a csv file
- csv.register_dialect – associate a dialect with a name
- csv.writer – write data to a csv file
- csv.unregister_dialect – delete the dialect associated with a name from the dialect registry
- csv.QUOTE_ALL – quote everything, regardless of type
- csv.QUOTE_MINIMAL – quote only fields that contain special characters
- csv.QUOTE_NONNUMERIC – quote all fields that aren't numbers
- csv.QUOTE_NONE – don't quote anything in the output
This article focuses only on csv.reader, which lets you read a file.
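For readers curious how a few of the other entries fit together, the sketch below registers a dialect, reads with it, and writes one row using `csv.QUOTE_ALL`. The dialect name "pipes" and the sample data are made up for this illustration.

```python
import csv
import io

# Register a hypothetical dialect named "pipes" that uses | as the delimiter.
csv.register_dialect("pipes", delimiter="|", quoting=csv.QUOTE_MINIMAL)
registered = "pipes" in csv.list_dialects()

# Read pipe-separated text by referring to the dialect by name.
rows = list(csv.reader(io.StringIO("Chris|20|3600\nHarry|25|3200\n"), dialect="pipes"))

# Write one row with QUOTE_ALL so every field is wrapped in quotes.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
writer.writerow(["Chris", 20, 3600])
quoted_line = buffer.getvalue()
print(quoted_line.strip())

# Remove the dialect again so the registry is left as it was.
csv.unregister_dialect("pipes")
```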
Read CSV with Pandas
Pandas is a data analysis library that is often used in data science. If you work with data a lot, pandas is usually the more convenient choice, because it loads the whole file into a table that you can filter and analyze.
First we start with some data. Let's say you have a CSV file containing nation statistics, nations.csv:
Country,Capital,Language,Currency
United States, Washington, English, US dollar
Canada, Ottawa, English and French, Canadian dollar
Germany, Berlin, German, Euro
By default, the pandas module is not installed. You can install it with the Python package manager pip (pip install pandas). After installation, load it like this:
import pandas as pd
Pandas has a function to read CSV files, pd.read_csv(filename).
This loads the CSV file into a pandas data frame.
Pandas works with data frames, which hold all the data. Data frames are really convenient data structures: they let you grab an entire column at once by using its header name. (The header is the first line of the CSV file.)
Example
The program below reads a CSV file with pandas:
import pandas as pd

df = pd.read_csv('nations.csv')
print(df)
print('\n')

# Selecting a column by its header name gives every value in it.
for country in df['Country']:
    print(country)
This outputs both the data frame, print(df), and the values of the column df['Country']:
➜ ~ python3 sample.py
         Country      Capital             Language          Currency
0  United States   Washington              English         US dollar
1         Canada       Ottawa   English and French   Canadian dollar
2        Germany       Berlin               German              Euro


United States
Canada
Germany
You can iterate row by row like this:
import pandas as pd

df = pd.read_csv('nations.csv')
for index, row in df.iterrows():
    print(row['Country'], row['Capital'], row['Language'])
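When only one row or a filtered subset is needed, a loop is not required. The sketch below keeps the nations data in an in-memory string so it can run without the file. It also passes `skipinitialspace=True`, a `read_csv` parameter, because the sample file has a space after each comma that would otherwise stay attached to the values. The variable names are illustrative.

```python
import io

import pandas as pd

nations_csv = """Country,Capital,Language,Currency
United States, Washington, English, US dollar
Canada, Ottawa, English and French, Canadian dollar
Germany, Berlin, German, Euro
"""

# skipinitialspace=True drops the space that follows each comma in this sample.
df = pd.read_csv(io.StringIO(nations_csv), skipinitialspace=True)

# Select one row by position; cells are then looked up by column name.
second_row = df.iloc[1]
print(second_row["Capital"])

# Filter rows with a boolean condition instead of looping over them.
euro_countries = df[df["Currency"] == "Euro"]["Country"].tolist()
print(euro_countries)
```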
Delimiter
If your file uses a delimiter other than the default comma, say a pipe, you can use the parameter sep.
import pandas as pd

df = pd.read_csv('data.csv', sep='|')
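A quick way to try the sep parameter without creating a file is to read from an in-memory string. The pipe-separated data below is made up for the illustration.

```python
import io

import pandas as pd

# Hypothetical pipe-separated data, held in memory instead of data.csv.
pipe_data = "name|age|salary\nChris|20|3600\nHarry|25|3200\n"

df = pd.read_csv(io.StringIO(pipe_data), sep="|")

# The columns are split on the pipe, and numeric columns are parsed as numbers.
print(list(df.columns))
print(df["age"].sum())
```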
Header
If your CSV file does not include a header, you can either add one to the file or pass the parameter header=None, so that the first line is read as data.
import pandas as pd

df = pd.read_csv('data.csv', header=None)
If the header is on another line (say the 2nd line), pass its zero-based line number:
import pandas as pd

df = pd.read_csv('data.csv', header=1)
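The two header settings can be compared side by side on small in-memory samples. In this sketch the title line "Office staff export" and the other data are invented for the example.

```python
import io

import pandas as pd

# No header row: header=None keeps the first line as data and numbers the columns.
no_header = "Chris,20,3600\nHarry,25,3200\n"
df_none = pd.read_csv(io.StringIO(no_header), header=None)
print(list(df_none.columns))

# Header on the 2nd line: header=1 skips the title line above it.
title_first = "Office staff export\nname,age,salary\nChris,20,3600\n"
df_second = pd.read_csv(io.StringIO(title_first), header=1)
print(list(df_second.columns))
```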
pandas names
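If your CSV data does not have a header, don't worry. You can define the column names in code while opening the file:
import pandas as pd

# If the file already has a header row (as nations.csv above does), add header=0 so that row is replaced rather than read as data.
df = pd.read_csv('nations.csv', names=['Country', 'Capital', 'Language', 'Currency'])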
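One thing to watch for with names is a file that already has a header row. The sketch below, using made-up column names and in-memory data, shows the difference between passing names alone and combining it with header=0.

```python
import io

import pandas as pd

with_header = "Country,Capital\nCanada,Ottawa\nGermany,Berlin\n"
new_names = ["nation", "capital_city"]  # hypothetical replacement names

# names alone: the existing header line is treated as an ordinary data row.
kept = pd.read_csv(io.StringIO(with_header), names=new_names)

# names with header=0: the existing header line is replaced by the new names.
renamed = pd.read_csv(io.StringIO(with_header), names=new_names, header=0)

print(len(kept), len(renamed))
```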
pandas usecols
If you only want to load specific columns, you can specify the parameter usecols.
This is useful if you have a large CSV file with a lot of columns. You can define one or more columns:
import pandas as pd

df = pd.read_csv('nations.csv', usecols=['Country', 'Currency'])
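As a sketch of what usecols leaves in the data frame, the example below selects the same two columns once by name and once by position. The data is a shortened in-memory copy of the nations sample, without the spaces after the commas.

```python
import io

import pandas as pd

nations_csv = (
    "Country,Capital,Language,Currency\n"
    "Canada,Ottawa,English and French,Canadian dollar\n"
    "Germany,Berlin,German,Euro\n"
)

# Select columns by header name...
by_name = pd.read_csv(io.StringIO(nations_csv), usecols=["Country", "Currency"])

# ...or by zero-based position in the file.
by_position = pd.read_csv(io.StringIO(nations_csv), usecols=[0, 3])

print(list(by_name.columns))
```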
How To Read A CSV File In Python
I first began to work with CSV files when taking the backend portion of my software engineering bootcamp curriculum. It wasn’t until I began to dive more into the data science portion of my continued learning that I began to use them on a regular basis.
CSV stands for comma-separated values, and files with the .csv extension store data as a collection of comma-separated values.
In this tutorial we will be using the public Beach Water Quality data set stored in the bwq.csv file. You can obtain the file by downloading it from Kaggle, however, you should be able to read any csv file following the instructions below.
Read A CSV File Using Python
There are two common ways to read a .csv file when using Python. The first is by using the csv library, and the second is by using the pandas library.
1. Using the CSV Library
import csv

with open("./bwq.csv", 'r') as file:
    csvreader = csv.reader(file)
    for row in csvreader:
        print(row)
Here we are importing the csv library in order to use its reader() function to help us read the CSV file.
The with keyword closes the file for us automatically when the block ends, so we don't have to close it explicitly.
The open() function takes two string arguments here: first the file name, and second the mode. We are using 'r' for read; this can be omitted, as 'r' is the default.
We then iterate over all the rows.
Each row is printed in the terminal as a list of strings.
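Two optional arguments to open() are often useful with CSV files: newline='', which the csv documentation recommends so that line breaks inside quoted fields are preserved, and an explicit encoding. The sketch below writes a small temporary file and reads it back, pulling the header off with next(). The file name and its contents are made up for this example and are not taken from bwq.csv.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file created only for this demonstration.
    path = os.path.join(tmp_dir, "example.csv")

    with open(path, "w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(["beach", "note"])
        writer.writerow(["Beach A", "line one\nline two"])

    with open(path, "r", newline="", encoding="utf-8") as file:
        csvreader = csv.reader(file)
        header = next(csvreader)     # first row: the column names
        data_rows = list(csvreader)  # remaining rows: the data

print(header)
print(data_rows)
```

Separating the header from the data this way keeps the column names out of any later calculations.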
2. Using the Pandas Library
import pandas as pd

data = pd.read_csv("bwq.csv")
data
Here we're importing Pandas, a Python library used to conduct data manipulation and analysis. It contains the read_csv() function we need in order to read our CSV file.
In a notebook, the final line displays the data as a table; in a script, call print(data) instead.
Possible Delimiters Issues
The majority of CSV files are separated by commas; however, some use other characters, such as colons, and reading those with the default settings can produce strange results in Python.
Solution For Delimiters Using the CSV Library
To change the delimiter using the csv library, simply pass the delimiter=':' argument to the reader() function like so:
import csv

with open("./fileWithColonDelimeter.csv", 'r') as file:
    csvreader = csv.reader(file, delimiter=':')
    for row in csvreader:
        print(row)
For other edge cases in reading csv files using the csv library, check out this page in the Python docs.
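When the delimiter is not known in advance, the csv module's Sniffer class can try to guess it from a sample of the text. The sketch below is one possible approach rather than a guaranteed one: the detection is heuristic and can guess wrong on unusual data, and the colon-separated sample and the list of candidate delimiters are assumptions made for the example.

```python
import csv
import io

# Hypothetical colon-separated sample, standing in for the start of a file.
sample = "name:age:salary\nChris:20:3600\nHarry:25:3200\nBarry:30:3000\n"

# Ask the Sniffer to choose among a few candidate delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=",;:|\t")
detected = dialect.delimiter
print(repr(detected))

# The detected dialect can be passed straight to csv.reader.
rows = list(csv.reader(io.StringIO(sample), dialect))
print(rows[1])
```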
Solution For Delimiters Using the Pandas Library
To change the delimiter using the pandas library, simply pass the argument delimiter=':' to the read_csv() function like so:
import pandas as pd

data = pd.read_csv("fileWithColonDelimeter.csv", delimiter=':')
data
For other edge cases in reading CSV files using the Pandas library, check out this page in the Pandas docs.