Python read csv data file

How to Read a CSV File in Python

A CSV (Comma Separated Values) file is a file with values seperated by a comma. Its often used to import and export with databases and spreadsheets.

Values are mostly seperated by comma. Sometimes another character is used like a semicolon, the seperation character is called a delimiter.

What is a CSV file?

A CSV file is a type of plain text file that contains values that are separated by a delimiter. The .csv extension is short for comma separated value , because the delimter is often a comma.

Excel can open CSV files. Web Apps allow export of data into a CSV file.

By default a csv file has rows and columns, as it’s a representation of tabular data. You can see every value as a cell and every line as a row.

CSV Sample File

You can represent a table in a CSV (comma separated values) file. The text is the tabular data. Each line of the csv file is a line inside the table, where every value (seperated by comma) is a delimiter (a comma, a semicolon or another delimiter*).

Читайте также:  Как приравнять массивы питон

Because CSV files are plain text data, almost all programming languages support it. You can easily parse it in Python.

You could have this table:

Name Age Salary
Chris 20 $3600
Harry 25 $3200
Barry 30 $3000

And represent the same data as a .csv file.

Chris,20,3600 Harry,25,3200 Barry,30,3000 

Here each row in the file matches a row in the table, and each value is a cell in the table.

Read CSV

In Python, there are two common ways to read csv files:

  • read csv with the csv module
  • read csv with the pandas module (see bottom)

Python CSV Module

Python comes with a module to parse csv files, the csv module. You can use this module to read and write data, without having to do string operations and the like.

Read a CSV File

Lets get into how to read a csv file. You can use the csv module. The module is already installed, just import it with import csv .

Then you’ll want to open the csv file, you can with:

with open(‘office.csv’) as csvfile:

Then create a reader object csv.reader() where the parameters are the filename and the delimiter.

This sounds hard, but it is as simple as:

csvReader = csv.reader(csvfile, delimiter=‘,’)

Then you can loop over the rows and parse them or display them.

import csv

with open(‘office.csv’) as csvfile:
csvReader = csv.reader(csvfile, delimiter=‘,’)
for row in csvReader:
print(row)

When you run the program, it will show you every row as a list

➜ ~ python3 program.py ['Chris', '20', '3600'] ['Harry', '25', '3200'] ['Barry', '30', '3000'] 

Because it is a list, you can access cells using square brackets.
The first cell is row[0] , the second cell row[1] etcetera.

for row in csvReader:
print(row[0])
print(row[1])

Read a CSV as a Dict

If you want to read the data into a dictionary instead of a list, you can do that.
The csv module comes with a DictReader . This lets you read a csv file as dictionary.

If you want to read it as a dictionary, make sure to include a header because that will be included in the key-value mapping.

name,age,salary Chris,20,3600 Harry,25,3200 Barry,30,3000 

Then your program can read the csv with csv.DictReader(open(filename))

import csv

csvReader = csv.DictReader(open(«students.csv»))
for row in csvReader:
print(row)

The program then outputs dictionaries:

OrderedDict([('name', 'Chris'), ('age', '20'), ('salary', '3600')]) OrderedDict([('name', 'Harry'), ('age', '25'), ('salary', '3200')]) OrderedDict([('name', 'Barry'), ('age', '30'), ('salary', '3000')]) 

An OrderedDict functions exactly the same as normal dict .

CSV Module Functions

The csv module comes with many different functions:

  • csv.field_size_limit – return maximum field size
  • csv.get_dialect – get the dialect which is associated with the name
  • csv.list_dialects – show all registered dialects
  • csv.reader – read data from a csv file
  • csv.register_dialect — associate dialect with name
  • csv.writer – write data to a csv file
  • csv.unregister_dialect — delete the dialect associated with the name the dialect registry
  • csv.QUOTE_ALL — Quote everything, regardless of type.
  • csv.QUOTE_MINIMAL — Quote fields with special characters
  • csv.QUOTE_NONNUMERIC — Quote all fields that aren’t numbers value
  • csv.QUOTE_NONE – Don’t quote anything in output

This article focuses only on the csv.reader , that lets you read a file.

Read CSV with Pandas

python pandas read excel

Pandas is a data analysis library. It often used in data science. If you work with data a lot, using the pandas module is way better.

First we start with some data. Lets say you have a csv file containing nation statistics, nations.csv :

Country,Capital,Language,Currency
United States, Washington, English, US dollar
Canada, Ottawa, English and French, Canadian dollar
Germany, Berlin, German, Euro

By default, the pandas module is not installed. You can install it with Python package manager pip. After installation, load it like this:

Pandas has a function read csv files, .read_csv(filename) .

This loads the csv file into a Pandas data frame.

Pandas works with dataframes which hold all data. Data frames are really cool data structures, they let you grab an entire row at once, by using it’s header name. (The header was the first line in the csv file)

Example

The program below reads a csv file with pandas.:

import pandas as pd
import numpy as np

df = pd.read_csv(‘nations.csv’)
print(df)
print(‘\n’)

for country in df[‘Country’]:
print(country)

This outputs both the dataframe print(df) and a row df[‘Country’]

➜ ~ python3 sample.py Country Capital Language Currency 0 United States Washington English US dollar 1 Canada Ottawa English and French Canadian dollar 2 Germany Berlin German Euro United States Canada Germany 

You can iterate row by row like this:

import pandas as pd

df = pd.read_csv(‘nations.csv’)
for index, row in df.iterrows():
print(row[‘Country’], row[‘Capital’],row[‘Language’])

Delimiter

If you have another delimter than the default comma, say a pipe then you an use the parameter sep = .

import pandas as pd
df = pd.read_csv(«data.csv», sep=«|»)

If your csv file does not include a header, you can either remove it from the file or change the program to have the parameter header = None .

import pandas as pd
df = pd.read_csv(«data.csv», header = None)

If it’s on another line (say the 2nd line) you can use:

import pandas as pd
df = pd.read_csv(«data.csv», header = 1)

pandas names

If your csv data does not have a header, don’t worry. You can define the columns in code while opening the file:

import pandas as pd
df = pd.read_csv(‘nations.csv’, names=[‘Country’,‘Capital’,‘Language’,‘Currency’])

pandas use_cols

If you only want to load specific columns, you can specify the parameter usecols .
This is useful if you have a large csv with a lot of columns. You can define one ore more columns:

import pandas as pd
df = pd.read_csv(«nations.csv», usecols = [«Country», «Currency»])

Источник

How To Read A CSV File In Python

Kelly Moreira %

I first began to work with CSV files when taking the backend portion of my software engineering bootcamp curriculum. It wasn’t until I began to dive more into the data science portion of my continued learning that I began to use them on a regular basis.

CSV stands for comma-separated values, and files containing the .csv extension contain a collection of comma-separated values used to store data.

In this tutorial we will be using the public Beach Water Quality data set stored in the bwq.csv file. You can obtain the file by downloading it from Kaggle, however, you should be able to read any csv file following the instructions below.

Read A CSV File Using Python

There are two common ways to read a .csv file when using Python. The first by using the csv library, and the second by using the pandas library.

1. Using the CSV Library

 import csv with open("./bwq.csv", 'r') as file: csvreader = csv.reader(file) for row in csvreader: print(row)

Here we are importing the csv library in order to use the .reader() method it contains to help us read the csv file.

The with keyword allows us to both open and close the file without having to explicitly close it.

The open() method takes two arguments of type string . First the file name, and second a mode argument. We are using r for read, however this can be omitted as r is assumed by default.

We then iterate over all the rows.

You should expect an output in the terminal to look something like this:

2. Using the Pandas Library

 import pandas as pd data = pd.read_csv("bwq.csv") data

Here we’re importing Pandas, a Python library used to conduct data manipulation and analysis. It contains the .read_csv() method we need in order to read our csv file.

You should expect the output to look something like this:

Possible Delimiters Issues

The majority of csv files are separated by commas, however, there are some that are separated by other characters, like colons for example, which can output strange results in Python.

Solution For Delimiters Using the CSV Library

To change the delimiter using the csv library, simply pass in the delimiter= ‘:’ argument in the reader() method like so:

 import csv with open("./fileWithColonDelimeter.csv", 'r') as file: csvreader = csv.reader(file, delimiter=':') for row in csvreader: print(row)

For other edge cases in reading csv files using the csv library, check out this page in the Python docs.

Solution For Delimiters Using the Pandas Library

To change the delimiter using the pandas library, simply pass in the argument delimiter= ‘:’ in the read_csv() method like so:

 import pandas as pd data = pd.read_csv("fileWithColonDelimeter.csv", delimiter= ':') data

For other edge cases in reading csv files using the Pandas library check out this page the Pandas docs.

Up Next

Better Dependency Management in Python is a great introduction to using Earthly with Python and if you want to bring your CI to the next level, check out Earthly’s open source build tool.

Earthly makes CI/CD super simple
Fast, repeatable CI/CD with an instantly familiar syntax – like Dockerfile and Makefile had a baby.

Источник

Оцените статью