Read CSV file into a NumPy Array in Python
In this article, we will learn how to read a CSV file into a NumPy array in Python.
What is a CSV File?
A CSV file is a comma-separated values file. The CSV format stores data in a tabular form: it is just a plain text file in which the values in each row are separated by commas.
Suppose we have a CSV file, data.csv, with the following contents:
1,2,3,4,5
6,7,8,9,0
2,3,4,5,6
4,5,6,7,7
We want to load this CSV file into a NumPy array.
There are multiple ways to read a CSV file into a NumPy array in Python. Let's discuss the methods one by one, each with a working code example.
Read CSV File into a NumPy Array using loadtxt()
The numpy module has a loadtxt() function that loads data from a text file. Each row in the text file must have the same number of values.
Syntax of loadtxt() function
numpy.loadtxt(fname, delimiter=None, skiprows=0)
- Parameters:
- fname = Name or path of the file to be loaded.
- delimiter = The string used to separate values. By default, the delimiter is whitespace.
- skiprows = The number of rows to skip at the beginning of the file.
- Returns a NumPy array.
- Import the numpy library.
- Pass the path of the CSV file and a comma (,) as the delimiter to the loadtxt() function.
- Print the array returned by the loadtxt() function.
Source Code
import numpy as np

# Reading csv file into numpy array
arr = np.loadtxt("data.csv", delimiter=",")

# printing the array
print(arr)
[[1. 2. 3. 4. 5.]
 [6. 7. 8. 9. 0.]
 [2. 3. 4. 5. 6.]
 [4. 5. 6. 7. 7.]]
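The skiprows parameter described above is useful when the file begins with lines that are not data. Here is a minimal sketch, assuming a hypothetical file data_with_header.csv whose first line is a header row:
import numpy as np

# Skip the first line (a header row) before parsing the numeric data.
# "data_with_header.csv" is a hypothetical file used only for illustration.
arr = np.loadtxt("data_with_header.csv", delimiter=",", skiprows=1)
print(arr)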
Read CSV File into a NumPy Array using genfromtxt()
The numpy module has a genfromtxt() function that loads data from a text file. Unlike loadtxt(), the genfromtxt() function can handle rows with missing values.
Syntax of genfromtxt() function
numpy.genfromtxt(fname, delimiter=None)
- Parameters:
- fname = Name or path of the file to be loaded.
- delimiter = The string used to separate values. By default, the delimiter is whitespace.
- Returns a NumPy array.
- Import the numpy library.
- Pass the path of the CSV file and a comma (,) as the delimiter to the genfromtxt() function.
- Print the array returned by the genfromtxt() function.
Source Code
import numpy as np

# Reading csv file into numpy array
arr = np.genfromtxt("data.csv", delimiter=",")

# printing the array
print(arr)
[[1. 2. 3. 4. 5.]
 [6. 7. 8. 9. 0.]
 [2. 3. 4. 5. 6.]
 [4. 5. 6. 7. 7.]]
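Since genfromtxt() is mainly useful when some values are missing, here is a minimal sketch; it assumes a hypothetical file data_missing.csv in which one field has been left empty:
import numpy as np

# Empty fields become np.nan by default; filling_values replaces them instead.
# "data_missing.csv" is a hypothetical file used only for illustration.
arr = np.genfromtxt("data_missing.csv", delimiter=",", filling_values=0)
print(arr)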
Read CSV File into a NumPy Array using read_csv()
The pandas module has a read_csv() function that reads a comma-separated values (CSV) file into a DataFrame. The values property of the DataFrame then gives us the underlying NumPy array.
Syntax of read_csv() function
pandas.read_csv(file_path, sep=',', header='infer')
- Parameters:
- file_path = Name or path of the CSV file to be loaded.
- sep = The string used to separate values, i.e. the delimiter. By default, the delimiter is a comma (,).
- header = The row number to use for the column names. Pass None if the file has no header row.
- Returns a DataFrame.
- Import the pandas and numpy libraries.
- Pass the path of the CSV file and header=None to the read_csv() function.
- Use the values property of the DataFrame to get a NumPy array from the DataFrame.
- Print the NumPy array.
Source Code
import numpy as np
import pandas as pd

# Reading csv file into numpy array
arr = pd.read_csv('data.csv', header=None).values

# printing the array
print(arr)
[[1 2 3 4 5]
 [6 7 8 9 0]
 [2 3 4 5 6]
 [4 5 6 7 7]]
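pandas also provides the DataFrame.to_numpy() method, which returns the DataFrame's data as a NumPy array just like the values property. A minimal sketch using the same data.csv:
import pandas as pd

# to_numpy() is the documented way to get the underlying NumPy array
arr = pd.read_csv('data.csv', header=None).to_numpy()
print(arr)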
Read CSV File into a NumPy Array using file handling and fromstring()
Python supports file handling and provides various functions for reading and writing files. The numpy module provides a fromstring() function, which creates a NumPy array from a string. To convert a CSV file into a NumPy array, read the CSV file using file handling, convert each row into a NumPy array using fromstring(), and collect all the arrays.
Syntax of fromstring() function
numpy.fromstring(string, sep)
- Parameters:
- string = A string containing the data.
- sep = The string separating the numbers in the data, e.g. a comma (,) for CSV data.
- Returns a NumPy array.
Syntax of open() function
open(file, mode='r')
- Parameters:
- file = Name or path of the file to be opened.
- mode = The access mode in which to open the file. By default, the file is opened in read mode ('r').
- Returns a file object.
- Import the numpy library.
- Open the CSV file in read mode and read it row by row.
- Pass each row of the CSV file and sep="," to the fromstring() function.
- fromstring() returns a NumPy array; append it to a list.
- Repeat steps 3 and 4 until the last row of the CSV file.
- Convert the list into a NumPy array and print it.
Source Code
import numpy as np

# Reading csv file into numpy array, row by row
l = []
with open('data.csv') as file_data:
    for row in file_data:
        # Convert one csv row into a numpy array and collect it
        r = list(np.fromstring(row, sep=","))
        l.append(r)

# printing the array
print(np.array(l))
[[1. 2. 3. 4. 5.]
 [6. 7. 8. 9. 0.]
 [2. 3. 4. 5. 6.]
 [4. 5. 6. 7. 7.]]
Great, you made it! We have discussed several methods to read a CSV file into a NumPy array in Python. Happy learning.
Reading and writing files
This page tackles common applications; for the full collection of I/O routines, see Input and output.
Reading text and CSV files
With no missing values
Use numpy.loadtxt.
With missing values
Use numpy.genfromtxt. It will either
- return a masked array, masking out missing values (if usemask=True), or
- fill in the missing value with the value specified in filling_values (default is np.nan for float, -1 for int).
With non-whitespace delimiters
>>> with open("csv.txt", "r") as f:
...     print(f.read())
1, 2, 3
4,, 6
7, 8, 9
Masked-array output
>>> np.genfromtxt("csv.txt", delimiter=",", usemask=True)
masked_array(
  data=[[1.0, 2.0, 3.0],
        [4.0, --, 6.0],
        [7.0, 8.0, 9.0]],
  mask=[[False, False, False],
        [False, True, False],
        [False, False, False]],
  fill_value=1e+20)
Array output
>>> np.genfromtxt("csv.txt", delimiter=",")
array([[ 1.,  2.,  3.],
       [ 4., nan,  6.],
       [ 7.,  8.,  9.]])
Array output, specified fill-in value
>>> np.genfromtxt("csv.txt", delimiter=",", dtype=np.int8, filling_values=99)
array([[ 1,  2,  3],
       [ 4, 99,  6],
       [ 7,  8,  9]], dtype=int8)
Whitespace-delimited
numpy.genfromtxt can also parse whitespace-delimited data files that have missing values if
- Each field has a fixed width: Use the width as the delimiter argument.
# File with width=4. The data does not have to be justified (for example,
# the 2 in row 1), the last column can be less than width (for example, the 6
# in row 2), and no delimiting character is required (for instance 8888 and 9
# in row 3)
>>> with open("fixedwidth.txt", "r") as f:
...     data = (f.read())
>>> print(data)
1   2      3
44      6
7   88889
>>> np.genfromtxt("fixedwidth.txt", delimiter=4)
array([[1.000e+00, 2.000e+00, 3.000e+00],
       [4.400e+01,       nan, 6.000e+00],
       [7.000e+00, 8.888e+03, 9.000e+00]])
>>> with open("nan.txt", "r") as f: . print(f.read()) 1 2 3 44 x 6 7 8888 9
>>> np.genfromtxt("nan.txt", missing_values="x") array([[1.000e+00, 2.000e+00, 3.000e+00], [4.400e+01, nan, 6.000e+00], [7.000e+00, 8.888e+03, 9.000e+00]])
>>> with open("skip.txt", "r") as f: . print(f.read()) 1 2 3 44 6 7 888 9
>>> np.genfromtxt("skip.txt", invalid_raise=False) __main__:1: ConversionWarning: Some errors were detected ! Line #2 (got 2 columns instead of 3) array([[ 1., 2., 3.], [ 7., 888., 9.]])
>>> with open("tabs.txt", "r") as f: . data = (f.read()) >>> print(data) 1 2 3 44 6 7 888 9
>>> np.genfromtxt("tabs.txt", delimiter="\t", missing_values=" +") array([[ 1., 2., 3.], [ 44., nan, 6.], [ 7., 888., 9.]])
Read a file in .npy or .npz format
Use numpy.load. It can read files generated by any of numpy.save, numpy.savez, or numpy.savez_compressed.
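For instance, a minimal sketch of an .npz round trip (the file and array names here are only illustrative):
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0.0, 1.0, 5)

# Save several named arrays into one .npz archive (hypothetical file name)
np.savez("example_arrays.npz", a=a, b=b)

# np.load returns a dict-like NpzFile keyed by the names used above
with np.load("example_arrays.npz") as data:
    print(data["a"])
    print(data["b"])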
Write to a file to be read back by NumPy
Binary
Use numpy.save, or to store multiple arrays, numpy.savez or numpy.savez_compressed.
For security and portability, set allow_pickle=False unless the dtype contains Python objects, which requires pickling.
Masked arrays can't currently be saved, nor can other arbitrary array subclasses.
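A minimal sketch of saving and reloading a single array in .npy format with pickling disabled; the file name is a hypothetical placeholder:
import numpy as np

arr = np.arange(10, dtype=np.int16)

# Plain numeric dtypes never need pickling, so allow_pickle=False is safe
np.save("scratch.npy", arr, allow_pickle=False)
restored = np.load("scratch.npy", allow_pickle=False)
print(np.array_equal(arr, restored))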
Human-readable
numpy.save and numpy.savez create binary files. To write a human-readable file, use numpy.savetxt. The array can only be 1- or 2-dimensional, and there's no savetxtz for multiple files.
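A minimal sketch of a human-readable round trip with numpy.savetxt and numpy.loadtxt; the file name and format string are illustrative assumptions:
import numpy as np

arr = np.array([[1.5, 2.0, 3.25],
                [4.0, 5.5, 6.0]])

# fmt and delimiter control the text representation written to disk
np.savetxt("readable.csv", arr, fmt="%.2f", delimiter=",")
restored = np.loadtxt("readable.csv", delimiter=",")
print(restored)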
Large arrays
See Write or read large arrays below.
Read an arbitrarily formatted binary file ("binary blob")
Use a structured array. For example, the .wav file header is a 44-byte block preceding data_size bytes of the actual sound data:
chunk_id "RIFF" chunk_size 4-byte unsigned little-endian integer format "WAVE" fmt_id "fmt " fmt_size 4-byte unsigned little-endian integer audio_fmt 2-byte unsigned little-endian integer num_channels 2-byte unsigned little-endian integer sample_rate 4-byte unsigned little-endian integer byte_rate 4-byte unsigned little-endian integer block_align 2-byte unsigned little-endian integer bits_per_sample 2-byte unsigned little-endian integer data_id "data" data_size 4-byte unsigned little-endian integer
The .wav file header as a NumPy structured dtype:
wav_header_dtype = np.dtype([
    ("chunk_id", (bytes, 4)),    # flexible-sized scalar type, item size 4
    ("chunk_size", "<u4"),       # little-endian unsigned 32-bit integer
    ("format", "S4"),            # 4-byte string, alternate spelling of (bytes, 4)
    ("fmt_id", "S4"),
    ("fmt_size", "<u4"),
    ("audio_fmt", "<u2"),        #
    ("num_channels", "<u2"),     # .. more of the same ...
    ("sample_rate", "<u4"),      #
    ("byte_rate", "<u4"),
    ("block_align", "<u2"),
    ("bits_per_sample", "<u2"),
    ("data_id", "S4"),
    ("data_size", "<u4"),
    #
    # the sound data itself cannot be represented here:
    # it does not have a fixed size
])

header = np.fromfile(f, dtype=wav_header_dtype, count=1)[0]
This .wav example is for illustration; to read a .wav file in real life, use Python's built-in module wave.
(Adapted from Pauli Virtanen, Advanced NumPy, licensed under CC BY 4.0.)
Write or read large arrays
Arrays too large to fit in memory can be treated like ordinary in-memory arrays using memory mapping:
array = np.memmap("mydata/myarray.arr", mode="r", dtype=np.int16, shape=(1024, 1024))
An array saved with numpy.save can also be memory-mapped at load time by passing mmap_mode to numpy.load; the result can then be used like any other array, for example to fill a slice of a larger array:
large_array[some_slice] = np.load("path/to/small_array", mmap_mode="r")
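A minimal sketch of creating a memory-mapped array on disk and reading it back; the file name and shape are illustrative assumptions:
import numpy as np

# Create a memory-mapped array backed by a file on disk (hypothetical path)
mm = np.memmap("scratch_memmap.arr", mode="w+", dtype=np.int16, shape=(1024, 1024))
mm[0, :10] = np.arange(10)
mm.flush()   # make sure changes reach the file
del mm       # close the map

# Reopen the same file read-only; dtype and shape must be supplied again
ro = np.memmap("scratch_memmap.arr", mode="r", dtype=np.int16, shape=(1024, 1024))
print(ro[0, :10])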
Memory mapping lacks features like data chunking and compression; more full-featured formats and libraries usable with NumPy include HDF5 (via h5py or PyTables), Zarr, and NetCDF (via scipy.io.netcdf_file).
For tradeoffs among memmap, Zarr, and HDF5, see pythonspeed.com.
Write files for reading by other (non-NumPy) tools
Formats for exchanging data with other tools include HDF5, Zarr, and NetCDF (see Write or read large arrays).
Write or read a JSON file
NumPy arrays are not directly JSON serializable.
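One common workaround, sketched here under the assumption that a list-of-lists representation is acceptable, is to convert the array with tolist() before serializing and rebuild it with numpy.asarray afterwards:
import json
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Convert to plain Python lists, which json can handle
text = json.dumps(arr.tolist())

# Rebuild the array from the decoded lists
restored = np.asarray(json.loads(text))
print(restored)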
Save/restore using a pickle file
Avoid when possible; pickles are not secure against erroneous or maliciously constructed data.
Use numpy.save and numpy.load. Set allow_pickle=False, unless the array dtype includes Python objects, in which case pickling is required.
Convert from a pandas DataFrame to a NumPy array
See pandas.DataFrame.to_numpy().
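A minimal sketch using pandas.DataFrame.to_numpy():
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# to_numpy() returns a NumPy array; mixed dtypes are upcast to a common one
arr = df.to_numpy()
print(arr)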
Save/restore using tofile and fromfile
numpy.ndarray.tofile and numpy.fromfile lose information on endianness and precision and so are unsuitable for anything but scratch storage.
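A minimal sketch illustrating why these are scratch-only: the raw bytes carry no dtype or shape, so both must be supplied again, correctly, when reading; the file name is a hypothetical placeholder:
import numpy as np

arr = np.arange(12, dtype=np.float32).reshape(3, 4)

# tofile writes raw array bytes with no header describing dtype or shape
arr.tofile("scratch.raw")

# fromfile needs the dtype repeated, and the shape must be restored by hand
restored = np.fromfile("scratch.raw", dtype=np.float32).reshape(3, 4)
print(np.array_equal(arr, restored))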