Python csv reader skip header

Skip Header Row When Processing CSV in Python

Two common libraries to load CSV files into Python are pandas and the inbuilt csv module. In this article, we will cover how to skip the header row of the CSV file when reading it using either of the said libraries. We will look at three methods to do this:

  1. The next() function when loading CSV data using csv.reader(),
  2. csv.DictReader(),
  3. Pandas skiprows parameter on pandas.read_csv() function

Example

In our examples in this article, we will use the “marks.csv” file, which contains the following content:

mark1,mark2,mark3,gender 12,15,15,M 15,15,16,M 13,13,12,F 10,9,8, 6,9,8,F 12,12,11,F 15,16,15,F

The file, “marks.csv,” is comma-delimited, and the first row contains the headers.

Method 1: Using the next() Function with csv.reader()

The next(iterator, [default]) function retrieves the next item from the iterator. The default value is returned if there’s no next element in the given iterator. Let’s see how to use this function with the csv.reader() to skip the header row.

Output (truncated):

['12', '15', '15', 'M'] ['15', '15', '16', 'M'] ['13', '13', '12', 'F'] ['10', '9', '8', ''] ['6', '9', '8', 'F'] ['12', '12', '11', 'F'] ['15', '16', '15', 'F']

When we use the next() function on the csv reader object immediately after creating the reader, we get the header values. And since we do not want to use the header row, we do nothing with the values returned by the next() function (note that we did not even assign a variable).

Читайте также:  METANIT.COM

Method 2: Using csv.DictReader() Instead of csv.reader()

The csv.DictReader(file, fieldnames=None, *args, **kwargs) operates like a regular csv.reader() but maps the information in each row to a dictionary whose keys are given by the optional fieldnames parameter. If no fieldnames are explicitly defined, the first row of the CSV file will be used (what we want ideally). Here is an example,

Output (truncated):

12 15 15 M 15 15 16 M 13 13 12 F 10 9 8 6 9 8 F 12 12 11 F 15 16 15 F

Method 3: Using skiprows Parameter in pandas.read_csv()

When reading a CSV file in pandas, you can choose to skip some rows using the skiprows argument. You can issue an integer value to determine the number of lines to be skipped or a list of numbers for the indices of the rows to be skipped.

This method is particularly useful for cases where the header row is not the first line of the CSV file. The contents of “marks2.csv” shown below depict such a scenario.

0,1,2,3 mark1,mark2,mark3,gender 12,15,15,M 15,15,16,M 13,13,12,F 10,9,8, 6,9,8,F 12,12,11,F 15,16,15,F

In such a case, pandas.read_csv() function will use the values on the first row as the column values. That is not what we want (ideally). We want to use the second line as the header row. In that case, we can issue the skiprows=1 to skip the first row as follows.

Output (formatted and truncated for better viewing):

If you want to skip the header row on the “marks.csv” file, you can still use the skiprows argument. However, pandas will use the next line as the columns. Let’s see it as an example.

# skiprows=1 skips the first row. You can also use skiprows=[0] to skip row at index 0 (first row of the data).

Output (truncated and formatted for a better view):

As earlier said, pandas will assign the next row as the columns after skipping the first row. If this is not what you want, you can cast the DataFrame into another data structure. In the following snippets, we convert the original pandas DataFrame into a list of lists and a list of dictionaries, respectively.

Output (truncated):

[[12, 15, 15, 'M'], [15, 15, 16, 'M'], . [15, 16, 15, 'F']] [, , …, ]

In the first case, we skip the original header row by casting the values of our Dataframe, df, into a list of lists (this is pretty similar to what we did in Method 1). Converting the DataFrame into a list of dictionaries matches what we did in Method 2. The header values are only used as the keys for accessing the values.

Conclusion

We have discussed skipping the header rows when processing CSV using either csv or pandas libraries. We saw that the next() function could effectively skip the header row when using a csv.reader().

Alternatively, csv.DictReader() can be used to read CSV files as a dictionary and use the header values as keys. Lastly, we discussed how to use the skiprows parameter in pandas.read_csv() to skip some rows.

We further noticed that pandas will still assign the first row of the resulting data as the column after skipping some rows. For that reason, we discussed how to cast a pandas DataFrame into a list or dictionaries to skip the header row.

Источник

Python csv skip header row

In this article, we will learn how we can remove the header of the CSV file data while reading the CSV itself because sometimes we don’t need the header of the CSV file data. So we are going to learn these four methods, which are given below:

Let’s explain each of the above methods in detail.

Method 1: Using next () method

In this method, we will use the next () method and see how this method will discard the header row before we print all the other csv data.

CSV File: The below csv file (test.csv) we will be using for this blog.

with open ( «test.csv» , «r» ) as record:
# We are creating an object of the csv reader
csvreader_object = csv . reader ( record )
# The line will skip the first row of the csv file (Header row)
next ( csvreader_object )

# We are now printing all rows except the first row of the csv
for row in csvreader_object:
print ( row )

Line 1: We import the CSV module.

Line 3 -7: We open the test.csv file in read mode (‘r’) as a record, and then we create an object of the csv.reader() method. The next () method, when we call it, automatically discards the first row from the csv reader object and the rest of the data we can use as we need.

Lines 10–11:Now, we are iterating the csv reader object and printing each row. The above output shows that now there is no header row.

Method 2: Using DictReader () method

Now, we are going to see how we can read the csv as a dictionary format. But after reading the csv file as a direct format, we will print only the value, not the key, which will solve our problem of printing all data without the header row. We are using the same test.csv file as we used before. An example of this method is given below:

with open ( «test.csv» , «r» ) as record:
# We are creating an object of the csv reader
csvreader_object = csv . DictReader ( record )
# The line will skip the first row of the csv file (Header row)
# because it works as a dict and we are printing only values not keys
for row in csvreader_object:
print ( row [ «Month» ] , row [ «1958» ] , row [ «1959» ] , row [ «1960» ] )

Line 1: We import the CSV module.

Line 3 -5: We open the test.csv file in read mode (‘r’) as a record, and then we create an object of the csv.DictReader() method.

Lines 8–9: Now, we are iterating the csv DictReader object and printing each row. But this line automatically discards the first row from the csv reader object because DictReader converts each row in a dict (key and value) form. When we print only value, not key, which only shows the data, not the k,v, which was our primary objective.

Method 3: Using Pandas read_csv skiprows attributes

In this method, we are going to use the Pandas read_csv attribute skiprows. In the skiprows, we will mention the header row number, which is obviously 1, so we define the value of the skiprows as 1 as shown in the below program. This way, we can ignore the header row from the csv while reading the data.

import pandas as pd
skipHeaderDf = pd. read_csv ( ‘test.csv’ , skiprows = 1 )

Line 1: We import the Pandas library as a pd.

Line 2: We read the csv file using the pandas read_csv module, and in that, we mentioned the skiprows=1, which means skipping the first line while reading the csv file data.

Line 4: Now, we print the final dataframe result shown in the above output without the header row.

Method 4: Using Pandas, remove the header of the csv using index position

In this method, we are going to use the Pandas read_csv attribute skiprows. In the skiprows, we will mention the header index position number, which is obviously 0, so we define the value of the skiprows in square brackets ([ 0 ]) as shown in the below program. This way, we can ignore the header row from the csv while reading the data.

import pandas as pd
skipHeaderDf = pd. read_csv ( ‘test.csv’ , skiprows = [ 0 ] )

Line 1: We import the Pandas library as a pd.

Line 2: We read the csv file using the pandas read_csv module, and in that, we mentioned the skiprows=[0], which means skip the first line while reading the csv file data.

Line 4: Now, we print the final dataframe result shown in the above output without the header row.

Conclusion:

This article has seen four different methods to skip the header row while reading the csv file. All the methods in the above article are perfectly fine and are used by the Python programmer to skip the header of the CSV file while reading the CSV data. The Pandas library method not only allows us to remove the header of the CSV file data but can also be used to remove other rows if we specify their number or index position to the skiprows. So the skiprows will be able to remove all those rows whose numbers will be assigned to them. So the Pandas module to skip header is the best to use, and it is also very convenient for removing the other rows.

The other methods using the DictReader and reader are also available, but these are only for the header rows, so if we want to remove some other rows, we have to write some other code too.

Источник

Skip the header of a file with Python’s CSV reader

Let’s say you have a CSV like this, which you’re trying to parse with Python:

Date,Description,Amount 2015-01-03,Cakes,22.55 2014-12-28,Rent,1000 2014-12-27,Candy Shop,12 . 

You don’t want to parse the first row as data, so you can skip it with next . For example:

with open("mycsv.csv", "r") as csvfile:  csvreader = csv.reader(csvfile)   # This skips the first row of the CSV file.  next(csvreader)   for row in csvreader:  # do stuff with rows. 

The call to next reads the first row and discards it. From there, you’re ready to iterate through the actual data.

You may instead wish to use a DictReader , which parses the first row as field names by default. For example:

with open("mycsv.csv", "r") as csvfile:  csvreader = csv.DictReader(csvfile)  for row in csvreader:  print(row["Date"], row["Description"], row["Amount"]) 

Either way, you’ve now skipped the first row of a CSV file in Python!

  • About me
  • Contact
  • Projects
  • Guides
  • Blog
  • RSS
  • Newsletter
  • Mastodon

Content is licensed under the Creative Commons Attribution-NonCommercial License and code under the Unlicense. The logo was created by Lulu Tang.

Источник

Оцените статью