Python read csv skiprows

How to Skip the First Rows with skiprows in Pandas read_csv?

Do you need to skip rows while reading a CSV file with read_csv in Pandas? If so, this article shows you how to skip the first rows of the file while reading it.

The read_csv method has a skiprows parameter, which can be used as follows:

(1) Skip the first rows when reading a CSV file in Pandas

pd.read_csv(csv_file, skiprows=3, header=None) 

(2) Skip rows by index with read_csv

pd.read_csv(csv_file, skiprows=[0,2]) 

Let's check several practical examples that cover the common ways of skipping rows while reading a CSV file.

To start, let's say that we have the following CSV file:

!cat '../data/csv/multine_header.csv' 

This is a CSV file with two header rows:

Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1

Step 1: Skip the first N rows while reading a CSV file

The first example shows how to skip consecutive rows with the Pandas read_csv method. There are two variants:

  • skip rows in Pandas without using a header (this step)
  • skip the first N rows and use a header for the DataFrame: see Step 2

In this step, read_csv starts reading data from row 4 (the index of this row is 3). The newly created DataFrame has autogenerated column names:

df = pd.read_csv(csv_file, skiprows=3, header=None) 
0 1 2 3 4
2021-09-07 1 8.5 2 7.0
2021-09-08 2 8.0 1 8.1
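The following is a self-contained version of Step 1. It is a sketch that assumes the sample file is inlined as the string CSV_TEXT (a hypothetical name) and read through io.StringIO. That way it runs without '../data/csv/multine_header.csv' on disk.

```python
import io

import pandas as pd

# Assumption: the sample CSV from above is inlined instead of read from disk.
CSV_TEXT = """Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
"""

# Skip lines 0, 1 and 2 (both header lines and the first data line).
# header=None tells pandas to number the columns 0..4 itself.
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=3, header=None)
print(df)
```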

Step 2: Skip the first N rows and use a header

If the header parameter of read_csv is not provided, the first row is used as the header. When header and skiprows are combined, the rows are skipped first, and then the first of the remaining rows is used as the header.

In the example below, 3 rows from the CSV file are skipped. The fourth one is used as the header of the new DataFrame.

df = pd.read_csv(csv_file, skiprows=3) 
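This sketch shows what the column names look like in Step 2. It uses the same assumption as before: the sample file is inlined as the hypothetical string CSV_TEXT. Because the fourth line supplies the header, the column names are the values from the 2021-09-07 row rather than real labels.

```python
import io

import pandas as pd

# Assumption: the sample CSV is inlined instead of read from disk.
CSV_TEXT = """Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
"""

# Lines 0-2 are skipped, so line 3 (the 2021-09-07 row) becomes the header.
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=3)
print(df.columns.tolist())  # ['2021-09-07', '1', '8.5', '2', '7']
```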

Step 3: Keep the header and skip the first rows

What if you need to keep the header and then skip N rows? This can be achieved in several different ways.

The simplest one is to pass the indices of the rows to be skipped:

rows_to_skip = range(1,3)
df = pd.read_csv(csv_file, skiprows=rows_to_skip)
Date Company A Company A.1 Company B Company B.1
2021-09-07 1 8.5 2 7.0
2021-09-08 2 8.0 1 8.1

As you can see, read_csv keeps the header and skips the first 2 rows after it. Duplicate column names such as Company A are renamed to Company A.1 so that they stay unique.
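Another way is to pass a callable to skiprows. Pandas evaluates it against each row index and skips the rows for which it returns True. This minimal sketch again assumes the data is inlined as the hypothetical string CSV_TEXT. It keeps line 0 as the header, drops lines 1 and 2, and compares the result with the range(1,3) approach.

```python
import io

import pandas as pd

# Assumption: the sample CSV is inlined instead of read from disk.
CSV_TEXT = """Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
"""

# Callable form: keep line 0 (the header), skip lines 1 and 2.
df_callable = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=lambda i: i in (1, 2))

# Index form from Step 3, for comparison.
df_range = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=range(1, 3))

print(df_callable)
print(df_callable.equals(df_range))  # True
```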

Step 4: Skip non-consecutive rows with read_csv by index

The skiprows parameter is defined in the pandas documentation as:

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.

So, to skip rows 0 and 2, we can pass a list of values to skiprows:

df = pd.read_csv(csv_file, skiprows=[0,2]) 
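This sketch shows the resulting columns for Step 4. It assumes the data is inlined as the hypothetical string CSV_TEXT. Skipping lines 0 and 2 makes the second header line (,Rank,Points,Rank,Points) the header. Pandas names its empty first field "Unnamed: 0" and adds ".1" to the repeated names.

```python
import io

import pandas as pd

# Assumption: the sample CSV is inlined instead of read from disk.
CSV_TEXT = """Date,Company A,Company A,Company B,Company B
,Rank,Points,Rank,Points
2021-09-06,1,7.9,2,6
2021-09-07,1,8.5,2,7
2021-09-08,2,8,1,8.1
"""

# Skip the first header line (0) and the first data line (2).
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[0, 2])
print(df.columns.tolist())  # ['Unnamed: 0', 'Rank', 'Points', 'Rank.1', 'Points.1']
```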


Pandas: Skip rows while reading a CSV file into a DataFrame using read_csv() in Python

In this article, we will discuss how to skip rows from the top, from the bottom, or at specific indices while reading a CSV file and loading its contents into a DataFrame.

Python's pandas library provides a function that reads a CSV file directly into a DataFrame and can also skip specified lines of the file:

pandas.read_csv(filepath_or_buffer, skiprows=N, ...)

It accepts a large number of arguments, but here we will discuss only a few important ones:
Arguments:

  • filepath_or_buffer: path of a CSV file or a file-like object.
  • skiprows: line numbers to skip while reading the CSV.
    • If it's an int, skip that many lines from the top.
    • If it's a list of ints, skip the lines at those index positions.
    • If it's a callable, it is called with each line index, and the line is skipped if it returns True.

It reads the given CSV file, skips the specified lines, and loads the remaining lines into a DataFrame.
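The three forms of skiprows listed above can be compared in a short sketch. It assumes the users.csv data used in the following examples is inlined as the string USERS. The helper read() is hypothetical. All three calls skip lines 0 and 1, so they should produce identical DataFrames.

```python
import io

import pandas as pd

# Assumption: an inline copy of the users.csv data used in this article.
USERS = "\n".join([
    "Name,Age,City",
    "jack,34,Sydeny",
    "Riti,31,Delhi",
    "Aadi,16,New York",
    "Suse,32,Lucknow",
    "Mark,33,Las vegas",
    "Suri,35,Patna",
])


def read(skiprows):
    """Hypothetical helper: read USERS with the given skiprows value."""
    return pd.read_csv(io.StringIO(USERS), skiprows=skiprows)


by_int = read(2)                     # int: skip the first 2 lines
by_list = read([0, 1])               # list: skip lines 0 and 1
by_callable = read(lambda i: i < 2)  # callable: skip lines where it returns True

print(by_int)
print(by_int.equals(by_list) and by_int.equals(by_callable))  # True
```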


To use it, import the pandas module like this:

import pandas as pd

Let's look at some examples.

Suppose we have a simple CSV file users.csv with the following contents:

>>cat users.csv
Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
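To run the examples in this article yourself, you first need a users.csv file. This minimal sketch writes the data shown above to users.csv in the current working directory. That is an assumption, and any existing file with that name is overwritten. It then reads the file back without skipping anything.

```python
from pathlib import Path

import pandas as pd

# Assumption: create users.csv in the current working directory with the
# same contents as shown above (overwrites an existing users.csv).
Path("users.csv").write_text(
    "Name,Age,City\n"
    "jack,34,Sydeny\n"
    "Riti,31,Delhi\n"
    "Aadi,16,New York\n"
    "Suse,32,Lucknow\n"
    "Mark,33,Las vegas\n"
    "Suri,35,Patna\n"
)

# Read it back without skipping anything: 6 data rows and 3 columns.
usersDf = pd.read_csv("users.csv")
print(usersDf.shape)  # (6, 3)
```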

Let's load this CSV file into a DataFrame using read_csv() and skip rows in different ways.

Skipping N rows from the top while reading a CSV file into a DataFrame

If we pass an int as the skiprows argument to pandas.read_csv(), it skips that many rows from the top of the CSV file before initializing the DataFrame.
For example, to skip 2 lines from the top while reading users.csv into a DataFrame:

# Skip 2 rows from top in csv and initialize a dataframe
usersDf = pd.read_csv('users.csv', skiprows=2)
print('Contents of the Dataframe created by skipping top 2 lines from csv file ')
print(usersDf)

Contents of the Dataframe created by skipping top 2 lines from csv file
   Riti  31      Delhi
0  Aadi  16   New York
1  Suse  32    Lucknow
2  Mark  33  Las vegas
3  Suri  35      Patna

It skipped the top 2 lines of the CSV, used the 3rd line (at index 2) as the header row, and loaded the remaining rows as data rows of the DataFrame.
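If you want to skip the first lines but still have meaningful column names, one option is to turn off header inference and supply the names yourself. This is a sketch under two assumptions. The users.csv data is inlined as the string USERS, and the names list simply repeats the original header. It uses the read_csv header and names parameters.

```python
import io

import pandas as pd

# Assumption: an inline copy of the users.csv data.
USERS = "\n".join([
    "Name,Age,City",
    "jack,34,Sydeny",
    "Riti,31,Delhi",
    "Aadi,16,New York",
    "Suse,32,Lucknow",
    "Mark,33,Las vegas",
    "Suri,35,Patna",
])

# Skip lines 0 and 1 (the header and jack's row). With header=None no
# remaining line is used as the header; the column names come from `names`.
df = pd.read_csv(
    io.StringIO(USERS),
    skiprows=2,
    header=None,
    names=["Name", "Age", "City"],
)
print(df)
```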

Now, what if we want to skip only some specific rows while reading the CSV?

Skipping rows at specific index positions while reading a CSV file into a DataFrame

If we pass a list of ints as the skiprows argument to pandas.read_csv(), it skips the rows at the specified indices. For example, to skip the lines at index 0, 2 and 5 while reading users.csv into a DataFrame:

# Skip rows at specific index
usersDf = pd.read_csv('users.csv', skiprows=[0,2,5])
print('Contents of the Dataframe created by skipping specifying lines from csv file ')
print(usersDf)

Contents of the Dataframe created by skipping specifying lines from csv file
   jack  34    Sydeny
0  Aadi  16  New York
1  Suse  32   Lucknow
2  Suri  35     Patna

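It skipped the lines at index positions 0, 2 and 5 and loaded the remaining rows into the DataFrame. Because line 0 (the original header) was skipped, the first remaining line, jack,34,Sydeny, became the header row.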
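To skip specific data rows while keeping the real header, leave index 0 out of the list. This minimal sketch assumes the users.csv data is inlined as the string USERS. It skips only lines 2 (Riti) and 5 (Mark).

```python
import io

import pandas as pd

# Assumption: an inline copy of the users.csv data.
USERS = "\n".join([
    "Name,Age,City",
    "jack,34,Sydeny",
    "Riti,31,Delhi",
    "Aadi,16,New York",
    "Suse,32,Lucknow",
    "Mark,33,Las vegas",
    "Suri,35,Patna",
])

# Line 0 is not in the list, so it stays the header row.
df = pd.read_csv(io.StringIO(USERS), skiprows=[2, 5])
print(df)
```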

Skipping N rows from the top except the header while reading a CSV file into a DataFrame

As we saw in the first example, skipping 2 lines from the top of users.csv makes the 3rd line the header row. But that's not the row that contains the column names.
So, if our CSV file has a header row and we want to skip the first 2 data rows, we need to pass a list to skiprows:

# Skip 2 rows from top except header
usersDf = pd.read_csv('users.csv', skiprows=[i for i in range(1,3)])
print('Contents of the Dataframe created by skipping 2 rows after header row from csv file ')
print(usersDf)

Contents of the Dataframe created by skipping 2 rows after header row from csv file
   Name  Age       City
0  Aadi   16   New York
1  Suse   32    Lucknow
2  Mark   33  Las vegas
3  Suri   35      Patna

It reads the CSV file into a DataFrame, skipping the 2 lines after the header row.
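The list comprehension is not strictly required here. skiprows accepts list-like values, so a range object can be passed directly, as in the first article's Step 3. This sketch assumes the users.csv data is inlined as the string USERS.

```python
import io

import pandas as pd

# Assumption: an inline copy of the users.csv data.
USERS = "\n".join([
    "Name,Age,City",
    "jack,34,Sydeny",
    "Riti,31,Delhi",
    "Aadi,16,New York",
    "Suse,32,Lucknow",
    "Mark,33,Las vegas",
    "Suri,35,Patna",
])

# range(1, 3) covers lines 1 and 2: the first two data rows after the header.
df = pd.read_csv(io.StringIO(USERS), skiprows=range(1, 3))
print(df)
```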

Skipping rows based on a condition while reading a CSV file into a DataFrame

We can also pass a callable, such as a regular function or a lambda, to decide which rows to skip. When a callable is passed as skiprows to pandas.read_csv(), it is called with the index position of each row, and the row is skipped if it returns True.
Let's skip the rows whose index position is a multiple of 3, i.e. skip every 3rd line, while reading the CSV file into a DataFrame. Note that index 0 (the header row) is also a multiple of 3, so it is skipped too:

def logic(index):
    if index % 3 == 0:
        return True
    return False

# Skip rows based on a condition, e.g. skip every 3rd line
usersDf = pd.read_csv('users.csv', skiprows= lambda x: logic(x) )
print('Contents of the Dataframe created by skipping every 3rd row from csv file ')
print(usersDf)

Contents of the Dataframe created by skipping every 3rd row from csv file
   jack  34     Sydeny
0  Riti  31      Delhi
1  Suse  32    Lucknow
2  Mark  33  Las vegas
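To skip every 3rd line but keep the original header, the condition can exclude index 0. This minimal sketch assumes the users.csv data is inlined as the string USERS. Here the lambda skips lines 3 (Aadi) and 6 (Suri) only.

```python
import io

import pandas as pd

# Assumption: an inline copy of the users.csv data.
USERS = "\n".join([
    "Name,Age,City",
    "jack,34,Sydeny",
    "Riti,31,Delhi",
    "Aadi,16,New York",
    "Suse,32,Lucknow",
    "Mark,33,Las vegas",
    "Suri,35,Patna",
])

# Skip indices that are multiples of 3, except 0 (the header line).
df = pd.read_csv(io.StringIO(USERS), skiprows=lambda i: i > 0 and i % 3 == 0)
print(df)
```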

Skipping N rows from the bottom while reading a CSV file into a DataFrame

To skip N rows from the bottom, pass the skipfooter and engine arguments to pandas.read_csv():

# Skip 2 rows from bottom
usersDf = pd.read_csv('users.csv', skipfooter=2, engine='python')
print('Contents of the Dataframe created by skipping bottom 2 rows from csv file ')
print(usersDf)

Contents of the Dataframe created by skipping bottom 2 rows from csv file
   Name  Age      City
0  jack   34    Sydeny
1  Riti   31     Delhi
2  Aadi   16  New York
3  Suse   32   Lucknow

By default, read_csv() uses the C engine for parsing, but the C engine does not support skipping rows from the bottom. If we pass skipfooter without engine='python', pandas falls back to the Python engine and emits a warning like this:

    ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.
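The fallback behaviour can be checked with Python's warnings module. This is a sketch under a few assumptions. The users.csv data is inlined as the string USERS, and read_without_footer() is a hypothetical helper that records any ParserWarning. Both calls should return the same rows, but only the call without engine='python' should warn.

```python
import io
import warnings

import pandas as pd
from pandas.errors import ParserWarning

# Assumption: an inline copy of the users.csv data.
USERS = "\n".join([
    "Name,Age,City",
    "jack,34,Sydeny",
    "Riti,31,Delhi",
    "Aadi,16,New York",
    "Suse,32,Lucknow",
    "Mark,33,Las vegas",
    "Suri,35,Patna",
])


def read_without_footer(**kwargs):
    """Hypothetical helper: drop the last 2 rows and collect ParserWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        df = pd.read_csv(io.StringIO(USERS), skipfooter=2, **kwargs)
    parser_warnings = [w for w in caught if issubclass(w.category, ParserWarning)]
    return df, parser_warnings


df_python, warned_python = read_without_footer(engine="python")  # no fallback
df_default, warned_default = read_without_footer()  # C engine falls back

print(df_python)
print(len(warned_python), len(warned_default))
```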

The complete example is as follows:

import pandas as pd

def logic(index):
    if index % 3 == 0:
        return True
    return False

def main():
    print('**** Skip n rows from top while reading csv file to a Dataframe ****')
    # Skip 2 rows from top in csv and initialize a dataframe
    usersDf = pd.read_csv('users.csv', skiprows=2)
    print('Contents of the Dataframe created by skipping top 2 lines from csv file ')
    print(usersDf)

    print('**** Skip rows at specific index from top while reading csv file to a Dataframe ****')
    # Skip rows at specific index
    usersDf = pd.read_csv('users.csv', skiprows=[0,2,5])
    print('Contents of the Dataframe created by skipping specifying lines from csv file ')
    print(usersDf)

    print('**** Skip N rows top except header row while reading csv file to a Dataframe ****')
    # Skip 2 rows from top except header
    usersDf = pd.read_csv('users.csv', skiprows=[i for i in range(1,3)])
    print('Contents of the Dataframe created by skipping 2 rows after header row from csv file ')
    print(usersDf)

    print('**** Skip rows based on condition row while reading csv file to a Dataframe ****')
    # Skip rows based on a condition, e.g. skip every 3rd line
    usersDf = pd.read_csv('users.csv', skiprows= lambda x: logic(x) )
    print('Contents of the Dataframe created by skipping every 3rd row from csv file ')
    print(usersDf)

    print('**** Skip N rows from bottom while reading csv file to a Dataframe ****')
    # Skip 2 rows from bottom
    usersDf = pd.read_csv('users.csv', skipfooter=2, engine='python')
    print('Contents of the Dataframe created by skipping bottom 2 rows from csv file ')
    print(usersDf)

if __name__ == '__main__':
    main()

**** Skip n rows from top while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping top 2 lines from csv file
   Riti  31      Delhi
0  Aadi  16   New York
1  Suse  32    Lucknow
2  Mark  33  Las vegas
3  Suri  35      Patna
**** Skip rows at specific index from top while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping specifying lines from csv file
   jack  34    Sydeny
0  Aadi  16  New York
1  Suse  32   Lucknow
2  Suri  35     Patna
**** Skip N rows top except header row while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping 2 rows after header row from csv file
   Name  Age       City
0  Aadi   16   New York
1  Suse   32    Lucknow
2  Mark   33  Las vegas
3  Suri   35      Patna
**** Skip rows based on condition row while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping every 3rd row from csv file
   jack  34     Sydeny
0  Riti  31      Delhi
1  Suse  32    Lucknow
2  Mark  33  Las vegas
**** Skip N rows from bottom while reading csv file to a Dataframe ****
Contents of the Dataframe created by skipping bottom 2 rows from csv file
   Name  Age      City
0  jack   34    Sydeny
1  Riti   31     Delhi
2  Aadi   16  New York
3  Suse   32   Lucknow
