Python csv reader skip row

Pandas: skip rows while reading a CSV file into a DataFrame using read_csv() in Python

In this article we will discuss how to skip rows from the top, from the bottom, or at specific indices while reading a CSV file and loading its contents into a DataFrame.

Python’s pandas library provides a function that reads a CSV file and loads the data directly into a DataFrame, optionally skipping specified lines of the file, i.e.

pandas.read_csv(filepath_or_buffer, skiprows=N, ...)

It accepts a large number of arguments, but here we will discuss only a few important ones, i.e.
Arguments:

  • filepath_or_buffer : path to a CSV file, or a file-like object.
  • skiprows : line numbers to skip (0-indexed), or the number of lines to skip at the start of the file.
    • If it’s an int, skip that many lines from the top.
    • If it’s a list of ints, skip the lines at those index positions.
    • If it’s a callable, it is called with each line index, and the line is skipped when it returns True.

    It will read the given csv file by skipping the specified lines and load remaining lines to a dataframe.


    To use this, import the pandas module like this,

    import pandas as pd

    Let’s understand by examples,

    Suppose we have a simple CSV file users.csv and its contents are,

    >>cat users.csv
    Name,Age,City
    jack,34,Sydeny
    Riti,31,Delhi
    Aadi,16,New York
    Suse,32,Lucknow
    Mark,33,Las vegas
    Suri,35,Patna
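To follow along, the sample file can also be created from Python. This is a small sketch that assumes the file is written to the current working directory under the name users.csv; the variable names USERS_CSV and users_path are only illustrative.

```python
from pathlib import Path

# Sample rows copied from the listing above (spelling kept as in the article).
USERS_CSV = (
    "Name,Age,City\n"
    "jack,34,Sydeny\n"
    "Riti,31,Delhi\n"
    "Aadi,16,New York\n"
    "Suse,32,Lucknow\n"
    "Mark,33,Las vegas\n"
    "Suri,35,Patna\n"
)

users_path = Path("users.csv")
users_path.write_text(USERS_CSV, encoding="utf-8")

# Print each line next to its 0-based index, which is what skiprows refers to.
for index, line in enumerate(USERS_CSV.splitlines()):
    print(index, line)
```

The header line sits at index 0, so the first data row is at index 1.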

    Let’s load this csv file to a dataframe using read_csv() and skip rows in different ways,

    Skipping N rows from top while reading a csv file to Dataframe

    If we pass the skiprows argument to pandas.read_csv() as an int, it skips that many lines from the top of the CSV file before initializing the DataFrame.
    For example, to skip 2 lines from the top while reading users.csv and initializing a DataFrame,

    # Skip 2 rows from top in csv and initialize a dataframe
    usersDf = pd.read_csv('users.csv', skiprows=2)

    print('Contents of the Dataframe created by skipping top 2 lines from csv file ')
    print(usersDf)

    Contents of the Dataframe created by skipping top 2 lines from csv file
       Riti  31      Delhi
    0  Aadi  16   New York
    1  Suse  32    Lucknow
    2  Mark  33  Las vegas
    3  Suri  35      Patna

    It skipped the top 2 lines from csv and used 3rd line (at index 2) as header row and loaded the remaining rows from csv as data rows in the dataframe.
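If the first remaining line should stay in the data instead of becoming the header, read_csv() also accepts header=None. A minimal sketch, assuming the same sample rows are held in an in-memory string (CSV_TEXT is just an illustrative name) rather than in users.csv:

```python
import io

import pandas as pd

CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""

# Default behaviour: the first line that is not skipped becomes the header.
promoted = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=2)
print(list(promoted.columns))

# header=None keeps that line as data and numbers the columns 0, 1, 2.
as_data = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=2, header=None)
print(as_data)
```

With header=None the DataFrame has one more data row, because the "Riti" line is no longer consumed as column names.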

    Now what if we want to skip only some specific rows while reading the CSV?

    Skipping rows at specific index positions while reading a csv file to Dataframe

    If we pass the skiprows argument to pandas.read_csv() as a list of ints, it skips the lines of the CSV file at the indices given in the list. For example, to skip the lines at index 0, 2 and 5 while reading users.csv and initializing a DataFrame,

    # Skip rows at specific index
    usersDf = pd.read_csv('users.csv', skiprows=[0,2,5])

    print('Contents of the Dataframe created by skipping specifying lines from csv file ')
    print(usersDf)

    Contents of the Dataframe created by skipping specifying lines from csv file
       jack  34    Sydeny
    0  Aadi  16  New York
    1  Suse  32   Lucknow
    2  Suri  35     Patna

    It skipped the lines at index positions 0, 2 and 5 of the CSV file and loaded the remaining lines into the DataFrame, using the first remaining line as the header row.
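The index positions count every line of the file, starting at 0 with the header line. The sketch below, which assumes the sample rows are held in an in-memory string instead of users.csv, works out by hand which lines survive skiprows=[0, 2, 5] and compares that with what read_csv() returns. The names CSV_TEXT, skip and kept_lines are illustrative.

```python
import io

import pandas as pd

CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""

skip = [0, 2, 5]

# Work out by hand which lines remain after dropping the listed indices.
kept_lines = [
    line for i, line in enumerate(CSV_TEXT.splitlines()) if i not in skip
]
print(kept_lines)

users_df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=skip)

# The first surviving line is used as the header; the rest are data rows.
print(list(users_df.columns))
print(len(users_df))
```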

    Skipping N rows from top except header while reading a csv file to Dataframe

    As we saw in the first example, skipping 2 lines from the top while reading users.csv makes the 3rd line the header row. But that’s not the row that contains the column names.
    So, if our CSV file has a header row and we want to skip the first 2 data rows, we need to pass a list to skiprows, i.e.

    # Skip 2 rows from top except header
    usersDf = pd.read_csv('users.csv', skiprows=[i for i in range(1, 3)])

    print('Contents of the Dataframe created by skipping 2 rows after header row from csv file ')
    print(usersDf)

    Contents of the Dataframe created by skipping 2 rows after header row from csv file
       Name  Age       City
    0  Aadi   16   New York
    1  Suse   32    Lucknow
    2  Mark   33  Las vegas
    3  Suri   35      Patna

    It will read the csv file to dataframe by skipping 2 lines after the header row in csv file.
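The same pattern generalizes to any number of data rows: line 0 is the header, so the first n data rows are lines 1 to n. A hedged sketch follows; the helper name read_skipping_data_rows and the in-memory CSV_TEXT are assumptions made for illustration.

```python
import io

import pandas as pd

CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""


def read_skipping_data_rows(source, n):
    """Read a CSV, keeping the header line but skipping the first n data rows."""
    # Line 0 is the header, so the first n data rows are lines 1..n.
    return pd.read_csv(source, skiprows=list(range(1, n + 1)))


users_df = read_skipping_data_rows(io.StringIO(CSV_TEXT), 2)
print(users_df)
```

Passing a file path such as 'users.csv' as source works the same way, since the argument is handed straight to read_csv().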

    Skipping rows based on a condition while reading a csv file to Dataframe

    We can also pass a callable or lambda function to decide which rows to skip. When a callable is passed as skiprows, pandas.read_csv() calls it with the index position of each row before reading that row, and skips the row if the function returns True.
    Let’s skip the rows in the CSV file whose index position is a multiple of 3, i.e. skip every 3rd line while reading the CSV file and loading a DataFrame out of it. Note that index 0 is also a multiple of 3, so the header line is skipped too,

    def logic(index):
        if index % 3 == 0:
            return True
        return False

    # Skip rows based on a condition, like skipping every 3rd line
    usersDf = pd.read_csv('users.csv', skiprows=lambda x: logic(x))

    print('Contents of the Dataframe created by skipping every 3rd row from csv file ')
    print(usersDf)

    Contents of the Dataframe created by skipping every 3rd row from csv file
       jack  34     Sydeny
    0  Riti  31      Delhi
    1  Suse  32    Lucknow
    2  Mark  33  Las vegas
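Because the callable also receives index 0, the header line is dropped in the example above. A variation that keeps the header and skips every 3rd line after it could look like this; the function name skip_every_third and the in-memory CSV_TEXT are illustrative assumptions.

```python
import io

import pandas as pd

CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""


def skip_every_third(index):
    """Skip lines 3, 6, 9, ... but never line 0, which holds the column names."""
    return index > 0 and index % 3 == 0


users_df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=skip_every_third)
print(users_df)
```

Here lines 3 and 6 (Aadi and Suri) are skipped, while Name, Age and City remain the column names.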

    To skip N rows from the bottom while reading a CSV file into a DataFrame, pass the skipfooter and engine arguments to pandas.read_csv(), i.e.

    # Skip 2 rows from bottom
    usersDf = pd.read_csv('users.csv', skipfooter=2, engine='python')

    print('Contents of the Dataframe created by skipping bottom 2 rows from csv file ')
    print(usersDf)

    Contents of the Dataframe created by skipping bottom 2 rows from csv file
       Name  Age      City
    0  jack   34    Sydeny
    1  Riti   31     Delhi
    2  Aadi   16  New York
    3  Suse   32   Lucknow

    By default read_csv() uses the C engine for parsing, but that engine doesn’t support skipping lines from the bottom. To use this functionality we must pass engine='python' along with skipfooter, otherwise we will get a warning like this,

    ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.
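A small sketch of the same call with the sample rows held in memory (an assumption made so the snippet is self-contained). It records warnings while reading, to check that passing engine='python' explicitly does not trigger the fallback warning.

```python
import io
import warnings

import pandas as pd

CSV_TEXT = """Name,Age,City
jack,34,Sydeny
Riti,31,Delhi
Aadi,16,New York
Suse,32,Lucknow
Mark,33,Las vegas
Suri,35,Patna
"""

# Record every warning raised while reading, instead of printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    users_df = pd.read_csv(io.StringIO(CSV_TEXT), skipfooter=2, engine="python")

# Keep only the parser warnings, which is the category shown above.
parser_warnings = [
    w for w in caught if issubclass(w.category, pd.errors.ParserWarning)
]
print(len(parser_warnings))
print(users_df)
```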

    Complete example is as follows,

    import pandas as pd


    def logic(index):
        if index % 3 == 0:
            return True
        return False


    def main():
        print('**** Skip n rows from top while reading csv file to a Dataframe ****')

        # Skip 2 rows from top in csv and initialize a dataframe
        usersDf = pd.read_csv('users.csv', skiprows=2)

        print('Contents of the Dataframe created by skipping top 2 lines from csv file ')
        print(usersDf)

        print('**** Skip rows at specific index from top while reading csv file to a Dataframe ****')

        # Skip rows at specific index
        usersDf = pd.read_csv('users.csv', skiprows=[0,2,5])

        print('Contents of the Dataframe created by skipping specifying lines from csv file ')
        print(usersDf)

        print('**** Skip N rows top except header row while reading csv file to a Dataframe ****')

        # Skip 2 rows from top except header
        usersDf = pd.read_csv('users.csv', skiprows=[i for i in range(1, 3)])

        print('Contents of the Dataframe created by skipping 2 rows after header row from csv file ')
        print(usersDf)

        print('**** Skip rows based on condition row while reading csv file to a Dataframe ****')

        # Skip rows based on a condition, like skipping every 3rd line
        usersDf = pd.read_csv('users.csv', skiprows=lambda x: logic(x))

        print('Contents of the Dataframe created by skipping every 3rd row from csv file ')
        print(usersDf)

        print('**** Skip N rows from bottom while reading csv file to a Dataframe ****')

        # Skip 2 rows from bottom
        usersDf = pd.read_csv('users.csv', skipfooter=2, engine='python')

        print('Contents of the Dataframe created by skipping bottom 2 rows from csv file ')
        print(usersDf)


    if __name__ == '__main__':
        main()

    **** Skip n rows from top while reading csv file to a Dataframe ****
    Contents of the Dataframe created by skipping top 2 lines from csv file
       Riti  31      Delhi
    0  Aadi  16   New York
    1  Suse  32    Lucknow
    2  Mark  33  Las vegas
    3  Suri  35      Patna
    **** Skip rows at specific index from top while reading csv file to a Dataframe ****
    Contents of the Dataframe created by skipping specifying lines from csv file
       jack  34    Sydeny
    0  Aadi  16  New York
    1  Suse  32   Lucknow
    2  Suri  35     Patna
    **** Skip N rows top except header row while reading csv file to a Dataframe ****
    Contents of the Dataframe created by skipping 2 rows after header row from csv file
       Name  Age       City
    0  Aadi   16   New York
    1  Suse   32    Lucknow
    2  Mark   33  Las vegas
    3  Suri   35      Patna
    **** Skip rows based on condition row while reading csv file to a Dataframe ****
    Contents of the Dataframe created by skipping every 3rd row from csv file
       jack  34     Sydeny
    0  Riti  31      Delhi
    1  Suse  32    Lucknow
    2  Mark  33  Las vegas
    **** Skip N rows from bottom while reading csv file to a Dataframe ****
    Contents of the Dataframe created by skipping bottom 2 rows from csv file
       Name  Age      City
    0  jack   34    Sydeny
    1  Riti   31     Delhi
    2  Aadi   16  New York
    3  Suse   32   Lucknow


    Skip the header of a file with Python’s CSV reader

    Let’s say you have a CSV like this, which you’re trying to parse with Python:

    Date,Description,Amount
    2015-01-03,Cakes,22.55
    2014-12-28,Rent,1000
    2014-12-27,Candy Shop,12
    ...

    You don’t want to parse the first row as data, so you can skip it with next. For example:

    import csv

    with open("mycsv.csv", "r", newline="") as csvfile:
        csvreader = csv.reader(csvfile)

        # This skips the first row of the CSV file.
        next(csvreader)

        for row in csvreader:
            # do stuff with rows...
            print(row)

    The call to next reads the first row and discards it. From there, you’re ready to iterate through the actual data.
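One caveat: next() raises StopIteration if the file is completely empty. A hedged variation passes a default value to next(), so that an empty input simply yields no rows. The function name rows_without_header and the in-memory sample are illustrative, not part of the original example.

```python
import csv
import io


def rows_without_header(csvfile):
    """Return all rows of an open CSV file except the first one."""
    reader = csv.reader(csvfile)
    # The second argument is returned instead of raising StopIteration
    # when there is no first row to discard.
    next(reader, None)
    return list(reader)


sample = io.StringIO(
    "Date,Description,Amount\n"
    "2015-01-03,Cakes,22.55\n"
    "2014-12-28,Rent,1000\n"
)
print(rows_without_header(sample))
print(rows_without_header(io.StringIO("")))
```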

    You may instead wish to use a DictReader, which parses the first row as field names by default. For example:

    import csv

    with open("mycsv.csv", "r", newline="") as csvfile:
        csvreader = csv.DictReader(csvfile)
        for row in csvreader:
            print(row["Date"], row["Description"], row["Amount"])
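To see what DictReader does with the first row, the sketch below reads the three sample rows shown earlier from an in-memory string (SAMPLE is an illustrative name, and the elided rows of the original file are left out). It prints the field names taken from the header, then sums the Amount column.

```python
import csv
import io

SAMPLE = (
    "Date,Description,Amount\n"
    "2015-01-03,Cakes,22.55\n"
    "2014-12-28,Rent,1000\n"
    "2014-12-27,Candy Shop,12\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE))

# fieldnames is filled from the first row, so that row never appears as data.
print(reader.fieldnames)

rows = list(reader)

# Values arrive as strings, so convert them before doing arithmetic.
total = sum(float(row["Amount"]) for row in rows)
print(len(rows), total)
```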

    Either way, you’ve now skipped the first row of a CSV file in Python!


    Content is licensed under the Creative Commons Attribution-NonCommercial License and code under the Unlicense. The logo was created by Lulu Tang.
