Python pandas select all but rows

pandas: Select rows/columns in DataFrame by indexing "[]"

You can select rows, columns, and elements of a pandas.DataFrame or pandas.Series with the indexing operator (square brackets) [].

This article covers the following topics.

  • Select columns of pandas.DataFrame
    • [Column name] : Get a single column as pandas.Series
    • [List of column names] : Get single or multiple columns as pandas.DataFrame
  • Select rows of pandas.DataFrame
    • [Slice of row name/number] : Get single or multiple rows as pandas.DataFrame
    • [Boolean array/Series] : Get True rows as pandas.DataFrame
  • Select elements of pandas.Series
    • [Label/position] : Get the value of a single element
    • [List of labels/positions] : Get single or multiple elements as pandas.Series
    • [Slice of label/position] : Get single or multiple elements as pandas.Series
    • [Boolean array/Series] : Get True elements as pandas.Series
  • Select elements of pandas.DataFrame
  • Notes when row and column names are integers

    You can also select columns by slice, and rows by a single name/number or a list of them, with loc and iloc.

    The following CSV file is used in this sample code.

    import pandas as pd

    print(pd.__version__)
    # 1.4.1

    df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)
    print(df)
    #          age state  point
    # name
    # Alice     24    NY     64
    # Bob       42    CA     92
    # Charlie   18    CA     70
    # Dave      68    TX     70
    # Ellen     24    CA     88
    # Frank     30    NY     57
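The CSV file itself is not reproduced here. As a stand-in, the following sketch builds an equivalent DataFrame from the values printed above, so the examples can be run without the file. Building it from a dict with pd.Index is an assumption made for illustration, not how the original data was produced.

```python
import pandas as pd

# Rebuild the sample data inline (values copied from the printed output above).
df = pd.DataFrame(
    {
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
        'point': [64, 92, 70, 70, 88, 57],
    },
    index=pd.Index(
        ['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], name='name'
    ),
)

print(df.shape)
# (6, 3)
```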

    Select columns of pandas.DataFrame

    [Column name] : Get a single column as pandas.Series

    You can get a column as pandas.Series by specifying the column name (label) in [].

    print(df['age'])
    print(type(df['age']))
    # name
    # Alice      24
    # Bob        42
    # Charlie    18
    # Dave       68
    # Ellen      24
    # Frank      30
    # Name: age, dtype: int64
    # <class 'pandas.core.series.Series'>

    You may also specify a column name as an attribute, like df.age. Note that if the column name conflicts with an existing method name, the method takes precedence.

    print(df.age)
    print(type(df.age))
    # name
    # Alice      24
    # Bob        42
    # Charlie    18
    # Dave       68
    # Ellen      24
    # Frank      30
    # Name: age, dtype: int64
    # <class 'pandas.core.series.Series'>
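The name conflict mentioned above can be seen with a column called count, which collides with the DataFrame.count() method. This is a minimal sketch; the frame df_conflict and its values are hypothetical and exist only to show the effect.

```python
import pandas as pd

# Hypothetical frame whose column name 'count' collides with DataFrame.count().
df_conflict = pd.DataFrame({'count': [1, 2, 3], 'age': [24, 42, 18]})

attr = df_conflict.count      # the bound method, not the column
col = df_conflict['count']    # the column, as a pandas.Series

print(callable(attr))
# True
print(col.tolist())
# [1, 2, 3]
```

Using [] with the column name is therefore the safer choice when column names are not under your control.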

    [List of column names] : Get single or multiple columns as pandas.DataFrame

    You can get multiple columns as pandas.DataFrame by specifying a list of column names in []. The columns appear in the order given in the list.

    print(df[['point', 'age']])
    print(type(df[['point', 'age']]))
    #          point  age
    # name
    # Alice       64   24
    # Bob         92   42
    # Charlie     70   18
    # Dave        70   68
    # Ellen       88   24
    # Frank       57   30
    # <class 'pandas.core.frame.DataFrame'>

    If you specify a list with one element, a single-column pandas.DataFrame is returned, not a pandas.Series.

    print(df[['age']])
    print(type(df[['age']]))
    #          age
    # name
    # Alice     24
    # Bob       42
    # Charlie   18
    # Dave      68
    # Ellen     24
    # Frank     30
    # <class 'pandas.core.frame.DataFrame'>

    You may also specify a slice of column names with loc, or column numbers with iloc.

    print(df.loc[:, 'age':'state'])
    print(type(df.loc[:, 'age':'state']))
    #          age state
    # name
    # Alice     24    NY
    # Bob       42    CA
    # Charlie   18    CA
    # Dave      68    TX
    # Ellen     24    CA
    # Frank     30    NY
    # <class 'pandas.core.frame.DataFrame'>

    print(df.iloc[:, [2, 0]])
    print(type(df.iloc[:, [2, 0]]))
    #          point  age
    # name
    # Alice       64   24
    # Bob         92   42
    # Charlie     70   18
    # Dave        70   68
    # Ellen       88   24
    # Frank       57   30
    # <class 'pandas.core.frame.DataFrame'>

    Select rows of pandas.DataFrame

    [Slice of row name/number] : Get single or multiple rows as pandas.DataFrame

    You can get multiple rows as a pandas.DataFrame by specifying a slice in [].

    print(df[1:4])
    print(type(df[1:4]))
    #          age state  point
    # name
    # Bob       42    CA     92
    # Charlie   18    CA     70
    # Dave      68    TX     70
    # <class 'pandas.core.frame.DataFrame'>

    You may specify negative values and a step (start:stop:step) as in a normal slice. For example, you can use slices to extract the odd- or even-numbered rows.

    print(df[:-3])
    print(type(df[:-3]))
    #          age state  point
    # name
    # Alice     24    NY     64
    # Bob       42    CA     92
    # Charlie   18    CA     70
    # <class 'pandas.core.frame.DataFrame'>

    print(df[::2])
    print(type(df[::2]))
    #          age state  point
    # name
    # Alice     24    NY     64
    # Charlie   18    CA     70
    # Ellen     24    CA     88
    # <class 'pandas.core.frame.DataFrame'>

    print(df[1::2])
    print(type(df[1::2]))
    #        age state  point
    # name
    # Bob     42    CA     92
    # Dave    68    TX     70
    # Frank   30    NY     57
    # <class 'pandas.core.frame.DataFrame'>

    An error is raised if a row number is specified alone instead of a slice, because a single value in [] is interpreted as a column name.
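A short sketch of that failure and of the slice that works instead. The frame df_rows is a hypothetical three-row stand-in for the sample data.

```python
import pandas as pd

df_rows = pd.DataFrame(
    {'age': [24, 42, 18], 'point': [64, 92, 70]},
    index=['Alice', 'Bob', 'Charlie'],
)

try:
    df_rows[1]              # looked up as a column label, which does not exist
    raised = False
except KeyError:
    raised = True

one_row = df_rows[1:2]      # a slice is treated as row positions

print(raised)
# True
print(one_row.index.tolist())
# ['Bob']
```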

    If only one row is selected by a slice, a pandas.DataFrame is returned, not a pandas.Series.

    print(df[1:2])
    print(type(df[1:2]))
    #       age state  point
    # name
    # Bob    42    CA     92
    # <class 'pandas.core.frame.DataFrame'>

    You may also specify a slice of row names (labels) instead of row numbers (positions). With a slice of row names, the stop row is included.

    print(df['Bob':'Ellen'])
    print(type(df['Bob':'Ellen']))
    #          age state  point
    # name
    # Bob       42    CA     92
    # Charlie   18    CA     70
    # Dave      68    TX     70
    # Ellen     24    CA     88
    # <class 'pandas.core.frame.DataFrame'>

    You can specify a row name/number alone, or a list of them, with loc or iloc.

    print(df.loc['Bob'])
    print(type(df.loc['Bob']))
    # age      42
    # state    CA
    # point    92
    # Name: Bob, dtype: object
    # <class 'pandas.core.series.Series'>

    print(df.loc[['Bob', 'Ellen']])
    print(type(df.loc[['Bob', 'Ellen']]))
    #        age state  point
    # name
    # Bob     42    CA     92
    # Ellen   24    CA     88
    # <class 'pandas.core.frame.DataFrame'>

    print(df.iloc[[1, 4]])
    print(type(df.iloc[[1, 4]]))
    #        age state  point
    # name
    # Bob     42    CA     92
    # Ellen   24    CA     88
    # <class 'pandas.core.frame.DataFrame'>

    [Boolean array/Series] : Get True rows as pandas.DataFrame

    By specifying a boolean array (list or numpy.ndarray) in [], you can extract the True rows as pandas.DataFrame.

    l_bool = [True, False, False, True, True, False]

    print(df[l_bool])
    #        age state  point
    # name
    # Alice   24    NY     64
    # Dave    68    TX     70
    # Ellen   24    CA     88

    An error is raised if the number of elements does not match the number of rows.

    # print(df[[True, False, False]])
    # ValueError: Item wrong length 3 instead of 6.

    You can also specify a boolean pandas.Series. Rows are extracted based on labels, not order.

    s_bool = pd.Series(l_bool, index=reversed(df.index))
    print(s_bool)
    # Frank       True
    # Ellen      False
    # Dave       False
    # Charlie     True
    # Bob         True
    # Alice      False
    # dtype: bool

    print(df[s_bool])
    #          age state  point
    # name
    # Bob       42    CA     92
    # Charlie   18    CA     70
    # Frank     30    NY     57

    An error is raised if the number of elements or labels does not match.

    s_bool_wrong = pd.Series(l_bool, index=['A', 'B', 'C', 'D', 'E', 'F'])

    # print(df[s_bool_wrong])
    # IndexingError: Unalignable boolean Series provided as indexer
    # (index of the boolean Series and of the indexed object do not match).
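In practice the boolean Series is usually produced by a comparison on the frame itself, which guarantees that the labels line up. The following is a minimal sketch with the sample data rebuilt inline. The name df_mask and the condition are chosen only for illustration. The ~ operator inverts the mask, which is one way to select all rows except the matching ones.

```python
import pandas as pd

df_mask = pd.DataFrame(
    {
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
    },
    index=['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'],
)

is_ca = df_mask['state'] == 'CA'    # boolean Series sharing df_mask's index
ca_rows = df_mask[is_ca]            # rows where the mask is True
other_rows = df_mask[~is_ca]        # all rows except those

print(ca_rows.index.tolist())
# ['Bob', 'Charlie', 'Ellen']
print(other_rows.index.tolist())
# ['Alice', 'Dave', 'Frank']
```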

    Select elements of pandas.Series

    Use the following pandas.Series as an example.

    s = df['age']
    print(s)
    # name
    # Alice      24
    # Bob        42
    # Charlie    18
    # Dave       68
    # Ellen      24
    # Frank      30
    # Name: age, dtype: int64

    [Label/position] : Get the value of a single element

    You can get the value of an element by specifying its label or position (index) alone. When specifying by position, a negative value counts from the end; -1 refers to the last element.

    You may also specify the label name as an attribute, like s.Dave. Note that if the label name conflicts with an existing method name, the method takes precedence.

    print(s[3])
    print(type(s[3]))
    # 68
    # <class 'numpy.int64'>

    print(s['Dave'])
    print(type(s['Dave']))
    # 68
    # <class 'numpy.int64'>

    print(s[-1])
    print(type(s[-1]))
    # 30
    # <class 'numpy.int64'>

    print(s.Dave)
    print(type(s.Dave))
    # 68
    # <class 'numpy.int64'>
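The output above is from pandas 1.4.1. To my knowledge, later pandas releases deprecate treating an integer in [] as a position on a Series with a non-integer index, so the explicit accessors may be the more durable spelling. This is a sketch under that assumption; s_age is a hypothetical inline copy of the sample column.

```python
import pandas as pd

s_age = pd.Series(
    [24, 42, 18, 68, 24, 30],
    index=['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'],
    name='age',
)

by_position = s_age.iloc[3]     # fourth element
by_label = s_age.loc['Dave']    # element labelled 'Dave'
last = s_age.iloc[-1]           # negative positions count from the end

print(by_position, by_label, last)
# 68 68 30
```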

    [List of labels/positions] : Get single or multiple elements as pandas.Series

    You can select multiple values as pandas.Series by specifying a list of labels/positions. The elements appear in the order given in the list.

    print(s[[1, 3]])
    print(type(s[[1, 3]]))
    # name
    # Bob     42
    # Dave    68
    # Name: age, dtype: int64
    # <class 'pandas.core.series.Series'>

    print(s[['Bob', 'Dave']])
    print(type(s[['Bob', 'Dave']]))
    # name
    # Bob     42
    # Dave    68
    # Name: age, dtype: int64
    # <class 'pandas.core.series.Series'>

    If a list with one element is specified, a single-element pandas.Series is returned, not a scalar value.

    print(s[[1]])
    print(type(s[[1]]))
    # name
    # Bob    42
    # Name: age, dtype: int64
    # <class 'pandas.core.series.Series'>

    print(s[['Bob']])
    print(type(s[['Bob']]))
    # name
    # Bob    42
    # Name: age, dtype: int64
    # <class 'pandas.core.series.Series'>

    [Slice of label/position] : Get single or multiple elements as pandas.Series

    You can also select multiple values as pandas.Series by specifying a slice of labels/positions. With a slice of labels, the stop element is included.

    print(s[1:3])
    print(type(s[1:3]))
    # name
    # Bob        42
    # Charlie    18
    # Name: age, dtype: int64
    # <class 'pandas.core.series.Series'>

    print(s['Bob':'Dave'])
    print(type(s['Bob':'Dave']))
    # name
    # Bob        42
    # Charlie    18
    # Dave       68
    # Name: age, dtype: int64
    # <class 'pandas.core.series.Series'>

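    If only one element is selected by a slice, a single-element pandas.Series is returned, not a scalar value.

    print(s[1:2])
    print(type(s[1:2]))
    # name
    # Bob    42
    # Name: age, dtype: int64
    # <class 'pandas.core.series.Series'>

    print(s['Bob':'Bob'])
    print(type(s['Bob':'Bob']))
    # name
    # Bob    42
    # Name: age, dtype: int64
    # <class 'pandas.core.series.Series'>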

    [Boolean array/Series] : Get True elements as pandas.Series

    By specifying a boolean array (list or numpy.ndarray) in [], you can extract the True elements as pandas.Series.

    l_bool = [True, False, False, True, True, False]

    print(s[l_bool])
    # name
    # Alice    24
    # Dave     68
    # Ellen    24
    # Name: age, dtype: int64

    An error is raised if the number of elements does not match.

    # print(s[[True, False, False]])
    # IndexError: Boolean index has wrong length: 3 instead of 6

    You can also specify a boolean pandas.Series. Elements are extracted based on labels, not order.

    s_bool = pd.Series(l_bool, index=reversed(df.index))
    print(s_bool)
    # Frank       True
    # Ellen      False
    # Dave       False
    # Charlie     True
    # Bob         True
    # Alice      False
    # dtype: bool

    print(s[s_bool])
    # name
    # Bob        42
    # Charlie    18
    # Frank      30
    # Name: age, dtype: int64

    An error is raised if the number of elements or labels does not match.

    s_bool_wrong = pd.Series(l_bool, index=['A', 'B', 'C', 'D', 'E', 'F'])

    # print(s[s_bool_wrong])
    # IndexingError: Unalignable boolean Series provided as indexer
    # (index of the boolean Series and of the indexed object do not match).

    Select elements of pandas.DataFrame

    You can get the value of an element from pandas.DataFrame by extracting pandas.Series from pandas.DataFrame and then getting the value from that pandas.Series .

    You may also extract any group by slices or lists.

    print(df['Bob':'Dave'][['age', 'point']])
    #          age  point
    # name
    # Bob       42     92
    # Charlie   18     70
    # Dave      68     70

    However, this form ( [...][...] ) is called chained indexing and may result in a SettingWithCopyWarning when assigning values.

    You can select rows and columns at once with at, iat, loc, or iloc.

    print(df.at['Alice', 'age'])
    # 24

    print(df.loc['Bob':'Dave', ['age', 'point']])
    #          age  point
    # name
    # Bob       42     92
    # Charlie   18     70
    # Dave      68     70
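For the assignment case mentioned above, a single loc call that names both the row and the column avoids the chained form. This is a minimal sketch; the frame df_set and the assigned values are illustrative, not taken from the sample data.

```python
import pandas as pd

df_set = pd.DataFrame(
    {'age': [24, 42, 18], 'point': [64, 92, 70]},
    index=['Alice', 'Bob', 'Charlie'],
)

# One indexing operation: row label and column label together.
df_set.loc['Bob', 'age'] = 43

# Several cells at once: a label slice for rows and a list of columns.
df_set.loc['Alice':'Bob', ['point']] = 0

print(df_set.loc['Bob', 'age'])
# 43
print(df_set['point'].tolist())
# [0, 0, 70]
```

Because the assignment goes through one indexer on the original object, there is no intermediate selection for pandas to warn about.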

    Notes when row and column names are integers

    Be careful when row and column names are integers.

    Use the following pandas.DataFrame as an example.

    df = pd.DataFrame([[0, 10, 20], [30, 40, 50], [60, 70, 80]],
                      index=[2, 0, 1], columns=[1, 2, 0])
    print(df)
    #     1   2   0
    # 2   0  10  20
    # 0  30  40  50
    # 1  60  70  80

    With [scalar value] or [list], the specified value is treated as a column name.

    print(df[0])
    # 2    20
    # 0    50
    # 1    80
    # Name: 0, dtype: int64

    print(df[[0, 2]])
    #     0   2
    # 2  20  10
    # 0  50  40
    # 1  80  70

    With [slice], the specified values are treated as row numbers, not row names. Negative values are also allowed.

    print(df[:2])
    #     1   2   0
    # 2   0  10  20
    # 0  30  40  50

    print(df[-2:])
    #     1   2   0
    # 0  30  40  50
    # 1  60  70  80

    Use loc or iloc to state explicitly whether a value is a name (label) or a number (position).

    print(df.loc[:2])
    #    1   2   0
    # 2  0  10  20

    print(df.iloc[:2])
    #     1   2   0
    # 2   0  10  20
    # 0  30  40  50

    The same caution applies to pandas.Series. Use the following pandas.Series as an example.

    s = df[2]
    print(s)
    # 2    10
    # 0    40
    # 1    70
    # Name: 2, dtype: int64

    In a pandas.Series with integer labels, a value specified in [] is treated as a label, not a position.

    Use at or iat to state explicitly whether it is a label or a position.

    print(s.at[0])
    # 40

    print(s.iat[0])
    # 10

    Note that [-1] is treated as a label named -1, not as the last element. Use iat to get the last element.

    # print(s[-1])
    # KeyError: -1

    print(s.iat[-1])
    # 70

    Thus, it is better to use at, iat, loc, or iloc when the row or column names are integers.
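The same distinction holds for a single element of a DataFrame. The sketch below rebuilds the integer-labelled frame from this section under the hypothetical name df_int and compares at (labels) with iat (positions).

```python
import pandas as pd

df_int = pd.DataFrame(
    [[0, 10, 20], [30, 40, 50], [60, 70, 80]],
    index=[2, 0, 1],
    columns=[1, 2, 0],
)

by_label = df_int.at[0, 0]       # row labelled 0, column labelled 0
by_position = df_int.iat[0, 0]   # first row, first column

print(by_label)
# 50
print(by_position)
# 0
```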

