Python dropna by column

Contents

pandas.DataFrame.dropna
pandas.Series.dropna
Pandas dropna(): Drop Missing Records and Columns in DataFrames
Understanding the Pandas dropna() Method

pandas.DataFrame.dropna

See the User Guide for more on which values are considered missing, and how to work with missing data.

Parameters

axis {0 or ‘index’, 1 or ‘columns’}, default 0

Determine if rows or columns which contain missing values are removed.

0, or ‘index’ : Drop rows which contain missing values.
1, or ‘columns’ : Drop columns which contain missing values.

Only a single axis is allowed.

how {‘any’, ‘all’}, default ‘any’

Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

‘any’ : If any NA values are present, drop that row or column.
‘all’ : If all values are NA, drop that row or column.

thresh int, optional

Require that many non-NA values. Cannot be combined with how.

subset column label or sequence of labels, optional

Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.

inplace bool, default False

Whether to modify the DataFrame rather than creating a new one.

ignore_index bool, default False

If True, the resulting axis will be labeled 0, 1, …, n - 1.

Returns

DataFrame or None: DataFrame with NA entries dropped from it, or None if inplace=True.

See also

DataFrame.notna : Indicate existing (non-missing) values.

Examples

>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Drop the rows where at least one element is missing.

>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25

Drop the columns where at least one element is missing.

>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman
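Dropping by column can also be combined with thresh and subset. The following is a minimal sketch that rebuilds the sample frame from the examples above; the names cols_thresh and cols_subset are illustrative only. Note that with axis='columns' the subset labels refer to rows, since subset always names labels on the other axis.

```python
import numpy as np
import pandas as pd

# Same illustrative frame as in the examples above.
df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
})

# Keep only the columns that have at least 2 non-NA values.
# "born" has a single non-NA value, so it is dropped.
cols_thresh = df.dropna(axis="columns", thresh=2)

# With axis="columns", subset takes row labels: only rows 1 and 2
# are inspected, so "toy" survives (its NaN sits in row 0).
cols_subset = df.dropna(axis="columns", subset=[1, 2])

print(list(cols_thresh.columns))  # ['name', 'toy']
print(list(cols_subset.columns))  # ['name', 'toy']
```

In both cases all three rows are kept; only the set of columns changes.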

Drop the rows where all elements are missing.

>>> df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Keep only the rows with at least 2 non-NA values.

>>> df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Define in which columns to look for missing values.

>>> df.dropna(subset=['name', 'toy'])
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
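The inplace and ignore_index parameters are described above but not shown in the examples. The sketch below illustrates them on a reduced version of the sample frame. It assumes a pandas version that supports ignore_index (2.0 or later, to the best of my knowledge), and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
})

# Default: original labels are kept, so the result starts at index 1.
kept = df.dropna()

# ignore_index=True relabels the result 0, 1, ..., n - 1.
relabeled = df.dropna(ignore_index=True)

# inplace=True modifies the frame itself and returns None.
work = df.copy()
returned = work.dropna(inplace=True)

print(list(kept.index))       # [1, 2]
print(list(relabeled.index))  # [0, 1]
print(returned)               # None
print(len(df), len(work))     # 3 2
```

On older pandas versions, calling reset_index(drop=True) on the result of dropna() gives the same relabeling.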


pandas.Series.dropna

See the User Guide for more on which values are considered missing, and how to work with missing data.

Parameters

axis {0 or ‘index’}

Unused. Parameter needed for compatibility with DataFrame.

inplace bool, default False

If True, perform the operation in place and return None.

how str, optional

Not in use. Kept for compatibility.

ignore_index bool, default False

If True, the resulting axis will be labeled 0, 1, …, n - 1.

Returns

Series or None: Series with NA entries dropped from it, or None if inplace=True.

See also

Series.notna : Indicate existing (non-missing) values.
DataFrame.dropna : Drop rows or columns which contain NA values.

Examples

>>> ser = pd.Series([1., 2., np.nan])
>>> ser
0    1.0
1    2.0
2    NaN
dtype: float64

Drop NA values from a Series.

>>> ser.dropna()
0    1.0
1    2.0
dtype: float64

Empty strings are not considered NA values. None is considered an NA value.

>>> ser = pd.Series([np.nan, 2, pd.NaT, '', None, 'I stay'])
>>> ser
0       NaN
1         2
2       NaT
3
4      None
5    I stay
dtype: object
>>> ser.dropna()
1         2
3
5    I stay
dtype: object
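Because empty strings survive dropna(), blanks have to be converted to a missing value first if they should be treated as missing. One possible approach, shown here as a sketch rather than the only way to do it, is to replace '' with NaN before dropping; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ser = pd.Series([np.nan, 2, pd.NaT, "", None, "I stay"])

# dropna() on its own leaves the empty string in place ...
default_drop = ser.dropna()

# ... so map "" to NaN first if blanks should count as missing.
blank_as_na = ser.replace("", np.nan).dropna()

print(list(default_drop.index))  # [1, 3, 5]
print(list(blank_as_na.index))   # [1, 5]
```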


Pandas dropna(): Drop Missing Records and Columns in DataFrames

In this tutorial, you’ll learn how to use the Pandas dropna() method to drop missing values in a Pandas DataFrame. Working with missing data is one of the essential skills in cleaning your data before analyzing it. Data cleaning is often said to take up as much as 80% of a data analyst’s or data scientist’s time, so being able to do this work effectively and efficiently is an important skill.

By the end of this tutorial, you’ll have learned:

How to use the Pandas .dropna() method effectively
How to drop rows with missing (NaN) values in Pandas
How to drop columns with missing (NaN) values in Pandas
How to use the Pandas .dropna() method only on specific columns
How to set thresholds when dropping missing values in a Pandas DataFrame
How to fix common errors when working with the Pandas .dropna() method

Understanding the Pandas dropna() Method

The Pandas .dropna() method is an essential method for a data analyst or data scientist of any level. Because cleaning data is an essential preprocessing step, knowing how to work with missing data will make you a stronger programmer.

Before diving into how to use the method, let’s take a minute to understand how the Pandas .dropna() method works. We can do this by taking a look at the parameters and default arguments that the method provides:

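# Understanding the Pandas .dropna() Method
import pandas as pd

df = pd.DataFrame()
df.dropna(
    axis=0,         # 0/'index' drops rows, 1/'columns' drops columns
    how='any',      # 'any' or 'all'
    # thresh=None,  # minimum count of non-NA values; pass instead of how, not alongside it
    subset=None,    # labels on the other axis to check
    inplace=False,  # True modifies df and returns None
)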

We can see that the Pandas .dropna() method offers five parameters here: axis, how, thresh, subset, and inplace (recent pandas versions also accept ignore_index). All of them have default arguments, which means that you can simply call the method and it will execute.

However, understanding what the different parameters do will ensure that you get the result you’re hoping for! Let’s break these parameters down a little further:
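As a rough illustration of how these parameters change the result, the sketch below applies several of them to the same small frame. The data and the labels in the results dictionary are hypothetical and chosen only to make the differences visible.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 1 is entirely missing, rows 2 and 3 partly.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [4.0, np.nan, np.nan, np.nan],
    "c": [7.0, np.nan, 9.0, 10.0],
})

results = {
    # Keep rows without any NA.
    "how='any'": df.dropna(how="any"),
    # Drop only rows where every value is NA.
    "how='all'": df.dropna(how="all"),
    # Keep rows with at least 2 non-NA values.
    "thresh=2": df.dropna(thresh=2),
    # Check only column "c" for missing values.
    "subset=['c']": df.dropna(subset=["c"]),
    # Drop columns where every value is NA (none here).
    "axis=1, how='all'": df.dropna(axis=1, how="all"),
}

for label, result in results.items():
    print(f"{label:<20} rows={list(result.index)} columns={list(result.columns)}")
```

Only row 0 survives how='any', while how='all' and subset=['c'] both keep rows 0, 2, and 3, and thresh=2 keeps rows 0 and 2.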
