pandas.DataFrame.dropna
Remove missing values.
See the User Guide for more on which values are considered missing, and how to work with missing data.
Parameters
axis {0 or ‘index’, 1 or ‘columns’}, default 0
Determine whether rows or columns that contain missing values are removed.
- 0, or ‘index’ : Drop rows which contain missing values.
- 1, or ‘columns’ : Drop columns which contain missing values.
Only a single axis is allowed; passing a tuple or list of axes raises a TypeError.
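A minimal sketch of the axis parameter, assuming pandas 2.x and a small made-up DataFrame. It shows that the integer and string spellings are interchangeable and that a tuple of axes is rejected.

```python
import numpy as np
import pandas as pd

# Made-up data: row 1 has a missing value in column "b".
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, np.nan, 6.0]})

rows_dropped = df.dropna(axis=0)          # same as axis='index'
cols_dropped = df.dropna(axis="columns")  # same as axis=1

print(rows_dropped.index.tolist())    # [0, 2]
print(cols_dropped.columns.tolist())  # ['a']

# Dropping along several axes at once is not supported.
try:
    df.dropna(axis=(0, 1))
    multi_axis_error = None
except TypeError as exc:
    multi_axis_error = exc
print(type(multi_axis_error).__name__)  # TypeError
```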
how {‘any’, ‘all’}, default ‘any’
Determine whether a row or column is removed from the DataFrame when it has at least one NA or when all of its values are NA.
- ‘any’ : If any NA values are present, drop that row or column.
- ‘all’ : If all values are NA, drop that row or column.
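The DataFrame used in the examples never has a row that is entirely missing, so how='all' drops nothing there. This sketch uses a made-up frame that does contain such a row, to show the difference between the two options.

```python
import numpy as np
import pandas as pd

# Row 0 is complete, row 1 is partly missing, row 2 is entirely missing.
df = pd.DataFrame({"x": [1.0, np.nan, np.nan], "y": [2.0, 5.0, np.nan]})

any_kept = df.dropna(how="any")  # drops rows 1 and 2
all_kept = df.dropna(how="all")  # drops only row 2

print(any_kept.index.tolist())  # [0]
print(all_kept.index.tolist())  # [0, 1]
```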
thresh int, optional
Require that many non-NA values. Cannot be combined with how.
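A sketch of how thresh counts non-NA values per row, using made-up data. The manual mask built with DataFrame.notna is only an illustration of the rule, not a claim about how pandas implements it. The last part shows that recent pandas versions raise a TypeError when how and thresh are passed together.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 3.0, np.nan],
    "c": [np.nan, 4.0, 5.0],
})

# Keep rows with at least 2 non-NA values (row 2 has only one).
kept = df.dropna(thresh=2)

# The same selection written out: count non-NA values in each row.
manual = df[df.notna().sum(axis=1) >= 2]

print(kept.index.tolist())  # [0, 1]
print(kept.equals(manual))  # True

# how and thresh are mutually exclusive.
try:
    df.dropna(how="any", thresh=2)
    combined_error = None
except TypeError as exc:
    combined_error = exc
```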
subset column label or sequence of labels, optional
Labels along the other axis to consider; for example, if you are dropping rows, these would be a list of columns to include.
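When dropping columns (axis=1), the "other axis" is the row index, so subset takes row labels instead of column labels. A small sketch with made-up labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, np.nan], "b": [np.nan, 2.0], "c": [3.0, 4.0]},
    index=["r1", "r2"],
)

# Dropping rows: subset lists the columns to check.
rows = df.dropna(subset=["a"])
# Dropping columns: subset lists the row labels to check.
cols = df.dropna(axis=1, subset=["r1"])

print(rows.index.tolist())    # ['r1']
print(cols.columns.tolist())  # ['a', 'c']
```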
inplace bool, default False
Whether to modify the DataFrame rather than creating a new one.
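A minimal sketch of inplace=True on a made-up frame: the original object is modified and the method returns None, so the result should not be assigned back to the variable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

result = df.dropna(inplace=True)  # modifies df itself

print(result)             # None
print(df.index.tolist())  # [0, 2]
```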
ignore_index bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
Added in version 2.0.0.
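A sketch comparing the default labels with ignore_index=True, assuming pandas 2.0 or later. The relabelled result should match calling reset_index(drop=True) on the default output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 1.0, np.nan, 2.0]})

kept_labels = df.dropna()                  # original labels are kept
relabelled = df.dropna(ignore_index=True)  # labels become 0, 1, ...

print(kept_labels.index.tolist())  # [1, 3]
print(relabelled.index.tolist())   # [0, 1]
print(relabelled.equals(kept_labels.reset_index(drop=True)))  # True
```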
Returns
DataFrame or None
DataFrame with NA entries dropped from it or None if inplace=True.
See also
DataFrame.notna : Indicate existing (non-missing) values.
Examples
>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
Drop the rows where at least one element is missing.
>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25
Drop the columns where at least one element is missing.
>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman
Drop the rows where all elements are missing. No row of df is entirely missing, so nothing is dropped here.
>>> df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
Keep only the rows with at least 2 non-NA values.
>>> df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
Define in which columns to look for missing values.
>>> df.dropna(subset=['name', 'toy'])
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
pandas.Series.dropna
Return a new Series with missing values removed.
See the User Guide for more on which values are considered missing, and how to work with missing data.
Parameters
axis {0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.
inplace bool, default False
If True, do operation inplace and return None.
how str, optional
Not in use. Kept for compatibility.
ignore_index bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
Added in version 2.0.0.
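A sketch of the two Series options that change the result, assuming pandas 2.0 or later for ignore_index: relabelling the result, and modifying the Series in place (which returns None).

```python
import numpy as np
import pandas as pd

ser = pd.Series([np.nan, 1.0, np.nan, 2.0])

default_labels = ser.dropna().index.tolist()                 # [1, 3]
relabelled = ser.dropna(ignore_index=True).index.tolist()    # [0, 1]

result = ser.dropna(inplace=True)  # modifies ser, returns None

print(default_labels, relabelled)
print(result)        # None
print(ser.tolist())  # [1.0, 2.0]
```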
Returns
Series or None
Series with NA entries dropped from it or None if inplace=True.
See also
Series.notna : Indicate existing (non-missing) values.
DataFrame.dropna : Drop rows or columns which contain NA values.
Examples
>>> ser = pd.Series([1., 2., np.nan])
>>> ser
0    1.0
1    2.0
2    NaN
dtype: float64
Drop NA values from a Series.
>>> ser.dropna()
0    1.0
1    2.0
dtype: float64
Empty strings are not considered NA values. None is considered an NA value.
>>> ser = pd.Series([np.nan, 2, pd.NaT, '', None, 'I stay'])
>>> ser
0       NaN
1         2
2       NaT
3
4      None
5    I stay
dtype: object
>>> ser.dropna()
1         2
3
5    I stay
dtype: object
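If empty strings should count as missing in your data, one option (a sketch, not the only approach) is to replace them with NaN before calling dropna:

```python
import numpy as np
import pandas as pd

ser = pd.Series([np.nan, 2, pd.NaT, "", None, "I stay"])

# Turn empty strings into NaN so that dropna() removes them too.
cleaned = ser.replace("", np.nan).dropna()

print(cleaned.index.tolist())  # [1, 5]
print(cleaned.tolist())        # [2, 'I stay']
```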
Pandas dropna(): Drop Missing Records and Columns in DataFrames
In this tutorial, you’ll learn how to use the Pandas dropna() method to drop missing values from a Pandas DataFrame. Handling missing data is an essential part of cleaning your data before you analyze it. Because data cleaning is often said to take up as much as 80% of a data analyst’s or data scientist’s time, being able to do this work effectively and efficiently is an important skill.
By the end of this tutorial, you’ll have learned:
- How to use the Pandas .dropna() method effectively
- How to drop rows with missing (NaN) values in Pandas
- How to drop columns with missing (NaN) values in Pandas
- How to use the Pandas .dropna() method only on specific columns
- How to set thresholds when dropping missing values in a Pandas DataFrame
- How to fix common errors when working with the Pandas .dropna() method
Understanding the Pandas dropna() Method
The Pandas .dropna() method is an essential tool for data analysts and data scientists of any level. Because cleaning data is a key preprocessing step, knowing how to work with missing data will make you a stronger programmer.
Before diving into how to use the method, let’s take a minute to understand how the Pandas .dropna() method works. We can do this by taking a look at the parameters and default arguments that the method provides:
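# Understanding the Pandas .dropna() Method
import pandas as pd

df = pd.DataFrame()
df.dropna(
    axis=0,              # 0/'index' drops rows, 1/'columns' drops columns
    how='any',           # 'any' or 'all'
    subset=None,         # labels along the other axis to check
    inplace=False,       # return a new DataFrame instead of modifying df
    ignore_index=False,  # pandas 2.0+: relabel the result 0, 1, ..., n - 1
)
# thresh=None is left out: recent pandas raises a TypeError
# when how and thresh are both passed.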
We can see that the Pandas .dropna() method offers several parameters; in pandas 2.x there are six: axis, how, thresh, subset, inplace, and ignore_index. All of them have default values, which means that you can simply call the method without any arguments and it will execute.
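A quick sketch with a made-up DataFrame to confirm that calling .dropna() with no arguments behaves like spelling out the defaults axis=0 and how='any':

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

default_result = df.dropna()                    # all defaults
explicit_result = df.dropna(axis=0, how="any")  # defaults spelled out

print(default_result.equals(explicit_result))  # True
print(default_result.index.tolist())           # [0]
```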
However, understanding what the different parameters do will ensure that you get the result you’re hoping for! Let’s break these parameters down a little further: