Python find nan in dataframe

How to Check If Any Value is NaN in a Pandas DataFrame

The official pandas documentation refers to what most developers know as null values as missing data. Within pandas, a missing value is denoted by NaN.

In most cases, the terms missing and null are interchangeable, but to abide by the standards of pandas, we’ll continue using missing throughout this tutorial.

Evaluating for Missing Data

At the base level, pandas offers two functions to test for missing data, isnull() and notnull(). As you may suspect, these are simple functions that return a boolean indicating whether the argument passed in is missing data.
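
As a minimal sketch of the two top-level functions described above, the snippet below passes a few scalars and one array to pd.isnull() and pd.notnull(). The sample values are illustrative assumptions, not taken from the original tutorial.

```python
import numpy as np
import pandas as pd

# Top-level functions accept a scalar and return a plain boolean.
nan_is_null = pd.isnull(np.nan)
none_is_null = pd.isnull(None)
text_is_null = pd.isnull("text")
nan_not_null = pd.notnull(np.nan)

print(nan_is_null)   # True
print(none_is_null)  # True
print(text_is_null)  # False
print(nan_not_null)  # False

# They also accept array-likes and return an element-wise boolean array.
mask = pd.isnull(np.array([1.0, np.nan, 3.0]))
print(mask.tolist())  # [False, True, False]
```

For array input the result has the same shape as the input, which is what makes the chained reductions later in this tutorial possible.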

In addition to the above functions, pandas also provides two methods to check for missing data on Series and DataFrame objects. These methods evaluate each object in the Series or DataFrame and provide a boolean value indicating if the data is missing or not.

For example, let’s create a simple Series in pandas:

import pandas as pd
import numpy as np

s = pd.Series([2, 3, np.nan, 7, "The Hobbit"])

Now evaluating the Series s , the output shows each value as expected, including index 2 which we explicitly set as missing .

In [2]: s
Out[2]:
0             2
1             3
2           NaN
3             7
4    The Hobbit
dtype: object

To test the isnull() method on this series, we can use s.isnull() and view the output:

In [3]: s.isnull()
Out[3]:
0    False
1    False
2     True
3    False
4    False
dtype: bool

As expected, the only value evaluated as missing is index 2 .


Determine if ANY Value in a Series is Missing

While the isnull() method is useful, sometimes we may wish to evaluate whether any value is missing in a Series.

There are a few possibilities involving chaining multiple methods together.

A fast option is to chain .values.any(), which reduces the underlying NumPy array directly:

In [4]: s.isnull().values.any()
Out[4]: True
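
For comparison, here is a short sketch of three equivalent ways to ask whether a Series holds any missing value. It reuses the Series s from above; the variable names are illustrative, and no timing claims are made here.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 3, np.nan, 7, "The Hobbit"])

via_values = s.isnull().values.any()  # reduce the underlying NumPy array
via_series = s.isnull().any()         # reduce the boolean Series directly
via_hasnans = s.hasnans               # property available on Series

print(via_values, via_series, via_hasnans)  # True True True
```

All three return the same answer, so the choice mostly comes down to readability and, for large data, whatever measures fastest in your environment.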

In some cases, you may wish to determine how many missing values exist in the collection, in which case you can chain .sum() onto the mask instead, as in s.isnull().sum().
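
A minimal sketch of that count, again using the Series s defined earlier:

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 3, np.nan, 7, "The Hobbit"])

# True counts as 1 and False as 0, so summing the mask counts missing values.
missing_count = s.isnull().sum()
print(missing_count)  # 1
```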

Count Missing Values in DataFrame

While the chain of .isnull().values.any() will work for a DataFrame object to indicate if any value is missing , in some cases it may be useful to also count the number of missing values across the entire DataFrame. Since DataFrames are inherently multidimensional, we must invoke two methods of summation.

For example, first we need to create a simple DataFrame with a few missing values:

In [6]: df = pd.DataFrame(np.random.randn(5, 5))
   ...: df[df > 0.9] = np.nan

Now if we chain a single .sum() method on, instead of getting the total number of missing values, we get a Series holding the count of missing values in each column:

In [7]: df.isnull().sum()
Out[7]:
0    3
1    0
2    1
3    1
4    0
dtype: int64

We can see in this example that column 0 contains three missing values, and columns 2 and 3 contain one each.

In order to get the total summation of all missing values in the DataFrame, we chain two .sum() methods together:

In [8]: df.isnull().sum().sum()
Out[8]: 5
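
Because the example above relies on random data, its counts change from run to run. The sketch below uses a small fixed DataFrame (hypothetical data, not from the original) to show the per-column, per-row, and total counts reproducibly.

```python
import numpy as np
import pandas as pd

# A small fixed frame (hypothetical data) so the counts are reproducible.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
})

per_column = df.isnull().sum()        # one count per column
per_row = df.isnull().sum(axis=1)     # one count per row
total = int(df.isnull().sum().sum())  # single number for the whole frame

print(per_column.to_dict())  # {'a': 1, 'b': 2}
print(per_row.tolist())      # [1, 2, 0]
print(total)                 # 3
```

Passing axis=1 switches the first summation from columns to rows; the second .sum() then collapses either result to the same total.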


pandas.DataFrame.isna

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.nan, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True, an option that is deprecated in recent pandas releases).

Returns: DataFrame. Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

See also: DataFrame.dropna, which omits axes labels with missing values.

Show which entries in a DataFrame are NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = pd.Series([5, 6, np.nan])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
dtype: bool
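
The note above that empty strings and infinity are not treated as NA can be checked with a short sketch. The Series contents here are an illustrative assumption, and the default pandas options are assumed.

```python
import numpy as np
import pandas as pd

# Mixed values: an empty string, infinity, None, and NaN.
ser = pd.Series(["", np.inf, None, np.nan])

# Only None and NaN are flagged; the empty string and infinity are not.
flags = ser.isna().tolist()
print(flags)  # [False, False, True, True]
```

If empty strings should count as missing in your data, one option is to replace them with NaN before calling isna().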


pandas.isnull

This function takes a scalar or array-like object and indicates whether values are missing (NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike). pandas.isnull is an alias of pandas.isna, which is why the examples in this section call pd.isna.

Parameters: obj : scalar or array-like. Object to check for null or missing values.

Returns: bool or array-like of bool. For scalar input, returns a scalar boolean. For array input, returns an array of boolean indicating whether each corresponding element is missing.

See also:

pandas.notna: Boolean inverse of pandas.isna.

Series.isna: Detect missing values in a Series.

DataFrame.isna: Detect missing values in a DataFrame.

Index.isna: Detect missing values in an Index.

Scalar arguments (including strings) result in a scalar boolean.
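
A small sketch of the scalar case, using a handful of illustrative inputs (a string, pd.NA, NaN, and NaT):

```python
import numpy as np
import pandas as pd

# Each call receives a single scalar and returns a single boolean.
string_result = pd.isna("dog")
na_result = pd.isna(pd.NA)
nan_result = pd.isna(np.nan)
nat_result = pd.isna(pd.NaT)

print(string_result)  # False
print(na_result)      # True
print(nan_result)     # True
print(nat_result)     # True
```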

ndarrays result in an ndarray of booleans.

>>> array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
>>> array
array([[ 1., nan,  3.],
       [ 4.,  5., nan]])
>>> pd.isna(array)
array([[False,  True, False],
       [False, False,  True]])

For indexes, an ndarray of booleans is returned.

>>> index = pd.DatetimeIndex(["2017-07-05", "2017-07-06", None,
...                           "2017-07-08"])
>>> index
DatetimeIndex(['2017-07-05', '2017-07-06', 'NaT', '2017-07-08'],
              dtype='datetime64[ns]', freq=None)
>>> pd.isna(index)
array([False, False,  True, False])

For Series and DataFrame, the same type is returned, containing booleans.

>>> df = pd.DataFrame([['ant', 'bee', 'cat'], ['dog', None, 'fly']])
>>> df
     0     1    2
0  ant   bee  cat
1  dog  None  fly
>>> pd.isna(df)
       0      1      2
0  False  False  False
1  False   True  False
>>> pd.isna(df[1])
0    False
1     True
Name: 1, dtype: bool


Check for NaN in Pandas DataFrame (examples included)

Data to Fish

Here are 4 ways to check for NaN in Pandas DataFrame:

(1) Check for NaN under a single DataFrame column:

df['your column name'].isnull().values.any()

(2) Count the NaN under a single DataFrame column:

df['your column name'].isnull().sum()

(3) Check for NaN under an entire DataFrame:

df.isnull().values.any()

(4) Count the NaN under an entire DataFrame:

df.isnull().sum().sum()
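
The four checks listed above can be bundled into one helper, sketched below. The function name nan_report and the sample DataFrame are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

def nan_report(df, column):
    """Run the four NaN checks on a DataFrame and one of its columns."""
    return {
        "column_has_nan": bool(df[column].isnull().values.any()),
        "column_nan_count": int(df[column].isnull().sum()),
        "frame_has_nan": bool(df.isnull().values.any()),
        "frame_nan_count": int(df.isnull().sum().sum()),
    }

# Hypothetical data: one NaN in column x, two in column y.
df = pd.DataFrame({"x": [1.0, np.nan], "y": [np.nan, np.nan]})
report = nan_report(df, "x")
print(report)
```

The explicit bool() and int() conversions turn NumPy scalars into plain Python values, which keeps the printed dictionary easy to read.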

Examples of checking for NaN in Pandas DataFrame

(1) Check for NaN under a single DataFrame column

In the following example, we’ll create a DataFrame with a set of numbers and 3 NaN values:

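import pandas as pd
import numpy as np

# Ten numbers with three NaN values mixed in
data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

print(df)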

You’ll now see the DataFrame with the 3 NaN values:

    set_of_numbers
0              1.0
1              2.0
2              3.0
3              4.0
4              5.0
5              NaN
6              6.0
7              7.0
8              NaN
9              8.0
10             9.0
11            10.0
12             NaN

You can then use the following template in order to check for NaN under a single DataFrame column:

df['your column name'].isnull().values.any()

For our example, the DataFrame column is ‘set_of_numbers.’

And so, the code to check whether a NaN value exists under the ‘set_of_numbers’ column is as follows:

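import pandas as pd
import numpy as np

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

# True if at least one value in the column is NaN
check_for_nan = df['set_of_numbers'].isnull().values.any()
print(check_for_nan)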

Run the code, and you’ll get ‘True’, which confirms the existence of NaN values under the DataFrame column:

True

And if you want to get the actual breakdown of the instances where NaN values exist, then you may remove .values.any() from the code. So the complete syntax to get the breakdown would look as follows:

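import pandas as pd
import numpy as np

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

# One boolean per row: True where the value is NaN
check_for_nan = df['set_of_numbers'].isnull()
print(check_for_nan)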

You’ll now see the 3 instances of the NaN values:

0     False
1     False
2     False
3     False
4     False
5      True
6     False
7     False
8      True
9     False
10    False
11    False
12     True

Here is another approach where you can get all the instances where a NaN value exists:

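import pandas as pd
import numpy as np

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

# Flag each row according to whether its value is NaN
df.loc[df['set_of_numbers'].isnull(), 'value_is_NaN'] = 'Yes'
df.loc[df['set_of_numbers'].notnull(), 'value_is_NaN'] = 'No'

print(df)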

You’ll now see a new column (called ‘value_is_NaN’), which indicates all the instances where a NaN value exists:

    set_of_numbers value_is_NaN
0              1.0           No
1              2.0           No
2              3.0           No
3              4.0           No
4              5.0           No
5              NaN          Yes
6              6.0           No
7              7.0           No
8              NaN          Yes
9              8.0           No
10             9.0           No
11            10.0           No
12             NaN          Yes
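
The same flag column can also be built in a single assignment. The sketch below swaps in numpy.where for the two .loc assignments; it is an alternative technique, not the approach used in the original article, and the shortened data is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"set_of_numbers": [1, 2, np.nan, 4]})

# np.where picks 'Yes' where the mask is True and 'No' elsewhere.
df["value_is_NaN"] = np.where(df["set_of_numbers"].isnull(), "Yes", "No")

flags = df["value_is_NaN"].tolist()
print(flags)  # ['No', 'No', 'Yes', 'No']
```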

(2) Count the NaN under a single DataFrame column

You can apply this syntax in order to count the NaN values under a single DataFrame column:

df['your column name'].isnull().sum()

Here is the syntax for our example:

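import pandas as pd
import numpy as np

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

# Summing the boolean mask counts the NaN values
count_nan = df['set_of_numbers'].isnull().sum()
print('Count of NaN: ' + str(count_nan))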

You’ll then get the count of 3 NaN values:

Count of NaN: 3

And here is another approach to get the count:

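import pandas as pd
import numpy as np

data = {'set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan]}
df = pd.DataFrame(data)

df.loc[df['set_of_numbers'].isnull(), 'value_is_NaN'] = 'Yes'
df.loc[df['set_of_numbers'].notnull(), 'value_is_NaN'] = 'No'

# count() reports the non-missing entries per column among the flagged rows
count_nan = df.loc[df['value_is_NaN'] == 'Yes'].count()
print(count_nan)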

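As before, you’ll get the count of 3 instances of NaN values, shown in the value_is_NaN entry of the output. Because count() reports non-missing entries per column, the set_of_numbers entry shows 0 for these rows.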
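
If a single number is preferred over a per-column result, one option is to sum a boolean comparison on the flag column. This is a sketch with shortened, illustrative data; the map() call is just a compact way to build the flag column here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"set_of_numbers": [1, 2, np.nan, 4, np.nan]})
df["value_is_NaN"] = df["set_of_numbers"].isnull().map({True: "Yes", False: "No"})

# Comparing to 'Yes' gives a boolean mask; summing it yields a single number.
count_nan = int((df["value_is_NaN"] == "Yes").sum())
print(count_nan)  # 2
```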

(3) Check for NaN under an entire DataFrame

Now let’s add a second column into the original DataFrame. This column would include another set of numbers with NaN values:

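import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

print(df)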

Run the code, and you’ll get 8 instances of NaN values across the entire DataFrame:

    first_set_of_numbers  second_set_of_numbers
0                    1.0                   11.0
1                    2.0                   12.0
2                    3.0                    NaN
3                    4.0                   13.0
4                    5.0                   14.0
5                    NaN                    NaN
6                    6.0                   15.0
7                    7.0                   16.0
8                    NaN                    NaN
9                    8.0                    NaN
10                   9.0                   17.0
11                  10.0                    NaN
12                   NaN                   19.0

You can then apply this syntax in order to verify the existence of NaN values under the entire DataFrame:

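import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

# True if any cell in the DataFrame is NaN
check_nan_in_df = df.isnull().values.any()
print(check_nan_in_df)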

Once you run the code, you’ll get ‘True’, which confirms the existence of NaN values in the DataFrame:

True

You can get a further breakdown by removing .values.any() from the code:

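import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

# One boolean per cell: True where the value is NaN
check_nan_in_df = df.isnull()
print(check_nan_in_df)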

Here is the result of the breakdown:

    first_set_of_numbers  second_set_of_numbers
0                  False                  False
1                  False                  False
2                  False                   True
3                  False                  False
4                  False                  False
5                   True                   True
6                  False                  False
7                  False                  False
8                   True                   True
9                  False                   True
10                 False                  False
11                 False                   True
12                  True                  False
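
A boolean breakdown like this can also be reduced along one axis to find which columns or rows contain NaN. The sketch below uses a shortened, hypothetical DataFrame with column names chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "first": [1.0, np.nan, 3.0],
    "second": [4.0, np.nan, np.nan],
})

mask = df.isnull()
cols_with_nan = mask.any(axis=0)      # one boolean per column
rows_with_nan = df[mask.any(axis=1)]  # rows holding at least one NaN

print(cols_with_nan.to_dict())       # {'first': True, 'second': True}
print(rows_with_nan.index.tolist())  # [1, 2]
```

Filtering with mask.any(axis=1) is a convenient way to inspect the affected rows before deciding how to handle them.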

(4) Count the NaN under an entire DataFrame

You may now use this template to count the NaN values under the entire DataFrame:

df.isnull().sum().sum()

Here is the code for our example:

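import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

# The first sum() counts per column, the second adds the column counts
count_nan_in_df = df.isnull().sum().sum()
print('Count of NaN: ' + str(count_nan_in_df))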

You’ll then get the total count of 8:

Count of NaN: 8

And if you want to get the count of NaN by column, then you may use the following code:

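import pandas as pd
import numpy as np

data = {'first_set_of_numbers': [1, 2, 3, 4, 5, np.nan, 6, 7, np.nan, 8, 9, 10, np.nan],
        'second_set_of_numbers': [11, 12, np.nan, 13, 14, np.nan, 15, 16, np.nan, np.nan, 17, np.nan, 19]}
df = pd.DataFrame(data)

# A single sum() gives one count per column
count_nan_in_df = df.isnull().sum()
print(count_nan_in_df)

The output lists the count for each column:

first_set_of_numbers     3
second_set_of_numbers    5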
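
Counts are sometimes easier to interpret as a share of each column. As a sketch under the assumption that a fraction is what you want, taking the mean of the boolean mask gives the proportion of NaN per column; the data and column names below are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "first": [1.0, np.nan, 3.0, 4.0],
    "second": [np.nan, np.nan, 7.0, 8.0],
})

# The mean of a boolean mask is the fraction of True values.
share_missing = df.isnull().mean()
print(share_missing.to_dict())  # {'first': 0.25, 'second': 0.5}
```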

You just saw how to check for NaN in a Pandas DataFrame.

