Python pandas: replacing values by condition

pandas.Series.replace

Values of the Series are replaced with other values dynamically.

This differs from updating with .loc or .iloc, which require you to specify a location to update with some value.
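As a minimal sketch of that difference (the Series values are made up for illustration), replace() matches by value wherever it occurs, while .loc changes only the label you name:

```python
import pandas as pd

s = pd.Series([1, 2, 1, 3])  # illustrative data, not taken from the pandas docs

# replace() finds every occurrence of the value 1, wherever it sits
by_value = s.replace(1, 99)

# .loc needs an explicit location: only the element labelled 0 changes
by_location = s.copy()
by_location.loc[0] = 99

print(by_value.tolist())
print(by_location.tolist())
```

Neither call modifies s itself here, since replace() returns a new Series by default and the .loc update is applied to a copy.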

Parameters

to_replace str, regex, list, dict, Series, int, float, or None

How to find the values that will be replaced.

  • numeric, str or regex:
      • numeric: numeric values equal to to_replace will be replaced with value
      • str: string exactly matching to_replace will be replaced with value
      • regex: regexes matching to_replace will be replaced with value
  • list of str, regex, or numeric:
      • First, if to_replace and value are both lists, they must be the same length.
      • Second, if regex=True then all of the strings in both lists will be interpreted as regexes; otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
      • str, regex and numeric rules apply as above.
  • dict:
      • Dicts can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value ‘a’ with ‘b’ and ‘y’ with ‘z’. To use a dict in this way, the optional value parameter should not be given.
      • For a DataFrame a dict can specify that different values should be replaced in different columns. For example, {'a': 1, 'b': 'z'} looks for the value 1 in column ‘a’ and the value ‘z’ in column ‘b’ and replaces these values with whatever is specified in value. The value parameter should not be None in this case. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
      • For a DataFrame nested dictionaries, e.g., {'a': {'b': np.nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with NaN. The optional value parameter should not be specified to use a nested dict in this way. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
  • None:
      • This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.

See the examples section for examples of each of these.

value scalar, dict, list, str, regex, default None

Value to replace any values matching to_replace with. For a DataFrame a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.

inplace bool, default False

If True, performs operation inplace and returns None.

limit int, default None

Maximum size gap to forward or backward fill.

regex bool or same types as to_replace, default False

Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Alternatively, this could be a regular expression or a list, dict, or array of regular expressions in which case to_replace must be None.

method {'pad', 'ffill', 'bfill'}

The method to use for replacement when to_replace is a scalar, list or tuple and value is None.

Raises

  • AssertionError
      • If regex is not a bool and to_replace is not None.
  • TypeError
      • If to_replace is not a scalar, array-like, dict, or None
      • If to_replace is a dict and value is not a list, dict, ndarray, or Series
      • If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
      • When replacing multiple bool or datetime64 objects and the arguments to to_replace do not match the type of the value being replaced
  • ValueError
      • If a list or an ndarray is passed to to_replace and value but they are not the same length.
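The length rule for list arguments can be checked with a small sketch. The data is illustrative, and the exact wording of the error message may differ between pandas versions:

```python
import pandas as pd

s = pd.Series([1, 2, 3])  # illustrative data

try:
    s.replace([1, 2], [10])  # two values to find, only one replacement
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")

# Lists of equal length are paired element by element: 1 -> 10, 2 -> 20
paired = s.replace([1, 2], [10, 20])
print(paired.tolist())
```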

See also

  • Series.where: Replace values based on boolean condition.
  • Series.str.replace: Simple string replacement.

  • Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.
  • Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
  • This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
  • When dict is used as the to_replace value, it is like key(s) in the dict are the to_replace part and value(s) in the dict are the value parameter.
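The note about regular expressions applying only to strings can be illustrated with a short sketch. The column names as_float and as_text are hypothetical and chosen only for this example:

```python
import pandas as pd

df = pd.DataFrame({
    'as_float': [1.5, 2.5],     # numeric dtype: regexes never match here
    'as_text': ['1.5', '2.5'],  # the same values stored as strings
})

# The pattern matches the string '1.5' but leaves the float 1.5 untouched
out = df.replace(to_replace=r'^1\.5$', value='x', regex=True)
print(out)
```

If the numeric column should be matched as well, one option is to convert it to strings first or to pass the number itself as to_replace without regex=True.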

Scalar `to_replace` and `value`

>>> s = pd.Series([1, 2, 3, 4, 5])
>>> s.replace(1, 5)
0    5
1    2
2    3
3    4
4    5
dtype: int64

>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
...                    'B': [5, 6, 7, 8, 9],
...                    'C': ['a', 'b', 'c', 'd', 'e']})
>>> df.replace(0, 5)
   A  B  C
0  5  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e

List-like `to_replace`

>>> df.replace([0, 1, 2, 3], 4)
   A  B  C
0  4  5  a
1  4  6  b
2  4  7  c
3  4  8  d
4  4  9  e

>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
   A  B  C
0  4  5  a
1  3  6  b
2  2  7  c
3  1  8  d
4  4  9  e

>>> s.replace([1, 2], method='bfill')
0    3
1    3
2    3
3    4
4    5
dtype: int64

dict-like `to_replace`

>>> df.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e

>>> df.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e

>>> df.replace({'A': {0: 100, 4: 400}})
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e

Regular expression `to_replace`

>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
...                    'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
      A    B
0   new  abc
1   foo  new
2  bait  xyz

>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
      A    B
0   new  abc
1   foo  bar
2  bait  xyz

>>> df.replace(regex=r'^ba.$', value='new')
      A    B
0   new  abc
1   foo  new
2  bait  xyz

>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})
      A    B
0   new  abc
1   xyz  new
2  bait  xyz

>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
      A    B
0   new  abc
1   new  new
2  bait  xyz

Compare the behavior of s.replace({'a': None}) and s.replace('a', None) to understand the peculiarities of the to_replace parameter:

>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])

When one uses a dict as the to_replace value, it is like the value(s) in the dict are equal to the value parameter. s.replace({'a': None}) is equivalent to s.replace(to_replace={'a': None}, value=None, method=None):

>>> s.replace({'a': None})
0      10
1    None
2    None
3       b
4    None
dtype: object

When value is not explicitly passed and to_replace is a scalar, list or tuple, replace uses the method parameter (default ‘pad’) to do the replacement. So this is why the ‘a’ values are being replaced by 10 in rows 1 and 2 and ‘b’ in row 4 in this case.

>>> s.replace('a')
0    10
1    10
2    10
3     b
4     b
dtype: object

On the other hand, if None is explicitly passed for value, it will be respected:

>>> s.replace('a', None)
0      10
1    None
2    None
3       b
4    None
dtype: object

Changed in version 1.4.0: Previously the explicit None was silently ignored.


Replacing one or more values in a column of a Pandas DataFrame

To replace values in a column of a Pandas DataFrame based on a condition, you can use the DataFrame.loc property, numpy.where(), or DataFrame.where().

This tutorial walks through each of these approaches with example programs.

Method 1: using DataFrame.loc with a condition

To replace values in a column based on a condition with DataFrame.loc, use the following syntax.

DataFrame.loc[condition, column_name] = new_value

In the following program, we replace with 0 those values in column «a» that are less than zero.

import pandas as pd

df = pd.DataFrame([[-10, -9, 8], [6, 2, -4], [-8, 5, 1]],
                  columns=['a', 'b', 'c'])
df.loc[df.a < 0, 'a'] = 0  # rows where a < 0, column 'a' only
print(df)

Output:

   a  b  c
0  0 -9  8
1  6  2 -4
2  0  5  1

You can also replace values in several columns based on a single condition. Pass the columns to loc as a list (a tuple is normally reserved for MultiIndex keys).

DataFrame.loc[condition, [column_1, column_2]] = new_value

In the following program, we set columns «a» and «b» to 0 in every row where the value in column «a» is less than zero. Note that the condition is evaluated on column «a» only, so the 5 in column «b» of the last row is replaced as well.

import pandas as pd

df = pd.DataFrame([[-10, -9, 8], [6, 2, -4], [-8, 5, 1]],
                  columns=['a', 'b', 'c'])
df.loc[df.a < 0, ['a', 'b']] = 0  # one row condition, two target columns
print(df)

Output:

   a  b  c
0  0  0  8
1  6  2 -4
2  0  0  1
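The program above applies one row condition (a < 0) to both columns. If each column should instead be tested against its own values, one possible sketch uses DataFrame.mask on the selected columns. This is an alternative to the tutorial's approach, and the variable name cols is only illustrative:

```python
import pandas as pd

df = pd.DataFrame([[-10, -9, 8], [6, 2, -4], [-8, 5, 1]],
                  columns=['a', 'b', 'c'])

cols = ['a', 'b']  # columns to clean; 'c' is left alone

# mask() replaces values where the condition is True, cell by cell,
# so each column is tested against its own values
df[cols] = df[cols].mask(df[cols] < 0, 0)
print(df)
```

Here the 5 in column «b» survives because it is not negative, while the -9 in the same column is replaced.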

Method 2: using numpy.where

To replace values in a column based on a condition with numpy.where, use the following syntax.

DataFrame['column_name'] = numpy.where(condition, new_value, DataFrame.column_name)

In the following program, we use numpy.where() to replace with 0 those values in column «a» that are less than zero.

import pandas as pd
import numpy as np

df = pd.DataFrame([[-10, -9, 8], [6, 2, -4], [-8, 5, 1]],
                  columns=['a', 'b', 'c'])
# where the condition holds take 0, otherwise keep the original value
df['a'] = np.where(df.a < 0, 0, df.a)
print(df)

Output:

   a  b  c
0  0 -9  8
1  6  2 -4
2  0  5  1
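numpy.where handles a single condition. When several conditions each need their own replacement, numpy.select (a different NumPy function, not used in the tutorial) is one option. The thresholds and replacement values in this sketch are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[-10, -9, 8], [6, 2, -4], [-8, 5, 1]],
                  columns=['a', 'b', 'c'])

# Conditions are checked in order; the first one that holds wins
conditions = [df['a'] < 0, df['a'] > 5]
choices = [0, 100]  # illustrative replacement values

# Rows matching no condition keep their original value via `default`
df['a'] = np.select(conditions, choices, default=df['a'])
print(df)
```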

Method 3: using DataFrame.where

To replace values in a column based on a condition with the where() method, use the following syntax.

DataFrame['column_name'] = DataFrame['column_name'].where(~(condition), other=new_value)
  • column_name is the column in which values are to be replaced.
  • condition is a boolean expression evaluated for each value in the column.
  • new_value replaces the existing value wherever condition is true. where() keeps the values for which its argument is true, hence the negation ~(condition).

The result is assigned back to the column instead of calling where(..., inplace=True) on DataFrame['column_name']: an in-place call on a selected column is chained assignment and may leave the DataFrame unchanged, in particular when Copy-on-Write is enabled.

In the following program, we use where() to replace with 0 those values in column «a» that are less than zero.

import pandas as pd

df = pd.DataFrame([[-10, -9, 8], [6, 2, -4], [-8, 5, 1]],
                  columns=['a', 'b', 'c'])
# keep values where a >= 0, put 0 everywhere else
df['a'] = df['a'].where(~(df.a < 0), other=0)
print(df)

Output:

   a  b  c
0  0 -9  8
1  6  2 -4
2  0  5  1

In this tutorial, we used Python examples to see how to replace the values of a DataFrame column with a new value depending on a condition.

How to replace multiple values?

To replace multiple values in a DataFrame, you can use the DataFrame.replace() method with a dictionary of replacements passed as the argument.

Example 1

Syntax for replacing multiple values in one DataFrame column:

DataFrame.replace({'column_name': {old_value_1: new_value_1, old_value_2: new_value_2}})

In the following example, we use replace() to replace 1 with 11 and 2 with 22 in column a.

import pandas as pd

df = pd.DataFrame([[4, -9, 8], [1, 2, -4], [2, 2, -8], [0, 7, -4], [2, 5, 1]],
                  columns=['a', 'b', 'c'])
# outer key selects the column, inner dict maps old value -> new value
df = df.replace({'a': {1: 11, 2: 22}})
print(df)

Output:

    a  b  c
0   4 -9  8
1  11  2 -4
2  22  2 -8
3   0  7 -4
4  22  5  1
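The nested dictionary is what restricts the replacement to column a. As a small sketch for comparison (the variable names only_a and everywhere are illustrative), a flat dictionary applies the same mapping to every column:

```python
import pandas as pd

df = pd.DataFrame([[4, -9, 8], [1, 2, -4], [2, 2, -8], [0, 7, -4], [2, 5, 1]],
                  columns=['a', 'b', 'c'])

# Nested dict: the mapping applies to column 'a' only
only_a = df.replace({'a': {1: 11, 2: 22}})

# Flat dict: the mapping applies wherever 1 or 2 appears, in any column
everywhere = df.replace({1: 11, 2: 22})

print(only_a)
print(everywhere)
```

With the flat dictionary, the 2s in column b and the 1 in column c are replaced as well, which may or may not be what you want.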

Example 2

Syntax for replacing multiple values in several DataFrame columns:

DataFrame.replace({'column_name_1': {old_value_1: new_value_1, old_value_2: new_value_2}, 'column_name_2': {old_value_1: new_value_1, old_value_2: new_value_2}})

In the following example, we use replace() to replace 1 with 11 and 2 with 22 in column a, and 5 with 55 and 2 with 22 in column b.

import pandas as pd

df = pd.DataFrame([[4, -9, 8], [1, 2, -4], [2, 2, -8], [0, 7, -4], [2, 5, 1]],
                  columns=['a', 'b', 'c'])
# one inner dict of old -> new values per column
df = df.replace({'a': {1: 11, 2: 22}, 'b': {5: 55, 2: 22}})
print(df)

Output:

    a   b  c
0   4  -9  8
1  11  22 -4
2  22  22 -8
3   0   7 -4
4  22  55  1

In this tutorial, we used Python examples to see how to replace multiple values in one or several columns of a Pandas DataFrame.


Pandas How to replace values based on Conditions

Using these methods, you can replace a single cell or all the values of a row or column in a DataFrame based on conditions.

import pandas as pd
import numpy as np

df = pd.DataFrame()
df

Using loc for Replace

Replace every 'Dance' in the Event column with 'Hip-Hop'

df.loc[df.Event == 'Dance', 'Event'] = 'Hip-Hop'
df

Using numpy where

Replace every 'Painting' in the Event column with 'Art'

df['Event'] = np.where(df.Event == 'Painting', 'Art', df.Event)
df

Using Mask for Replace

Replace every 'Hip-Hop' in the Event column with 'Jazz'

df['Event'] = df['Event'].mask(df['Event'] == 'Hip-Hop', 'Jazz')

Using df where

Replace every 'Art' in the Event column with 'Theater'

m = df.Event == 'Art'
df['Event'] = df['Event'].where(~m, other='Theater')
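The four replacements above can be run end to end as follows. The DataFrame contents here are a hypothetical stand-in (the Name column and all row values are assumptions made for this sketch), and the results are assigned back to the column rather than using inplace=True on a selected column:

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the column name 'Event' comes from the text above
df = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cid', 'Dee'],
    'Event': ['Dance', 'Painting', 'Dance', 'Singing'],
})

# loc: Dance -> Hip-Hop
df.loc[df['Event'] == 'Dance', 'Event'] = 'Hip-Hop'

# numpy.where: Painting -> Art
df['Event'] = np.where(df['Event'] == 'Painting', 'Art', df['Event'])

# mask: replaces where the condition is True (Hip-Hop -> Jazz)
df['Event'] = df['Event'].mask(df['Event'] == 'Hip-Hop', 'Jazz')

# where: keeps where the condition is True, so negate it (Art -> Theater)
m = df['Event'] == 'Art'
df['Event'] = df['Event'].where(~m, other='Theater')

print(df)
```

mask and where are complements of each other: mask replaces the cells where the condition holds, where replaces the cells where it does not.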

Create a new Dataframe

df = pd.DataFrame([[1.4, 8], [1.2, 5], [0.3, 10]], index=['China', 'India', 'USA'], columns=['Population(B)', 'Economy']) 

Set value for an entire row

Set value for an entire column

Set value for rows matching condition
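One way the three operations named above could look with .loc is sketched below. The values being assigned and the 1.5 threshold are assumptions chosen for illustration, not taken from the original post:

```python
import pandas as pd

df = pd.DataFrame([[1.4, 8], [1.2, 5], [0.3, 10]],
                  index=['China', 'India', 'USA'],
                  columns=['Population(B)', 'Economy'])

# Entire row: every column of the row labelled 'India' becomes 10
df.loc['India'] = 10

# Entire column: every row of 'Economy' becomes 30
df.loc[:, 'Economy'] = 30

# Rows matching a condition: 'Economy' becomes 0 where population < 1.5
df.loc[df['Population(B)'] < 1.5, 'Economy'] = 0

print(df)
```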

Updated: July 17, 2019
