Python change value in data frame

Contents

pandas.DataFrame.replace
How to modify values in a Pandas DataFrame?
Replace specific data in Pandas DataFrames
Creating example data
1. Set cell values in the entire DF using replace()
2. Change value of cell content by index
3. Modify multiple cells in a DataFrame row
4. Update cells based on conditions
5. Set and Replace values for an entire Pandas column / Series.
6. Replace string in Pandas DataFrame column

pandas.DataFrame.replace

Values of the DataFrame are replaced with other values dynamically.

This differs from updating with .loc or .iloc, which require you to specify a location to update with some value.
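A minimal sketch of that difference, using a small hypothetical frame (the column names and values are illustrative assumptions, not taken from the pandas documentation): replace() finds cells by their value, while .loc needs the row and column labels of the cell to change.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"A": [0, 1, 0], "B": [0, 5, 6]})

# replace(): every cell equal to 0 is replaced, wherever it sits
by_value = df.replace(0, -1)

# .loc: only the cell at row label 0, column "A" is updated
by_location = df.copy()
by_location.loc[0, "A"] = -1

print(by_value)
print(by_location)
```

Both variants leave the original df untouched here, because replace() returns a new object by default and the .loc assignment is applied to a copy.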

Parameters

to_replace str, regex, list, dict, Series, int, float, or None

How to find the values that will be replaced.

numeric, str or regex:

numeric: numeric values equal to to_replace will be replaced with value
str: string exactly matching to_replace will be replaced with value
regex: regexes matching to_replace will be replaced with value

list of str, regex, or numeric:

First, if to_replace and value are both lists, they must be the same length.
Second, if regex=True then all of the strings in both lists will be interpreted as regexes, otherwise they will match directly. This doesn’t matter much for value since there are only a few possible substitution regexes you can use.
str, regex and numeric rules apply as above.

dict:

Dicts can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value ‘a’ with ‘b’ and ‘y’ with ‘z’. To use a dict in this way, the optional value parameter should not be given.
For a DataFrame a dict can specify that different values should be replaced in different columns. For example, {'a': 1, 'b': 'z'} looks for the value 1 in column ‘a’ and the value ‘z’ in column ‘b’ and replaces these values with whatever is specified in value. The value parameter should not be None in this case. You can treat this as a special case of passing two lists except that you are specifying the column to search in.
For a DataFrame nested dictionaries, e.g., {'a': {'b': np.nan}}, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with NaN. The optional value parameter should not be specified to use a nested dict in this way. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.

None:

This means that the regex argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. If value is also None then this must be a nested dictionary or Series.

See the examples section for examples of each of these.

value scalar, dict, list, str, regex, default None

Value to replace any values matching to_replace with. For a DataFrame a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.

inplace bool, default False

Whether to modify the DataFrame rather than creating a new one.

limit int, default None

Maximum size gap to forward or backward fill.

regex bool or same types as to_replace, default False

Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Alternatively, this could be a regular expression or a list, dict, or array of regular expressions in which case to_replace must be None.

method {'pad', 'ffill', 'bfill'}

The method to use for replacement, when to_replace is a scalar, list or tuple and value is None.

Raises

AssertionError

If regex is not a bool and to_replace is not None.

TypeError

If to_replace is not a scalar, array-like, dict, or None
If to_replace is a dict and value is not a list, dict, ndarray, or Series
If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series.
When replacing multiple bool or datetime64 objects and the arguments to to_replace do not match the type of the value being replaced

ValueError

If a list or an ndarray is passed to to_replace and value but they are not the same length.

See also

DataFrame.where: Replace values based on boolean condition.
Series.str.replace: Simple string replacement.

Notes

Regex substitution is performed under the hood with re.sub . The rules for substitution for re.sub are the same.
Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
When dict is used as the to_replace value, it is like key(s) in the dict are the to_replace part and value(s) in the dict are the value parameter.
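The note about regular expressions only matching strings can be checked with a short sketch. The frame below is hypothetical (column names and the replacement marker "X" are assumptions made for illustration): the same numbers are stored once as floats and once as text.

```python
import pandas as pd

# Hypothetical data: identical numbers stored as floats and as strings
df = pd.DataFrame({"as_float": [1.5, 2.5], "as_text": ["1.5", "2.5"]})

# The pattern is only applied to string cells; the float column is skipped
result = df.replace(to_replace=r"^1\.5$", value="X", regex=True)

# Casting to str first makes the numeric column matchable as well
result_cast = df.astype(str).replace(to_replace=r"^1\.5$", value="X", regex=True)

print(result)
print(result_cast)
```

In result only the as_text column changes; in result_cast both columns do, at the cost of the numeric column now holding strings.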

Examples

Scalar `to_replace` and `value`

>>> s = pd.Series([1, 2, 3, 4, 5])
>>> s.replace(1, 5)
0    5
1    2
2    3
3    4
4    5
dtype: int64

>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
...                    'B': [5, 6, 7, 8, 9],
...                    'C': ['a', 'b', 'c', 'd', 'e']})
>>> df.replace(0, 5)
   A  B  C
0  5  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e

List-like `to_replace`

>>> df.replace([0, 1, 2, 3], 4)
   A  B  C
0  4  5  a
1  4  6  b
2  4  7  c
3  4  8  d
4  4  9  e

>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
   A  B  C
0  4  5  a
1  3  6  b
2  2  7  c
3  1  8  d
4  4  9  e

>>> s.replace([1, 2], method='bfill')
0    3
1    3
2    3
3    4
4    5
dtype: int64

dict-like `to_replace`

>>> df.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e

>>> df.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e

>>> df.replace({'A': {0: 100, 4: 400}})
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e

Regular expression `to_replace`

>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
...                    'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
      A    B
0   new  abc
1   foo  new
2  bait  xyz

>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
      A    B
0   new  abc
1   foo  bar
2  bait  xyz

>>> df.replace(regex=r'^ba.$', value='new')
      A    B
0   new  abc
1   foo  new
2  bait  xyz

>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})
      A    B
0   new  abc
1   xyz  new
2  bait  xyz

>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
      A    B
0   new  abc
1   new  new
2  bait  xyz

Compare the behavior of s.replace({'a': None}) and s.replace('a', None) to understand the peculiarities of the to_replace parameter:

>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])

When one uses a dict as the to_replace value, it is like the value(s) in the dict are equal to the value parameter. s.replace({'a': None}) is equivalent to s.replace(to_replace={'a': None}, value=None, method=None):

>>> s.replace({'a': None})
0      10
1    None
2    None
3       b
4    None
dtype: object

When value is not explicitly passed and to_replace is a scalar, list or tuple, replace uses the method parameter (default ‘pad’) to do the replacement. This is why the ‘a’ values are replaced by 10 in rows 1 and 2 and by ‘b’ in row 4 in this case.

>>> s.replace('a')
0    10
1    10
2    10
3     b
4     b
dtype: object

On the other hand, if None is explicitly passed for value, it will be respected:

>>> s.replace('a', None)
0      10
1    None
2    None
3       b
4    None
dtype: object

Changed in version 1.4.0: Previously the explicit None was silently ignored.


How to modify values in a Pandas DataFrame?

As part of our data wrangling process, we are often required to modify data previously acquired from a CSV, text, JSON, API, database or other data source.

In this short tutorial we discuss the basics of replacing, changing and updating values inside Pandas DataFrames.

Replace specific data in Pandas DataFrames

In this tutorial we will look into several cases:

Replacing values in an entire DataFrame
Updating values in specific cells by index
Changing values in an entire DF row
Replace cells content according to condition
Modify values in a Pandas column / series.

Creating example data

Let’s define a simple survey DataFrame:

# Import data analysis packages
import pandas as pd
import numpy as np

# Create test data
survey_dict = {
    'language': ['Python', 'Java', 'Haskell', 'Go', 'C++'],
    'salary': [120, 85, 95, 80, 90],
    'num_candidates': [18, 22, 34, 10, np.nan]
}

# Initialize the survey DataFrame
survey_df = pd.DataFrame(survey_dict)

# Review our DF
survey_df.head()

1. Set cell values in the entire DF using replace()

We’ll use the DataFrame replace method to modify cells according to their value. In this example we’ll replace the empty (NaN) cell in the last row with the value 17.

survey_df.replace(to_replace=np.nan, value=17, inplace=True)
survey_df.head()

Note: replace() returns a new DataFrame by default. Passing inplace=True persists the update in survey_df itself going forward.

Here’s the output we will get:

language	salary	num_candidates
0	Python	120	18.0
1	Java	85	22.0
2	Haskell	95	34.0
3	Go	80	10.0
4	C++	90	17.0

Note that we could accomplish the same result with the more idiomatic fillna() method, which targets missing values directly:

survey_df.fillna(value = 17, axis = 1)
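As a quick check of that equivalence, here is a minimal sketch on a one-column frame that reuses the num_candidates values from the example data. The variable names via_replace and via_fillna are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"num_candidates": [18, 22, 34, 10, np.nan]})

# Both calls return a new DataFrame; df itself is not modified
via_replace = df.replace(to_replace=np.nan, value=17)
via_fillna = df.fillna(value=17)

same_result = via_replace.equals(via_fillna)
still_missing = int(df["num_candidates"].isna().sum())

print(same_result)    # True
print(still_missing)  # 1
```

Because neither call uses inplace=True, the original frame still contains its NaN; assign the result back or pass inplace=True to keep the change.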

2. Change value of cell content by index

To pick a specific row index to be modified, we’ll use the iloc indexer.

survey_df.iloc[0].replace(to_replace=120, value = 130)

The call returns a modified copy of the row (survey_df itself is not changed). Our output will look as follows:

language          Python
salary               130
num_candidates      18.0
Name: 0, dtype: object

Note: We could also use the loc indexer to update one or multiple cells by row/column label. The code below sets the value 130 in the first three cells of the salary column.

survey_df.loc[[0,1,2],'salary'] = 130
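The sketch below contrasts the two approaches on a shortened, hypothetical version of the survey data: replace() on a selected row produces a new Series, whereas assignment through .loc or .iloc writes into the DataFrame.

```python
import pandas as pd

# Shortened, hypothetical version of the survey data
survey_df = pd.DataFrame({
    "language": ["Python", "Java", "Haskell"],
    "salary": [120, 85, 95],
})

# replace() on a selected row returns a new Series...
row = survey_df.iloc[0].replace(to_replace=120, value=130)
# ...so the DataFrame still holds the old value at this point
unchanged = survey_df.loc[0, "salary"]

# Assigning through .loc (by label) or .iloc (by position) updates the frame
survey_df.loc[0, "salary"] = 130
survey_df.iloc[1, 1] = 90  # second row, second column ("salary")

print(survey_df)
```

Use .loc when you know the labels and .iloc when you only know integer positions.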

3. Modify multiple cells in a DataFrame row

Similar to before, but this time we’ll pass a list of values to replace and their respective replacements:

survey_df.loc[0].replace(to_replace=(130,18), value=(120, 20))
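Like the previous replace() call, this one returns a new Series and leaves the DataFrame as it was. If the goal is to store several new values in one row, one possible approach is to assign a list through .loc. The frame below is a shortened, hypothetical version of the survey data.

```python
import pandas as pd

# Shortened, hypothetical version of the survey data
survey_df = pd.DataFrame({
    "language": ["Python", "Java"],
    "salary": [130, 85],
    "num_candidates": [18.0, 22.0],
})

# Set two cells of row 0 in one assignment;
# the values are matched to the listed columns in order
survey_df.loc[0, ["salary", "num_candidates"]] = [120, 20]

print(survey_df)
```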

4. Update cells based on conditions

In practice, we usually update data based on specific conditions. Here’s an example of how to update cells conditionally. Let’s assume that we would like to update the salary figures in our data so that the minimum salary is $90/hour.

We’ll first slice the DataFrame with a boolean condition to find the relevant rows to update:

cond = survey_df['salary'] < 90

We’ll then pass the row selection and the column label to be updated into the loc indexer:

survey_df.loc[cond, 'salary'] = 90
survey_df

language	salary	num_candidates
0	Python	120	18.0
1	Java	90	22.0
2	Haskell	95	34.0
3	Go	90	10.0
4	C++	90	17.0
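The same minimum-salary rule can also be expressed without an explicit .loc assignment. The sketch below is an alternative, not the tutorial's method: it uses Series.mask(), which replaces values where a condition is True, and Series.clip(), which enforces a lower bound. The salary values are the ones from the example data.

```python
import pandas as pd

salary = pd.Series([120, 85, 95, 80, 90], name="salary")

# mask(): replace values where the condition holds
cond = salary < 90
with_mask = salary.mask(cond, 90)

# clip(): enforce a lower bound directly
with_clip = salary.clip(lower=90)

print(with_mask.tolist())  # [120, 90, 95, 90, 90]
print(with_clip.tolist())  # [120, 90, 95, 90, 90]
```

Both calls return a new Series, so the result has to be assigned back to the column to take effect.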

Important note: We can write more complex conditions as needed, for instance by combining several boolean conditions with the & (and) and | (or) operators, wrapping each condition in parentheses.
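A minimal sketch of a combined condition, assuming the survey data after the NaN has been filled with 17. The rule itself (raise the salary only where it is below 90 and there are more than 15 candidates) is a made-up illustration.

```python
import pandas as pd

survey_df = pd.DataFrame({
    "language": ["Python", "Java", "Haskell", "Go", "C++"],
    "salary": [120, 85, 95, 80, 90],
    "num_candidates": [18.0, 22.0, 34.0, 10.0, 17.0],
})

# Rows where salary is below 90 AND there are more than 15 candidates.
# Each condition needs its own parentheses because & binds tighter than < and >.
cond = (survey_df["salary"] < 90) & (survey_df["num_candidates"] > 15)
survey_df.loc[cond, "salary"] = 90

print(survey_df["salary"].tolist())  # [120, 90, 95, 80, 90]
```

Only the Java row satisfies both conditions; the Go row has a salary below 90 but too few candidates, so it is left unchanged.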

5. Set and Replace values for an entire Pandas column / Series.

Let’s now assume that we would like to modify the num_candidates figure for all observations in our DataFrame. That’s fairly easy to accomplish.

survey_df['num_candidates'] = 25

Let’s now assume that management has decided that all candidates will be offered a 20% raise. We can easily change the salary column using the following Python code:

survey_df['salary'] = survey_df['salary'] * 1.2
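One side effect worth knowing: multiplying an integer column by 1.2 produces a float column. The sketch below, which uses the original salary values as an assumption, shows one way to get whole numbers back by rounding and casting.

```python
import pandas as pd

survey_df = pd.DataFrame({"salary": [120, 85, 95, 80, 90]})

# Multiplying by a float turns the integer column into float64
raised = survey_df["salary"] * 1.2
is_float = pd.api.types.is_float_dtype(raised)

# Round first, then cast, to avoid truncating values such as 101.999...
survey_df["salary"] = raised.round().astype(int)

print(is_float)                      # True
print(survey_df["salary"].tolist())  # [144, 102, 114, 96, 108]
```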

6. Replace string in Pandas DataFrame column

We can also replace specific strings in a DataFrame column / series using the syntax below:

survey_df['language'] = survey_df['language'].replace(to_replace = 'Java', value= 'Go')
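Note that Series.replace() with a plain string only matches cells whose entire content equals that string. For substring replacement, Series.str.replace() is the usual tool. The sketch below uses a hypothetical list of languages to show the difference.

```python
import pandas as pd

# Hypothetical data containing a value that merely starts with "Java"
languages = pd.Series(["Java", "JavaScript", "Go"])

# replace(): only cells that are exactly "Java" are changed
whole_cell = languages.replace(to_replace="Java", value="Kotlin")

# str.replace(): every occurrence of the substring is changed
substring = languages.str.replace("Java", "Kotlin", regex=False)

print(whole_cell.tolist())  # ['Kotlin', 'JavaScript', 'Go']
print(substring.tolist())   # ['Kotlin', 'KotlinScript', 'Go']
```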
