Python apply to columns

Contents

Pandas Dataframes: Apply Examples
Apply example
Apply example, custom function
Take multiple columns as parameters
Apply function to row
Apply function to column
Return multiple columns
Apply function in parallel
Vectorization and Performance
map vs apply
Pandas apply() Function to Single & Multiple Column(s)
1. Quick Examples of pandas Apply Function to a Column
2. pandas.DataFrame.apply() Function Syntax
3. Pandas Apply Function to Single Column
4. Pandas Apply Function to All Columns
5. Pandas Apply Function to Multiple List of Columns
6. Apply Lambda Function to Each Column
7. Apply Lambda Function to Single Column
8. Using pandas.DataFrame.transform() to Apply Function Column
9. Using pandas.DataFrame.map() to Single Column
10. DataFrame.assign() to Apply Lambda Function
11. Using Numpy function on single Column
12. Using NumPy.square() Method
13. Multiple columns Using NumPy.square() and Lambda Function
Conclusion
References
Pandas apply function to column
How do I apply function to column in pandas?
Using dataframe.apply() function
Using lambda function along with the apply() function
Using dataframe.transform() function

Pandas Dataframes: Apply Examples

WIP alert: this is a work in progress. The current information is correct, but more content may be added in the future.

Pandas version 1.0+ used.

All code available online on this jupyter notebook

Apply example

To apply a function to a dataframe column, use df['my_col'].apply(function), where the function takes one element and returns another value.

import pandas as pd

df = pd.DataFrame({
    'name': ['alice', 'bob', 'charlie', 'david'],
    'age': [25, 26, 27, 22],
})[['name', 'age']]

# each element of the name column is a string,
# so you can call .upper() on it
df['name_uppercase'] = df['name'].apply(lambda element: element.upper())

BEFORE: Original dataframe

AFTER: Created new column using Series.apply()

Apply example, custom function

To apply a custom function to a column, you just need to define a function that takes one element and returns a new value:

import pandas as pd

df = pd.DataFrame({
    'name': ['alice', 'bob', 'charlie', 'david'],
    'age': [25, 26, 27, 22],
})

# function that takes one value, returns one value
def first_letter(input_str):
    return input_str[:1]

# pass just the function name to apply
df['first_letter'] = df['name'].apply(first_letter)

Source dataframe

Created a new column by passing a
custom function to apply

Take multiple columns as parameters

Double square brackets return another dataframe instead of a series

To apply a single function using multiple columns, select the columns with double square brackets ([[]]) and use axis=1:

import pandas as pd

df = pd.DataFrame({
    'name': ['alice', 'bob', 'charlie', 'david'],
    'age': [25, 26, 27, 22],
})

# define a function that takes two values, returns 1 value
def concatenate(value_1, value_2):
    return str(value_1) + "--" + str(value_2)

# note the use of DOUBLE SQUARE BRACKETS!
df['concatenated'] = df[['name', 'age']].apply(
    lambda row: concatenate(row['name'], row['age']), axis=1)

Original dataframe

Created a new column by applying a function that
takes two columns and concatenates them as strings

Apply function to row

To apply a function to a full row instead of a column, use axis=1 and call apply on the dataframe itself:

Example: Sum all values in each row:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value1': [1, 2, 3, 4, 5],
    'value2': [5, 4, 3, 2, 1],
    'value3': [10, 20, 30, 40, 50],
    'value4': [99, 99, 99, 99, np.nan],
})

# row is a Series holding all the values of one row
def sum_all(row):
    return np.sum(row)

# note that apply was called on the dataframe itself, not on columns
df['sum_all'] = df.apply(lambda row: sum_all(row), axis=1)

Source dataframe where each row contains
observations for one sample

Generated a new column by summing all
values in the row, with numpy.sum

Apply function to column

Return multiple columns

To apply a function to a column and return multiple values so that you can create multiple columns, return a pd.Series with the values instead:

Example: produce two values from a function and assign to two columns

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['alice', 'bob', 'charlie', 'david', 'edward'],
    'age': [25, 26, 27, 22, np.nan],
})

def times_two_times_three(value):
    value_times_2 = value * 2
    value_times_3 = value * 3
    return pd.Series([value_times_2, value_times_3])

# note that apply was called on the age column
df[['times_2', 'times_3']] = df['age'].apply(times_two_times_three)

Source dataframe

Modified dataframe with two new columns,
both returned by apply

Apply function in parallel

If you have costly operations to perform on a dataframe (e.g. text preprocessing), you can split the work across multiple cores to decrease the running time:

import multiprocessing

import numpy as np
import pandas as pd

# how many cores do you have?
NUM_CORES = 8

# this is a function that takes one dataframe chunk and returns
# the processed chunk (for example, adding processed columns)
def process_df(input_df):
    # copy the dataframe to prevent mutation in place
    output_df = input_df.copy()
    # apply a function to every row *in this chunk*
    # (some_function is a placeholder for your own row function)
    output_df['new_column'] = output_df.apply(some_function, axis=1)
    return output_df

# the __main__ guard is required on platforms that start worker
# processes with "spawn" (e.g. Windows and macOS)
if __name__ == '__main__':
    # replace load_large_dataframe() with your dataframe
    df = load_large_dataframe()

    # split the row positions into chunks, depending on how many cores
    # you have, then slice with iloc so every chunk stays a DataFrame
    df_chunks = [df.iloc[idx] for idx in np.array_split(np.arange(len(df)), NUM_CORES)]

    with multiprocessing.Pool(NUM_CORES) as pool:
        # process each chunk in a separate core and merge the results
        full_output_df = pd.concat(pool.map(process_df, df_chunks), ignore_index=True)

Vectorization and Performance
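As a rough illustration (not a benchmark), an element-wise apply() and a vectorized expression give the same values, but the vectorized form does the arithmetic inside NumPy instead of calling a Python function once per value. The series and the x * 2 + 1 operation below are hypothetical:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1_000))

# Element-wise: the lambda is called once per value (a Python-level loop)
with_apply = s.apply(lambda x: x * 2 + 1)

# Vectorized: a single arithmetic expression over the whole column
vectorized = s * 2 + 1

print(with_apply.tolist() == vectorized.tolist())
```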

map vs apply

map()	apply()
Series method	Series method and DataFrame method
Returns a new Series	Returns a new Series or DataFrame
Applies to a single column (one element at a time)	Can be applied to multiple columns at the same time
Operates on Series elements, one at a time	Operates on whole columns or rows when called on a DataFrame
Calls a Python function per element, so it is no faster than a Python for loop	Can be much faster when the function is a vectorized NumPy function


Pandas apply() Function to Single & Multiple Column(s)

Using the pandas.DataFrame.apply() method, you can apply a function to a single column, to all columns, or to a list of multiple columns (two or more). In this article, I will cover how to apply() a function to the values of a single selected column, of multiple columns, and of all columns. For example, say we have three columns and want to apply a function to a single column without touching the other two, and get back a DataFrame with all three columns.

1. Quick Examples of pandas Apply Function to a Column

If you are in a hurry, below are some quick examples of how to apply a function to a single column and to multiple columns (two or more) of a pandas DataFrame.
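A minimal sketch of these quick examples, assuming a hypothetical DataFrame with columns A, B and C and a hypothetical helper add_3() (neither is taken from the original code):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with columns A, B and C
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

def add_3(x):
    """Hypothetical helper: add 3 to a value (or to a whole Series)."""
    return x + 3

# Apply a function to a single column (Series.apply runs per element)
df_single = df.copy()
df_single["B"] = df_single["B"].apply(add_3)

# Apply a function to a list of columns (DataFrame.apply runs per column)
df_multi = df.copy()
df_multi[["A", "B"]] = df_multi[["A", "B"]].apply(add_3)

# Apply a function to all columns
df_all = df.apply(add_3)

# Apply a lambda to a single column
df_lambda = df.copy()
df_lambda["A"] = df_lambda["A"].apply(lambda x: x - 2)

# Vectorized NumPy alternative without apply()
df_np = df.copy()
df_np["A"] = np.square(df_np["A"])
```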

2. pandas.DataFrame.apply() Function Syntax

If you are a learner, let's look at the syntax of the apply() method and then run some examples of applying it to a single column, to multiple columns, and to all columns. Our DataFrame contains the columns A, B, and C.

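Below is the syntax of pandas.DataFrame.apply() (recent pandas versions add a few more optional keyword parameters):

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)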

Let’s create a sample DataFrame to work with some examples.
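The original sample DataFrame did not survive; a minimal stand-in with the columns A, B and C, where the values are hypothetical:

```python
import pandas as pd

# Hypothetical sample data: three integer columns named A, B and C
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
print(df)
```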

3. Pandas Apply Function to Single Column

We will create a function add_3() that adds 3 to a value and use it with the apply() function. To apply it to a single column, select the column by name using df["col_name"]. The example below applies the function to column B.
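A minimal sketch of this step, assuming the hypothetical A/B/C sample data and an add_3() helper written here for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

def add_3(x):
    # Hypothetical helper: receives one value of column B, returns value + 3
    return x + 3

# Series.apply calls add_3 once per element of column B;
# columns A and C are left untouched
df["B"] = df["B"].apply(add_3)
print(df)
```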

This applies the function to every row of the DataFrame for the specified column.

4. Pandas Apply Function to All Columns

In some cases you may want to apply a function to all columns of a pandas DataFrame; you can do this with the apply() function. Here the add_3() function is applied to every DataFrame column.
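A minimal sketch, assuming the same hypothetical sample data and add_3() helper. With the default axis=0, DataFrame.apply() passes one whole column at a time to the function:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

def add_3(x):
    # Here x is a whole column (a Series), so x + 3 adds 3 to every value in it
    return x + 3

# apply() returns a new DataFrame; df itself is not modified
df2 = df.apply(add_3)
print(df2)
```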

5. Pandas Apply Function to Multiple List of Columns

Similarly, using the apply() method, you can apply a function to a selected list of columns. In this case, the function is applied only to the two selected columns, without touching the rest of the columns.
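A minimal sketch, assuming the hypothetical A/B/C data and add_3() helper: select the columns with double square brackets, apply the function, and assign the result back to the same columns:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

def add_3(x):
    return x + 3

# Only A and B are processed and written back; column C is not touched
df[["A", "B"]] = df[["A", "B"]].apply(add_3)
print(df)
```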

6. Apply Lambda Function to Each Column

You can also pass a lambda expression to the apply() method. The example below adds 10 to all column values.
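A minimal sketch on the hypothetical A/B/C sample data; the lambda receives one column at a time:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# The lambda receives one column (a Series) at a time and adds 10 to it
df2 = df.apply(lambda col: col + 10)
print(df2)
```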

7. Apply Lambda Function to Single Column

You can apply a lambda function to a single column of the DataFrame. The following example subtracts 2 from every value in column A: df["A"] = df["A"].apply(lambda x: x - 2).

Similarly, you can apply a lambda function to all or to multiple columns in pandas; I will leave this for you to explore.

8. Using pandas.DataFrame.transform() to Apply Function Column

With the DataFrame.apply() method and lambda functions, the resulting DataFrame can have any number of columns, whereas with the transform() function the result must have the same length as the input DataFrame.
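A minimal sketch of that difference, assuming the hypothetical A/B/C data: apply() is allowed to reduce each column to one value, while transform() raises a ValueError when the function does not return a result of the same length:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# apply() may reduce each column to a single value (here: its sum)
col_sums = df.apply(lambda col: col.sum())

# transform() keeps the original shape
df2 = df.transform(lambda col: col * 2)

# A reducing function is rejected by transform()
try:
    df.transform(lambda col: col.sum())
    transform_rejected = False
except ValueError:
    transform_rejected = True

print(col_sums)
print(df2)
print(transform_rejected)
```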

9. Using pandas.DataFrame.map() to Single Column

Here is another alternative, using the map() method of the selected column.
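A minimal sketch, assuming the hypothetical A/B/C data. Selecting a single column gives a Series, and Series.map() accepts either a function or a dict; the dict labels are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Series.map() calls the function once per element of column A
df["A"] = df["A"].map(lambda x: x + 3)

# map() also accepts a dict; values missing from the dict become NaN
df["C_label"] = df["C"].map({7: "seven", 8: "eight"})
print(df)
```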

10. DataFrame.assign() to Apply Lambda Function
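The original example for this section is missing; a minimal sketch, assuming the hypothetical A/B/C data. When a lambda is passed as a value to assign(), it receives the whole DataFrame, and assign() returns a new DataFrame instead of modifying the original:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# The lambda receives the DataFrame, so x["B"] + 3 replaces column B with B + 3
df2 = df.assign(B=lambda x: x["B"] + 3)
print(df2)
```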

11. Using Numpy function on single Column

Use df['A'] = df['A'].apply(np.square) to select the column from the DataFrame as a Series with the [] operator and apply the numpy.square() function to it.

12. Using NumPy.square() Method

You can also do the same without the apply() function, by calling NumPy directly.
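A minimal sketch on the hypothetical A/B/C data; np.square is a NumPy ufunc, so it works on the whole column at once:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# No apply() needed: the ufunc squares every value of column A in one call
df["A"] = np.square(df["A"])
print(df)
```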

Yields same output as above.

13. Multiple columns Using NumPy.square() and Lambda Function

Apply a lambda function to multiple columns of a DataFrame by combining DataFrame.apply(), a lambda, and NumPy functions.
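A minimal sketch, assuming the hypothetical A/B/C data and that only columns A and B should be squared. Each lambda call receives one column, and its name decides whether it is squared:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# col.name is the column label; column C is returned unchanged
df2 = df.apply(lambda col: np.square(col) if col.name in ["A", "B"] else col)
print(df2)
```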

Conclusion

In this article, you have learned how to apply a function to a single column, to all columns, and to multiple columns (two or more) of a pandas DataFrame using the apply(), transform(), map(), and assign() methods and the numpy.square() function.

References

Pandas apply function to column

We use a pandas DataFrame to store data in an organized, tabular form. Sometimes there is a need to apply a function to a specific column or to the whole table of stored data.

This tutorial demonstrates the different methods available to apply a function to a column of a pandas dataframe in Python.

How do I apply function to column in pandas?

Here are several ways to apply a function to a column in pandas.

Using dataframe.apply() function

The dataframe.apply() function applies a specified function along an axis of the given pandas DataFrame.

The syntax for the dataframe.apply() function is:

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)

The main parameters of dataframe.apply() are described below:

func: the function to be applied.
axis: the axis along which the function is applied. The value 0 (or 'index') applies the function to each column, while 1 (or 'columns') applies it to each row. The default is 0.

These two parameters are essential to understanding how this method works. The other, optional parameters are described in the official pandas documentation.

The following code uses the dataframe.apply() function to apply a function to a specific column in pandas.

import numpy as np
import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])

The above code provides the following output:

Explanation:

The numpy and pandas libraries are imported first.
A pandas DataFrame named dfa is created and initialized.
A function to be applied to a column of the DataFrame is defined.
The function is then applied to the second column, Y, of the DataFrame using the apply() function.
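The body of the applied function did not survive; a minimal sketch of the steps described above, assuming a hypothetical multiply_by_2() as the function being applied:

```python
import pandas as pd

# Create and initialize the DataFrame dfa
dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=["X", "Y", "Z"])

def multiply_by_2(value):
    # Hypothetical function: doubles one value of the column
    return value * 2

# Apply the function to the second column, Y, only
dfa["Y"] = dfa["Y"].apply(multiply_by_2)
print(dfa)
```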

Using lambda function along with the apply() function

A lambda function is an anonymous function consisting of a single expression; it can take any number of arguments.

A lambda function can be passed directly to the apply() function to apply an operation to a specific column. This shortens the code compared to the method above.

The following code uses the lambda function along with the apply() function.

import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])

The above code provides the following output:

The working of this method is similar to the simple dataframe.apply() method mentioned above, with the only difference being that we do not have to specifically create a function and can simply use a lambda function directly in the apply() function.
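A minimal sketch of the lambda variant; doubling column Y is a hypothetical choice of operation, since the original lambda did not survive:

```python
import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=["X", "Y", "Z"])

# An inline lambda replaces the separately defined function
dfa["Y"] = dfa["Y"].apply(lambda value: value * 2)
print(dfa)
```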

Using dataframe.transform() function

The dataframe.transform() function calls a given function func on the DataFrame itself and returns a DataFrame containing the transformed values; the transformed result must have the same length as the input.

The syntax for the dataframe.transform() function is:

DataFrame.transform(func, axis=0, *args, **kwargs)
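A minimal sketch using the dfa DataFrame from the examples above; adding 10 is a hypothetical operation. Because the lambda returns a column of the same length, the result keeps the shape of dfa:

```python
import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=["X", "Y", "Z"])

# transform() applies the lambda to each column and returns a same-shaped DataFrame
dfa_plus_10 = dfa.transform(lambda col: col + 10)
print(dfa_plus_10)
```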
