Pandas Dataframes: Apply Examples
WIP alert: This is a work in progress. Current information is correct, but more content may be added in the future.
Pandas version 1.0+ used.
All code is available online in this Jupyter notebook.
Apply example
To apply a function to a dataframe column, use df['my_col'].apply(function), where the function takes one element and returns another value.
import pandas as pd

df = pd.DataFrame({
    'name': ['alice', 'bob', 'charlie', 'david'],
    'age': [25, 26, 27, 22],
})[['name', 'age']]

# each element of the name column is a string,
# so you can call .upper() on it
df['name_uppercase'] = df['name'].apply(lambda element: element.upper())
BEFORE: Original dataframe
AFTER: Created new column using Series.apply()
Apply example, custom function
To apply a custom function to a column, you just need to define a function that takes one element and returns a new value:
import pandas as pd

df = pd.DataFrame({
    'name': ['alice', 'bob', 'charlie', 'david'],
    'age': [25, 26, 27, 22],
})

# function that takes one value, returns one value
def first_letter(input_str):
    return input_str[:1]

# pass just the function name to apply
df['first_letter'] = df['name'].apply(first_letter)
Source dataframe
Created a new column by passing a custom function to apply
Take multiple columns as parameters
Note: double square brackets return another dataframe instead of a series.
To apply a single function using multiple columns, select the columns with double square brackets ([[]]) and use axis=1:
import pandas as pd

df = pd.DataFrame({
    'name': ['alice', 'bob', 'charlie', 'david'],
    'age': [25, 26, 27, 22],
})

# define a function that takes two values, returns 1 value
def concatenate(value_1, value_2):
    return str(value_1) + "--" + str(value_2)

# note the use of DOUBLE SQUARE BRACKETS!
df['concatenated'] = df[['name', 'age']].apply(
    lambda row: concatenate(row['name'], row['age']),
    axis=1)
Original dataframe
Created a new column by applying a function that takes two columns and concatenates them as strings
Apply function to row
To apply a function to a full row instead of a column, use axis=1 and call apply on the dataframe itself:
Example: Sum all values in each row:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value1': [1, 2, 3, 4, 5],
    'value2': [5, 4, 3, 2, 1],
    'value3': [10, 20, 30, 40, 50],
    'value4': [99, 99, 99, 99, np.nan],
})

def sum_all(row):
    return np.sum(row)

# note that apply was called on the dataframe itself, not on columns
df['sum_all'] = df.apply(lambda row: sum_all(row), axis=1)
Source dataframe where each row contains observations for one sample
Generated a new column by summing all values in the row with numpy.sum
Apply function to column
Return multiple columns
To apply a function to a column and return multiple values so that you can create multiple columns, return a pd.Series with the values instead:
Example: produce two values from a function and assign to two columns
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['alice', 'bob', 'charlie', 'david', 'edward'],
    'age': [25, 26, 27, 22, np.nan],
})

def times_two_times_three(value):
    value_times_2 = value * 2
    value_times_3 = value * 3
    return pd.Series([value_times_2, value_times_3])

# note that apply was called on age column
df[['times_2', 'times_3']] = df['age'].apply(times_two_times_three)
Source dataframe
Modified dataframe with two new columns, both returned by apply
Apply function in parallel
If you need to perform costly operations on a dataframe (e.g. text preprocessing), you can split the work across multiple cores to decrease the running time:
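import multiprocessing

import numpy as np
import pandas as pd

# how many cores do you have? (multiprocessing.cpu_count() can tell you)
NUM_CORES = 8

# this is a function that takes one dataframe chunk and returns
# the processed chunk (for example, adding processed columns)
def process_df(input_df):
    # copy the dataframe to prevent mutation in place
    output_df = input_df.copy()
    # apply a function to every row *in this chunk*;
    # some_function is a placeholder for your own row-wise function
    output_df['new_column'] = output_df.apply(some_function, axis=1)
    return output_df

# the __main__ guard is required where multiprocessing uses the
# spawn start method (e.g. Windows and macOS)
if __name__ == '__main__':
    # replace load_large_dataframe() with your dataframe
    df = load_large_dataframe()

    # split the dataframe into chunks, depending on how many cores you have;
    # splitting row positions avoids calling np.array_split on the DataFrame
    # itself, which triggers a deprecation warning in recent pandas versions
    df_chunks = [df.iloc[idx] for idx in np.array_split(np.arange(len(df)), NUM_CORES)]

    with multiprocessing.Pool(NUM_CORES) as pool:
        # process each chunk in a separate core and merge the results
        full_output_df = pd.concat(pool.map(process_df, df_chunks), ignore_index=True)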
Vectorization and Performance
map vs apply
| map() | apply() |
|---|---|
| Series function | Series function and DataFrame function |
| Returns a new Series | Returns a new Series or DataFrame, depending on the object it is called on and what the function returns |
| Can only be applied to a single column (one element at a time) | Can be applied to multiple columns at the same time (DataFrame.apply) |
| Operates on array elements, one at a time | Operates on whole columns or rows (DataFrame.apply) |
| Slow for Python functions: one call per element, no better than a Python for loop | Much faster when the function is a NumPy vectorized function that works on whole columns |
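To see the difference in practice, here is a small sketch on a hypothetical numeric Series named value: Series.map() and Series.apply() with a lambda both call the Python function once per element, while the plain arithmetic expression is a single vectorized operation. All three produce the same values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a single numeric column
s = pd.Series(np.arange(5), name="value")

# Series.map(): the lambda is called once per element
mapped = s.map(lambda x: x * 2)

# Series.apply(): with an element-wise lambda, also one call per element
applied = s.apply(lambda x: x * 2)

# Vectorized: one NumPy-backed operation on the whole column
vectorized = s * 2

print(vectorized.tolist())  # [0, 2, 4, 6, 8]
```

To check the performance claims in the table on your own data, you can time each line with IPython's %timeit on a much larger Series.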
Pandas apply() Function to Single & Multiple Column(s)
Using the pandas.DataFrame.apply() method, you can apply a function to a single column, to all columns, or to a list of multiple columns (two or more). In this article, I will cover how to apply() a function to the values of a single selected column, of multiple columns, and of all columns. For example, say we have three columns and want to apply a function to a single column without touching the other two, returning a DataFrame that still has all three columns.
1. Quick Examples of pandas Apply Function to a Column
If you are in a hurry, below are some quick examples of how to apply a function to a single column and to multiple columns (two or more) of a pandas DataFrame.
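A compact sketch of the one-liners covered in the sections below. The DataFrame values and the add_3() helper are hypothetical placeholders chosen for illustration; each variant works on a copy so the results can be compared side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with columns A, B and C
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# Hypothetical helper: add 3 to whatever it receives (a value or a Series)
def add_3(x):
    return x + 3

# Apply a function to a single column
df_single = df.copy()
df_single["B"] = df_single["B"].apply(add_3)

# Apply a function to all columns
df_all = df.apply(add_3)

# Apply a function to a list of columns
df_subset = df.copy()
df_subset[["A", "B"]] = df_subset[["A", "B"]].apply(add_3)

# Apply a lambda to a single column
df_lambda = df.copy()
df_lambda["A"] = df_lambda["A"].apply(lambda x: x - 2)

# Apply a NumPy function to a single column
df_square = df.copy()
df_square["A"] = df_square["A"].apply(np.square)
```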
2. pandas.DataFrame.apply() Function Syntax
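If you are a learner, let's look at the syntax of the apply() method and then run some examples that apply it to a single column, to multiple columns, and to all columns. Our DataFrame contains the columns A, B, and C.
The syntax of pandas.DataFrame.apply() is DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs); newer pandas releases add a few more optional keyword arguments.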
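As a rough illustration of two of these optional parameters, here is a sketch on hypothetical data: args forwards extra positional arguments to func, and result_type="expand" turns list-like results of a row-wise apply into separate columns. The helper add_n() and the column names row_min and row_max are made up for this example.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# Hypothetical helper: adds n to every value it receives
def add_n(x, n):
    return x + n

# args: extra positional arguments passed to func after each column
plus_10 = df.apply(add_n, args=(10,))

# result_type="expand": with axis=1, list-like results become columns
min_max = df.apply(lambda row: [row.min(), row.max()], axis=1, result_type="expand")
min_max.columns = ["row_min", "row_max"]  # hypothetical column names
print(min_max)
```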
Let’s create a sample DataFrame to work with some examples.
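For the sketches in this article, assume a small hypothetical DataFrame like the one below; the values are placeholders, and only the column names A, B, and C come from the text.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the article
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})
print(df)
```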
3. Pandas Apply Function to Single Column
We will create a function add_3() that adds 3 to a column value and use it with the apply() function. To apply it to a single column, select the column by name using df["col_name"]. The example below applies the function to column B.
This calls the function on every row of the DataFrame for the specified column.
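A minimal sketch of this step, assuming hypothetical sample data and a simple add_3() that returns its input plus 3:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# Assumed implementation of add_3(): called once for every value in the column
def add_3(x):
    return x + 3

# Select column B as a Series, apply add_3, and write the result back
df["B"] = df["B"].apply(add_3)
print(df)
```

Columns A and C are left exactly as they were.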
4. Pandas Apply Function to All Columns
In some cases you may want to apply a function to all columns of a DataFrame, which you can also do with apply(). Here the add_3() function is applied to every DataFrame column.
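A sketch of the same idea for all columns, again on hypothetical data with an assumed add_3(): with the default axis=0, apply() passes each column to add_3() as a whole Series, and because x + 3 works on a Series, every value is incremented.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# Assumed implementation of add_3(); here x is a whole column (a Series)
def add_3(x):
    return x + 3

# Default axis=0: add_3 is called once per column
df2 = df.apply(add_3)
print(df2)
```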
5. Pandas Apply Function to Multiple List of Columns
Similarly, you can use apply() to apply a function to a selected list of columns. In this case, the function is applied only to the two selected columns, without touching the rest.
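A minimal sketch, on hypothetical data with an assumed add_3(), that updates only columns A and B and leaves C untouched:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# Assumed implementation of add_3()
def add_3(x):
    return x + 3

# Select a list of columns, apply the function, and assign back to the same columns
df[["A", "B"]] = df[["A", "B"]].apply(add_3)
print(df)
```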
6. Apply Lambda Function to Each Column
You can also pass a lambda expression to the apply() method. The example below adds 10 to every value in the DataFrame.
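Here is a small sketch on hypothetical data; as with a named function, the lambda receives each column as a Series:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# The lambda is called once per column and adds 10 to every value in it
df2 = df.apply(lambda x: x + 10)
print(df2)
```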
7. Apply Lambda Function to Single Column
You can apply a lambda function to a single column of the DataFrame. The following example subtracts 2 from every value in column A: df["A"] = df["A"].apply(lambda x: x - 2).
Similarly, you can apply a lambda function to all columns or to multiple columns in pandas; I will leave this to you to explore.
8. Using pandas.DataFrame.transform() to Apply Function Column
With DataFrame.apply() and lambda functions, the result can have any shape (for example, an aggregation returns one value per column), whereas with transform() the result must have the same length as the input DataFrame.
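The difference can be sketched on hypothetical data: transform() keeps the input's shape, while apply() is free to aggregate, here returning one sum per column.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# transform(): the result keeps the shape of the input
transformed = df.transform(lambda x: x + 10)

# apply(): the function may aggregate, so the shape can change
column_sums = df.apply(lambda x: x.sum())

print(transformed.shape, column_sums.shape)
```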
9. Using pandas.DataFrame.map() to Single Column
Here is another alternative for a single column: the map() method of the selected column.
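A minimal sketch on hypothetical data: Series.map() calls the lambda once for each value of column A. In pandas 2.1 and later there is also DataFrame.map() (the new name of DataFrame.applymap()), which applies a function to every cell of the whole DataFrame.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# Series.map(): element-wise on the selected column only
df["A"] = df["A"].map(lambda x: x * 10)
print(df)
```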
10. DataFrame.assign() to Apply Lambda Function
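A sketch of assign() with a lambda, on hypothetical data. Note that the lambda receives the whole DataFrame rather than single values, and assign() returns a new DataFrame, leaving the original unchanged.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# The lambda gets the DataFrame (d) and returns the new values for column B
df2 = df.assign(B=lambda d: d["B"] * 2)
print(df2)
```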
11. Using Numpy function on single Column
Use df['A'] = df['A'].apply(np.square) to select the column from the DataFrame as a Series with the [] operator and apply the numpy.square() function to it.
12. Using NumPy.square() Method
You can also get the same result without apply() by calling NumPy directly on the column, for example df['A'] = np.square(df['A']).
This yields the same output as above.
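A small sketch, on hypothetical data, comparing the apply() version with the direct NumPy call; since numpy.square() is a ufunc, it accepts the whole Series and returns a Series.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# Via apply()
via_apply = df["A"].apply(np.square)

# Direct NumPy call on the whole column
via_numpy = np.square(df["A"])

print(via_numpy.tolist())  # [1, 16, 49]
```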
13. Multiple columns Using NumPy.square() and Lambda Function
Apply a lambda function to multiple columns of a DataFrame by combining DataFrame.apply(), a lambda, and NumPy functions.
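One way to write this, sketched on hypothetical data: the lambda checks each column's name (x.name) and squares only columns A and B, returning the other columns unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# apply() passes each column as a Series whose .name is the column label
squared = df.apply(lambda x: np.square(x) if x.name in ["A", "B"] else x)
print(squared)
```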
Conclusion
In this article, you have learned how to apply a function to a single column, to all columns, and to multiple columns (two or more) of a pandas DataFrame using the apply(), transform(), map(), and assign() methods, as well as numpy.square().
Pandas apply function to column
We use the pandas DataFrame to store data in an organized, tabular form. Sometimes there is a need to apply a function to a specific column or to the whole table of stored data.
This tutorial demonstrates the different methods available to apply a function to a column of a pandas dataframe in Python.
How do I apply function to column in pandas?
Here are several ways to apply a function to a column in pandas.
Using dataframe.apply() function
The dataframe.apply() function applies a specified function along an axis of the given pandas DataFrame.
The basic form of the call is dataframe.apply(func, axis=0).
The two most important parameters of dataframe.apply() are:
- func: It specifies the function that needs to be applied.
- axis: It specifies the axis along which the function is applied. The value 0 (or 'index') applies the function to each column, while 1 (or 'columns') applies it to each row. The default value is 0.
These two parameters are essential for understanding how this method works. Information on the other, optional parameters is available in the pandas documentation.
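To make the axis parameter concrete, here is a sketch using the same dfa values as the examples in this section: with axis=0 the function receives each column, and with axis=1 it receives each row.

```python
import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=["X", "Y", "Z"])

# axis=0 (default): one call per column, one result per column
per_column = dfa.apply(lambda col: col.sum())

# axis=1: one call per row, one result per row
per_row = dfa.apply(lambda row: row.sum(), axis=1)

print(per_column.tolist())  # [12, 12, 12]
print(per_row.tolist())     # [9, 12, 15]
```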
The following code uses the dataframe.apply() function to apply a function to a specific column in pandas.
import numpy as np
import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])
The above code provides the following output:
Explanation:
- The numpy and pandas libraries are imported first.
- A pandas DataFrame named dfa is created and initialized.
- A function that is to be applied to a column of the DataFrame is created.
- The function is then applied to the second column, Y, of the DataFrame using the apply() function.
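A runnable sketch that follows these four steps. The function name square() and its body are assumptions for illustration, since any function that takes one value and returns one value would work here.

```python
import numpy as np
import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=["X", "Y", "Z"])

# Hypothetical function: square a single value
def square(value):
    return np.square(value)

# Apply it to the second column, Y, only
dfa["Y"] = dfa["Y"].apply(square)
print(dfa)
```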
Using lambda function along with the apply() function
A lambda function is an anonymous function that consists of a single expression and can take any number of arguments.
A lambda function can be passed directly to apply() to apply it to a specific column. This shortens the code compared to the method above.
The following code uses the lambda function along with the apply() function.
import numpy as np
import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=['X', 'Y', 'Z'])
The above code provides the following output:
This method works like the plain dataframe.apply() method above; the only difference is that we do not have to define a named function and can pass a lambda function directly to apply().
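The same step written with a lambda, as a sketch; squaring column Y is again just an assumed example operation:

```python
import pandas as pd

dfa = pd.DataFrame([[3, 3, 3], [4, 4, 4], [5, 5, 5]], columns=["X", "Y", "Z"])

# Assumed example operation: square every value in column Y
dfa["Y"] = dfa["Y"].apply(lambda value: value * value)
print(dfa)
```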
Using dataframe.transform() function
The dataframe.transform() function calls a given function func on the DataFrame itself and returns a DataFrame containing the transformed values, provided the transformed result has the same length as the original.
The syntax for the dataframe.transform() function is: