- How to Add Column to Pandas DataFrame?
- Syntax to add column
- Examples
- 1. Add column to DataFrame
- 2. Add column to DataFrame with a default value
- Summary
- pandas.DataFrame.assign
- Add Column to DataFrame Pandas (with Examples)
- What is Pandas in Python?
- What is a DataFrame?
- Using Pandas, what can you do with DataFrames?
- How to Add Column to Pandas DataFrame?
- 1) Using the simple assignment
- 2) Using assign() method
- 3) Using insert() method
- 4) Using concat() method
- Conclusion
How to Add Column to Pandas DataFrame?
To add a new column to the existing Pandas DataFrame, assign the new column values to the DataFrame, indexed using the new column name.
In this tutorial, we shall learn how to add a column to a DataFrame with the help of detailed, illustrative example programs.
Syntax to add column
The syntax to add a column to DataFrame is:
mydataframe['new_column_name'] = column_values
where mydataframe is the DataFrame to which you want to add a new column labeled new_column_name. You can provide either all of the column values as a list, or a single value that is used as the default value for every row.
Examples
1. Add column to DataFrame
In this example, we will create a dataframe df_marks and add a new column with name geometry.
Python Program
import pandas as pd

mydictionary = {'names': ['Somu', 'Kiku', 'Amol', 'Lini'],
                'physics': [68, 74, 77, 78],
                'chemistry': [84, 56, 73, 69],
                'algebra': [78, 88, 82, 87]}

# create dataframe
df_marks = pd.DataFrame(mydictionary)
print('Original DataFrame\n--------------')
print(df_marks)

# add column
df_marks['geometry'] = [81, 92, 67, 76]
print('\n\nDataFrame after adding "geometry" column\n--------------')
print(df_marks)
Original DataFrame
--------------
   names  physics  chemistry  algebra
0   Somu       68         84       78
1   Kiku       74         56       88
2   Amol       77         73       82
3   Lini       78         69       87


DataFrame after adding "geometry" column
--------------
   names  physics  chemistry  algebra  geometry
0   Somu       68         84       78        81
1   Kiku       74         56       88        92
2   Amol       77         73       82        67
3   Lini       78         69       87        76
The column is added to the dataframe with the specified list as column values.
The length of the list you provide for the new column must equal the number of rows in the DataFrame. Otherwise, you will get an error similar to the following.
ValueError: Length of values does not match length of index
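To see this length check in action, here is a minimal sketch. It reuses the marks data from the example above and deliberately passes only three values for four rows. The exact wording of the error message differs between pandas versions (newer releases include both lengths), so the sketch just catches ValueError and prints whatever message pandas produces.

```python
import pandas as pd

# Same marks data as in the example above
df_marks = pd.DataFrame({
    'names': ['Somu', 'Kiku', 'Amol', 'Lini'],
    'physics': [68, 74, 77, 78],
    'chemistry': [84, 56, 73, 69],
    'algebra': [78, 88, 82, 87],
})

try:
    # Four rows, but only three values: pandas refuses the assignment
    df_marks['geometry'] = [81, 92, 67]
except ValueError as err:
    print('ValueError:', err)

# The failed assignment leaves the DataFrame unchanged
print(list(df_marks.columns))
```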
2. Add column to DataFrame with a default value
In this example, we will create a dataframe df_marks and add a new column called geometry with a default value for each of the rows in the dataframe.
Python Program
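import pandas as pd

mydictionary = {'names': ['Somu', 'Kiku', 'Amol', 'Lini'],
                'physics': [68, 74, 77, 78],
                'chemistry': [84, 56, 73, 69],
                'algebra': [78, 88, 82, 87]}

# create dataframe
df_marks = pd.DataFrame(mydictionary)
print('Original DataFrame\n--------------')
print(df_marks)

# add column with a single default value for every row
df_marks['geometry'] = 65
print('\n\nDataFrame after adding "geometry" column\n--------------')
print(df_marks)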
Original DataFrame
--------------
   names  physics  chemistry  algebra
0   Somu       68         84       78
1   Kiku       74         56       88
2   Amol       77         73       82
3   Lini       78         69       87


DataFrame after adding "geometry" column
--------------
   names  physics  chemistry  algebra  geometry
0   Somu       68         84       78        65
1   Kiku       74         56       88        65
2   Amol       77         73       82        65
3   Lini       78         69       87        65
The column is added to the dataframe with the specified value as default column value.
Summary
In this Pandas Tutorial, we learned how to add a new column to Pandas DataFrame with the help of detailed Python examples.
pandas.DataFrame.assign
Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.
Parameters: **kwargs : dict of {str: callable or Series}
The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change input DataFrame (though pandas doesn’t check it). If the values are not callable, (e.g. a Series, scalar, or array), they are simply assigned.
Returns: DataFrame. A new DataFrame with the new columns in addition to all the existing columns.
Assigning multiple columns within the same assign is possible. Later items in ‘**kwargs’ may refer to newly created or modified columns in ‘df’; items are computed and assigned into ‘df’ in order.
>>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
...                   index=['Portland', 'Berkeley'])
>>> df
          temp_c
Portland    17.0
Berkeley    25.0
Where the value is a callable, evaluated on df:
>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0
Alternatively, the same behavior can be achieved by directly referencing an existing Series or sequence:
>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0
You can create multiple columns within the same assign where one of the columns depends on another one defined within the same assign:
>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
...           temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
          temp_c  temp_f  temp_k
Portland    17.0    62.6  290.15
Berkeley    25.0    77.0  298.15
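Because assign() returns a new object, the original DataFrame stays untouched unless you assign the result back to it. The short sketch below reuses the temp_c example to check two points from the description above. First, the original df keeps only temp_c. Second, re-assigning an existing column overwrites it in the returned copy. The names df2 and df3 are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]}, index=['Portland', 'Berkeley'])

# assign() returns a new DataFrame; df itself is not modified
df2 = df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
print(list(df.columns))   # ['temp_c']
print(list(df2.columns))  # ['temp_c', 'temp_f']

# Re-assigning an existing column overwrites it in the returned copy only
df3 = df.assign(temp_c=lambda x: x.temp_c + 1)
print(df3['temp_c'].tolist())  # [18.0, 26.0]
print(df['temp_c'].tolist())   # [17.0, 25.0]
```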
Add Column to DataFrame Pandas (with Examples)
There are many things we can do with a DataFrame we have built or imported in pandas. We can manipulate the data in various ways, such as changing the DataFrame's columns. If we are reading most of the data from one source but some from another, we need to know how to add columns to a pandas DataFrame. It's pretty simple, but there are several different ways to do it, and that can be confusing for newcomers: as a beginner, you may see many alternative methods for adding a column and wonder which one to use. Don't worry; in this article, we'll go over four different ways to do it. So, let's get started!
What is Pandas in Python?
Pandas is a widely used open-source Python library for data science, data analysis, and machine learning tasks. It provides many functions and methods for working with tabular data. Its main data structure is the DataFrame, a tabular data structure with labeled rows and columns.
Now, let's take a closer look at pandas DataFrames.
What is a DataFrame?
A DataFrame represents a table of data with rows and columns. Rows in a DataFrame represent observations or data points, and columns represent the properties or attributes of those observations. Consider a set of property pricing data: each row represents a house, and each column represents a characteristic of the house, such as its age, number of rooms, or price.
Using Pandas, what can you do with DataFrames?
Pandas makes many of the time-consuming, repetitive tasks involved in working with data simple. Here are a few of the tasks you can perform efficiently with a pandas DataFrame:
- Data Inspection
- Data Cleansing
- Data Normalization
- Data Visualization
- Statistical Analysis
First, let’s create an example DataFrame that we’ll use to explain a few ideas related to adding columns to pandas frames throughout this article.
For example:
import pandas as pd  # importing pandas library

# creating the DataFrame
df = pd.DataFrame({
    'colA': [True, False, False],
    'colB': [1, 2, 3],
})
print(df)
    colA  colB
0   True     1
1  False     2
2  False     3
Suppose we need to add a new column named ‘colC’ containing the values ‘a’, ‘b’, and ‘c’ for the indices 0, 1, and 2, respectively. How will we do it? Let’s see!
How to Add Column to Pandas DataFrame?
Below are four methods for adding a column to a pandas DataFrame. In each case, we'll add ‘colC’ to the sample DataFrame created earlier in the article, starting each time from the original two-column df:
1) Using the simple assignment
You can add a new column to a DataFrame by simply assigning the data (for example, a Series) to a new column label. It is one of the easiest and most efficient methods, and Python programmers use it widely. Note that the name of the new column is passed as a string inside the square brackets, as shown in the example below.
For example:
df['colC'] = pd.Series(['a', 'b', 'c'], index=[0, 1, 2])
print(df)

    colA  colB  colC
0   True     1     a
1  False     2     b
2  False     3     c
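Note that when you assign a Series, pandas aligns it on the index. Each value goes to the row with the matching index label. Rows whose labels are missing from the Series get NaN, and Series values whose labels are not in the DataFrame are dropped. So this works as expected only if the new column's index matches the DataFrame's index.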
For example:
df['colC'] = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])
print(df)
    colA  colB  colC
0   True     1   NaN
1  False     2     a
2  False     3     b
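If you want the values placed by position instead of by index label, one option is to strip the Series' index before assigning, for example with Series.to_numpy(). Here is a minimal sketch that uses the same mismatched index as above. The names aligned and positional are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
s = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])

# Assigning the Series aligns on labels: label 0 is missing, label 3 is dropped
aligned = df.copy()
aligned['colC'] = s
print(aligned['colC'].tolist())  # [nan, 'a', 'b']

# to_numpy() returns a plain array with no index, so values are placed by position
positional = df.copy()
positional['colC'] = s.to_numpy()
print(positional['colC'].tolist())  # ['a', 'b', 'c']
```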
2) Using assign() method
Using the pandas.DataFrame.assign() method, you can add multiple columns to a DataFrame at once or modify the values of existing columns. The method returns a new DataFrame object with all of the original columns plus the newly added ones, and the original DataFrame is not changed. Existing columns that are re-assigned are overwritten. Like simple assignment, assign() aligns a Series on the index. The example below passes .values (plain arrays without an index), so the values are used in their given order and the index of e is ignored.
For example:
e = pd.Series([1.0, 3.0, 2.0], index=[0, 2, 1])
s = pd.Series(['a', 'b', 'c'], index=[0, 1, 2])
print(df.assign(colC=s.values, colB=e.values))
    colA  colB  colC
0   True   1.0     a
1  False   3.0     b
2  False   2.0     c
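The example above passes e.values, a plain NumPy array, so the values are used in order. If you pass the Series e itself instead, assign() aligns it on the index, just like simple assignment does. Here is a small sketch contrasting the two. The names by_position and by_label are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
e = pd.Series([1.0, 3.0, 2.0], index=[0, 2, 1])

# .values has no index, so the values are taken in order: 1.0, 3.0, 2.0
by_position = df.assign(colB=e.values)
print(by_position['colB'].tolist())  # [1.0, 3.0, 2.0]

# The Series is aligned on labels 0, 1, 2, giving 1.0, 2.0, 3.0
by_label = df.assign(colB=e)
print(by_label['colB'].tolist())  # [1.0, 2.0, 3.0]
```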
3) Using insert() method
Apart from the above two methods, you can also use pandas.DataFrame.insert() to add columns to a DataFrame. This method comes in handy when you need to add a column at a specific position, and it modifies the DataFrame in place. Its first argument is the position of the new column. Passing len(df.columns), the number of existing columns, places the new column at the end. The example below adds a column named ‘colC’ at the end of the DataFrame.
For example:
df.insert(len(df.columns), 'colC', s.values)
print(df)
    colA  colB  colC
0   True     1     a
1  False     2     b
2  False     3     c
To add ‘colC’ between ‘colA’ and ‘colB’ instead, pass 1 as the position:
For example:
df.insert(1, 'colC', s.values)
print(df)
    colA  colC  colB
0   True     a     1
1  False     b     2
2  False     c     3
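Hard-coding position 1 breaks if the column order changes. A slightly more robust variant looks up the position of the column you want to insert before with Index.get_loc(). Here is a minimal sketch. The variable name pos is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
s = pd.Series(['a', 'b', 'c'])

# Position of 'colB' in the column index (1 here); insert() shifts colB right
pos = df.columns.get_loc('colB')
df.insert(pos, 'colC', s.values)
print(list(df.columns))  # ['colA', 'colC', 'colB']
```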
Note that, by default, insert() cannot add a column whose name already exists in the DataFrame. It raises a ValueError instead.
For example:
df.insert(1, 'colC', s.values)
df.insert(1, 'colC', s.values)  # second call fails: 'colC' already exists
ValueError: cannot insert colC, already exists
Nevertheless, the DataFrame can hold two columns with the same name if you pass allow_duplicates=True to insert().
For example:
df.insert(1, 'colC', s.values)
df.insert(1, 'colC', s.values, allow_duplicates=True)
print(df)
    colA  colC  colC  colB
0   True     a     a     1
1  False     b     b     2
2  False     c     c     3
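Duplicate column names are allowed, but they are easy to trip over. Once a name is duplicated, selecting df['colC'] returns a DataFrame with both columns instead of a single Series. One way to avoid the ValueError without creating duplicates is to check df.columns before calling insert(). The helper insert_if_absent below is hypothetical and not part of pandas.

```python
import pandas as pd

def insert_if_absent(frame, loc, name, values):
    """Hypothetical helper: insert a column only if the name is not taken."""
    if name in frame.columns:
        return False
    frame.insert(loc, name, values)
    return True

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
s = pd.Series(['a', 'b', 'c'])

print(insert_if_absent(df, 1, 'colC', s.values))  # True
print(insert_if_absent(df, 1, 'colC', s.values))  # False, and no ValueError

# With allow_duplicates=True, the label is no longer unique
dup = df.copy()
dup.insert(1, 'colC', s.values, allow_duplicates=True)
print(type(dup['colC']).__name__)  # DataFrame
print(dup['colC'].shape)           # (3, 2)
```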
4) Using concat() method
The pandas.concat() function can also add a column to an existing DataFrame when you pass axis=1. It returns a new DataFrame that includes the newly added column, and it aligns the Series with the original DataFrame on the index. Check out the example below for a better understanding.
For example:
df = pd.concat([df, s.rename('colC')], axis=1)
print(df)
    colA  colB  colC
0   True     1     a
1  False     2     b
2  False     3     c
In general, use this method when the indices of the objects being combined match. If they don't, the result contains the index labels of every object, because concat() performs an outer join on the index by default. The missing cells are filled with NaN, as shown in the example below.
For example:
s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])
df = pd.concat([df, s.rename('colC')], axis=1)
print(df)
     colA  colB  colC
0    True   1.0   NaN
1   False   2.0   NaN
2   False   3.0   NaN
10    NaN   NaN     a
20    NaN   NaN     b
30    NaN   NaN     c
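Sometimes the Series belongs to the same rows as the DataFrame but carries a different index. In that case, one fix is to relabel it with the DataFrame's index before concatenating, for example with Series.set_axis(). The sketch below assumes the values are already in the same row order as df. The names s_aligned and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

# Relabel s with df's index (0, 1, 2) so the rows line up one-to-one
s_aligned = s.set_axis(df.index).rename('colC')
result = pd.concat([df, s_aligned], axis=1)
print(result)
print(result.index.tolist())  # [0, 1, 2]
```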
Conclusion
Adding columns to a DataFrame is a common data analysis and data manipulation operation, and pandas offers several ways to do it; this article covered four of them. The index is one of the trickiest aspects of adding new columns. A Series is aligned on its index, while a plain list or array is placed by position, so be careful with how each method handles the index. Once you have learned the methods above, you are ready to add new columns to your DataFrames.
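As a compact recap, the sketch below applies all four methods to the sample DataFrame. It uses a Series whose index matches df and checks that all four produce the same result. Simple assignment and insert() modify a DataFrame in place, so the sketch runs them on copies. assign() and concat() return new DataFrames. The names r1 to r4 are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
s = pd.Series(['a', 'b', 'c'], index=df.index)

# 1) Simple assignment (in place, so work on a copy)
r1 = df.copy()
r1['colC'] = s

# 2) assign(): returns a new DataFrame
r2 = df.assign(colC=s)

# 3) insert(): in place, at the end of the columns
r3 = df.copy()
r3.insert(len(r3.columns), 'colC', s)

# 4) concat() along axis=1: returns a new DataFrame
r4 = pd.concat([df, s.rename('colC')], axis=1)

# All four approaches give the same DataFrame when the indices match
for r in (r2, r3, r4):
    pd.testing.assert_frame_equal(r, r1)
print(r1)
```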