- Pandas Convert Column to Int in DataFrame
- 1. Quick Examples of pandas Convert Column to Int
- 2. Convert Column to int (Integer)
- 3. Convert Float to Int dtype
- 4. Casting Multiple Columns to Integer
- 5. Using apply(np.int64) to Cast to Integer
- 6. Convert Column Containing NaNs to astype(int)
- Conclusion
- How to Convert Floats to Integers in Pandas DataFrame
- 4 Scenarios of Converting Floats to Integers in Pandas DataFrame
- (1) Convert floats to integers for a specific DataFrame column
- (2) Convert an entire DataFrame where the data type of all columns is float
- (3) Convert a mixed DataFrame where the data type of some columns is float
- (4) Convert a DataFrame that contains NaN values
- Additional Resources
Pandas Convert Column to Int in DataFrame
Use the pandas DataFrame.astype(int) and Series.apply() methods to convert a column to an integer data type (for example, from float or string to int64 or int32). Keep in mind that a float can hold a fractional part that an integer cannot, so converting a float column to int discards everything after the decimal point.
Note that converting a float to int does not round or floor the value; it simply truncates the fractional part (anything after the decimal point). In this article, I will explain different ways to convert columns with string or float values to integer values.
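The difference between truncating and rounding is easiest to see with a few values, including negative ones. The following is a minimal sketch; the sample values are hypothetical and are not taken from the article's DataFrame.

```python
import pandas as pd

# Hypothetical sample values, including negatives, to show how truncation behaves
s = pd.Series([2.9, -2.9, 1000.1, 2500.7])

truncated = s.astype(int)        # drops the fractional part (moves toward zero)
rounded = s.round().astype(int)  # rounds to the nearest integer first, then casts

print(truncated.tolist())  # [2, -2, 1000, 2500]
print(rounded.tolist())    # [3, -3, 1000, 2501]
```

If you need rounded values rather than truncated ones, call round() before astype(int).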
1. Quick Examples of pandas Convert Column to Int
If you are in a hurry, below are some quick examples of how to convert a column to integer dtype in a DataFrame.
# Below are the quick examples

# Convert "Fee" from string to int
df = df.astype({'Fee': 'int'})

# Convert all columns to int dtype.
# This returns an error in our DataFrame
# df = df.astype('int')

# Convert single column to int dtype.
df['Fee'] = df['Fee'].astype('int')

# Convert "Discount" from float to int
df = df.astype({'Discount': 'int'})

# Converting multiple columns to int
df = pd.DataFrame(technologies)
df = df.astype({'Fee': 'int', 'Discount': 'int'})

# Convert "Fee" from float to int and replace NaN values
df['Fee'] = df['Fee'].fillna(0).astype(int)
print(df)
print(df.dtypes)
Now, let's create a DataFrame with a few rows and columns, execute some examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
# Create DataFrame
import pandas as pd
import numpy as np

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': ["22000", "25000", "23000", "24000", "26000"],
    'Duration': ['30days', '50days', '35days', '40days', '55days'],
    'Discount': [1000.10, 2300.15, 1000.5, 1200.22, 2500.20]
}
df = pd.DataFrame(technologies)
print(df)
print(df.dtypes)
This yields the output below. Note that the Fee column is a string/object column holding integer values and Discount is of type float64.
# Output:
   Courses    Fee Duration  Discount
0    Spark  22000   30days   1000.10
1  PySpark  25000   50days   2300.15
2   Hadoop  23000   35days   1000.50
3   Python  24000   40days   1200.22
4   Pandas  26000   55days   2500.20
Courses      object
Fee          object
Duration     object
Discount    float64
dtype: object
2. Convert Column to int (Integer)
Use the pandas DataFrame.astype() function to convert a column to int (integer); you can apply it to a specific column or to an entire DataFrame. To cast to a 64-bit signed integer, pass numpy.int64 or 'int64'. To cast to a 32-bit signed integer, pass numpy.int32 or 'int32'. Passing int or numpy.int_ selects the platform's default integer, which is 64-bit on most systems but has been 32-bit on Windows with NumPy versions before 2.0; this likely explains why some outputs in this article show int32.
The example below converts the Fee column from string dtype to int64. You can also pass numpy.int64 to this method.
# Convert "Fee" from string to int
df = df.astype({'Fee': 'int'})
print(df.dtypes)
# Output:
Courses      object
Fee           int64
Duration     object
Discount    float64
dtype: object
If you have a DataFrame in which every column is a string holding integer values, you can convert the whole DataFrame to int dtype as shown below. If any column holds alphanumeric values, this raises an error, so running it on our DataFrame fails.
# Convert all columns to int dtype.
df = df.astype('int')
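To see what that failure looks like in practice, here is a minimal sketch using a cut-down, hypothetical version of the DataFrame. It assumes the error surfaces as a ValueError, which is what pandas raises when a string such as '30days' cannot be parsed as an integer.

```python
import pandas as pd

# Cut-down, hypothetical version of the article's DataFrame
df = pd.DataFrame({
    'Fee': ["22000", "25000"],
    'Duration': ['30days', '50days'],  # not parseable as integers
})

try:
    df = df.astype('int')
    converted = True
except ValueError as err:
    # The whole conversion fails, so df is left unchanged
    converted = False
    print("Conversion failed:", err)
```

Because the call fails as a whole, the Fee column is not converted either; cast it on its own instead.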
You can also use Series.astype() to convert a specific column. Since each column of a DataFrame is a pandas Series, you can select the column as a Series and call astype() on it. In the example below, df.Fee or df['Fee'] returns a Series object.
# Convert single column to int dtype.
df['Fee'] = df['Fee'].astype('int')
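If the integer width matters, you can name it explicitly instead of relying on int. The sketch below uses a small, hypothetical Series rather than the full DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical string column holding integer values
fee = pd.Series(["22000", "25000", "23000"], name="Fee")

fee64 = fee.astype("int64")   # explicit 64-bit signed integer
fee32 = fee.astype(np.int32)  # explicit 32-bit signed integer

print(fee64.dtype)  # int64
print(fee32.dtype)  # int32
```

Naming the width keeps the resulting dtype the same regardless of the platform's default integer.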
3. Convert Float to Int dtype
Now, using the same astype() approach, let's convert the float column to int (integer) type in a pandas DataFrame. Note that converting a float to int does not round or floor the value; it simply truncates the fractional part (anything after the decimal point).
The example below converts the Discount column, which holds float values, to int using the DataFrame.astype() function.
# Convert "Discount" from float to int
df = df.astype({'Discount': 'int'})
print(df.dtypes)
# Output:
Courses     object
Fee          int64
Duration    object
Discount     int64
dtype: object
Similarly, you can cast all columns or a single column. Refer to the examples in the section above for details.
4. Casting Multiple Columns to Integer
You can also convert multiple columns to integer by passing a dict of column name -> data type to the astype() method. The example below converts the Fee column from string to int and the Discount column from float to int.
# Converting multiple columns to int
df = pd.DataFrame(technologies)
df = df.astype({'Fee': 'int', 'Discount': 'int'})
print(df.dtypes)
# Output:
Courses     object
Fee          int32
Duration    object
Discount     int32
dtype: object
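The dict form also lets each column get a different integer width. The following is a minimal sketch on a shortened, hypothetical version of the DataFrame; the choice of int64 for Fee and int32 for Discount is only for illustration.

```python
import pandas as pd

# Shortened, hypothetical version of the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': ["22000", "25000"],
    'Discount': [1000.10, 2300.15],
})

# Columns not named in the dict (here "Courses") keep their dtype
df = df.astype({'Fee': 'int64', 'Discount': 'int32'})
print(df.dtypes)
```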
5. Using apply(np.int64) to Cast to Integer
You can also call apply() on the column (Series.apply()) to convert the Fee column from string to integer in pandas. As you can see, this example passes numpy.int64 as the function to apply.
import numpy as np

# Convert "Fee" from string to int using Series.apply(np.int64)
df["Fee"] = df["Fee"].apply(np.int64)
print(df.dtypes)
# Output:
Courses      object
Fee           int64
Duration     object
Discount    float64
dtype: object
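For a simple cast, apply(np.int64) and astype(np.int64) should produce the same values; apply() calls the function on each element, while astype() converts the column as a whole. A small sketch on a hypothetical Series:

```python
import numpy as np
import pandas as pd

# Hypothetical string column holding integer values
fee = pd.Series(["22000", "25000", "23000"], name="Fee")

via_apply = fee.apply(np.int64)    # calls np.int64 on each element
via_astype = fee.astype(np.int64)  # converts the whole column at once

print(via_apply.dtype, via_astype.dtype)
print(via_apply.tolist() == via_astype.tolist())
```

On large columns astype() is usually the more direct choice, since it avoids a Python-level call per element.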
6. Convert Column Containing NaNs to astype(int)
To demonstrate NaN/null handling, let's create a DataFrame that contains NaN values. To convert a column that holds a mixture of float and NaN values to int, first replace the NaN values with zero and then use astype() to convert.
import pandas as pd
import numpy as np

technologies = {
    'Fee': [22000.30, 25000.40, np.nan, 24000.50, 26000.10, np.nan]
}
df = pd.DataFrame(technologies)
print(df)
print(df.dtypes)
Use DataFrame.fillna() to replace the NaN values with the integer value zero.
# Convert "Fee" from float to int and replace NaN values
df['Fee'] = df['Fee'].fillna(0).astype(int)
print(df)
print(df.dtypes)
# Output:
     Fee
0  22000
1  25000
2      0
3  24000
4  26000
5      0
Fee    int32
dtype: object
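The fillna() step is needed because a plain NumPy integer column cannot represent NaN. The sketch below, on a hypothetical Series, shows the failure and the fix; it assumes the error can be caught as a ValueError, which covers the exception pandas raises for this case.

```python
import numpy as np
import pandas as pd

# Hypothetical float column with a missing value
fee = pd.Series([22000.30, np.nan, 24000.50], name="Fee")

try:
    fee.astype(int)
    failed = False
except ValueError as err:
    # NaN has no integer representation, so the cast is rejected
    failed = True
    print("astype(int) failed:", err)

# Replacing NaN first makes the cast possible
filled = fee.fillna(0).astype(int)
print(filled.tolist())  # [22000, 0, 24000]
```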
Conclusion
In this article, you have learned how to convert a column from string to int and from float to int using the DataFrame.astype() and apply() methods. You have also learned how to convert a column to integers when it contains NaN/null values.
How to Convert Floats to Integers in Pandas DataFrame
You can convert floats to integers in a Pandas DataFrame using either of these approaches:
(1) astype(int):
df['DataFrame Column'] = df['DataFrame Column'].astype(int)
(2) apply(int):
df['DataFrame Column'] = df['DataFrame Column'].apply(int)
In this guide, you’ll see 4 scenarios of converting floats to integers for:
- Specific DataFrame column using astype(int) or apply(int)
- Entire DataFrame where the data type of all columns is float
- Mixed DataFrame where the data type of some columns is float
- DataFrame that contains NaN values
4 Scenarios of Converting Floats to Integers in Pandas DataFrame
(1) Convert floats to integers for a specific DataFrame column
To start with a simple example, let’s create a DataFrame with two columns, where:
- The first column (called 'numeric_values') will contain only floats
- The second column (called 'string_values') will contain only strings
The goal is to convert all the floats to integers under the first DataFrame column.
Here is the code to create the DataFrame:
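import pandas as pd

# Values match the printed output shown below
data = {'numeric_values': [3.0, 5.0, 7.0, 15.995, 225.12],
        'string_values': ['AA', 'BB', 'CCC', 'DD', 'EEEE']}
df = pd.DataFrame(data, columns=['numeric_values', 'string_values'])

print(df)
print(df.dtypes)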
As you can see, the data type of the ‘numeric_values’ column is float:
   numeric_values string_values
0           3.000            AA
1           5.000            BB
2           7.000           CCC
3          15.995            DD
4         225.120          EEEE
numeric_values    float64
string_values      object
dtype: object
You can then use astype(int) in order to convert the floats to integers:
df['DataFrame Column'] = df['DataFrame Column'].astype(int)
So the complete code to perform the conversion is as follows:
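import pandas as pd

data = {'numeric_values': [3.0, 5.0, 7.0, 15.995, 225.12],
        'string_values': ['AA', 'BB', 'CCC', 'DD', 'EEEE']}
df = pd.DataFrame(data, columns=['numeric_values', 'string_values'])

# Truncate the floats to integers
df['numeric_values'] = df['numeric_values'].astype(int)

print(df)
print(df.dtypes)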
You’ll now notice that the data type of the ‘numeric_values’ column is integer:
   numeric_values string_values
0               3            AA
1               5            BB
2               7           CCC
3              15            DD
4             225          EEEE
numeric_values     int32
string_values     object
dtype: object
Alternatively, you can use apply(int) to convert the floats to integers:
df['DataFrame Column'] = df['DataFrame Column'].apply(int)
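import pandas as pd

data = {'numeric_values': [3.0, 5.0, 7.0, 15.995, 225.12],
        'string_values': ['AA', 'BB', 'CCC', 'DD', 'EEEE']}
df = pd.DataFrame(data, columns=['numeric_values', 'string_values'])

# Call Python's int() on each value
df['numeric_values'] = df['numeric_values'].apply(int)

print(df)
print(df.dtypes)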
You’ll get the data type of integer:
   numeric_values string_values
0               3            AA
1               5            BB
2               7           CCC
3              15            DD
4             225          EEEE
numeric_values     int64
string_values     object
dtype: object
(2) Convert an entire DataFrame where the data type of all columns is float
What if you have a DataFrame where the data type of all the columns is float?
Rather than specifying the conversion to integers column by column, you can do it at the DataFrame level using:
df = df.astype(int)
For example, let’s create a new DataFrame with two columns that contain only floats:
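import pandas as pd

# Values match the printed output shown below
data = {'numeric_values_1': [3.2, 5.9, 7.0, 15.995, 225.12],
        'numeric_values_2': [7.7, 23.0, 522.0, 4275.5, 22.3]}
df = pd.DataFrame(data, columns=['numeric_values_1', 'numeric_values_2'])

print(df)
print(df.dtypes)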
You’ll now get this DataFrame with the two float columns:
   numeric_values_1  numeric_values_2
0             3.200               7.7
1             5.900              23.0
2             7.000             522.0
3            15.995            4275.5
4           225.120              22.3
numeric_values_1    float64
numeric_values_2    float64
dtype: object
To convert the floats to integers throughout the entire DataFrame, you’ll need to add df = df.astype(int) to the code:
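import pandas as pd

data = {'numeric_values_1': [3.2, 5.9, 7.0, 15.995, 225.12],
        'numeric_values_2': [7.7, 23.0, 522.0, 4275.5, 22.3]}
df = pd.DataFrame(data, columns=['numeric_values_1', 'numeric_values_2'])

# Convert every column in the DataFrame to integers
df = df.astype(int)

print(df)
print(df.dtypes)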
As you can see, all the columns in the DataFrame are now converted to integers:
   numeric_values_1  numeric_values_2
0                 3                 7
1                 5                23
2                 7               522
3                15              4275
4               225                22
numeric_values_1    int32
numeric_values_2    int32
dtype: object
Note that calling astype(int) on the whole DataFrame only works when every column can be converted to an integer, as is the case here where all the columns are floats.
What if you have a mixed DataFrame where the data type of some (but not all) columns is float?
The section below deals with this scenario.
(3) Convert a mixed DataFrame where the data type of some columns is float
Let’s now create a new DataFrame with 3 columns, where the first 2 columns will contain float values, while the third column will include only strings:
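import pandas as pd

# Values match the printed output shown below
data = {'numeric_values_1': [3.2, 5.9, 7.0, 15.995, 225.12],
        'numeric_values_2': [7.7, 23.0, 522.0, 4275.5, 22.3],
        'string_values': ['AA', 'BB', 'CCC', 'DD', 'EEEE']}
df = pd.DataFrame(data, columns=['numeric_values_1', 'numeric_values_2', 'string_values'])

print(df)
print(df.dtypes)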
Here is the DataFrame with the 3 columns that you’ll get:
   numeric_values_1  numeric_values_2 string_values
0             3.200               7.7            AA
1             5.900              23.0            BB
2             7.000             522.0           CCC
3            15.995            4275.5            DD
4           225.120              22.3          EEEE
numeric_values_1    float64
numeric_values_2    float64
string_values        object
dtype: object
You can then specify multiple columns (in this example, the first two columns) that you’d like to convert to integers:
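import pandas as pd

data = {'numeric_values_1': [3.2, 5.9, 7.0, 15.995, 225.12],
        'numeric_values_2': [7.7, 23.0, 522.0, 4275.5, 22.3],
        'string_values': ['AA', 'BB', 'CCC', 'DD', 'EEEE']}
df = pd.DataFrame(data, columns=['numeric_values_1', 'numeric_values_2', 'string_values'])

# Convert only the two float columns; the string column is left alone
df[['numeric_values_1', 'numeric_values_2']] = df[['numeric_values_1', 'numeric_values_2']].astype(int)

print(df)
print(df.dtypes)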
As you may observe, the first 2 columns are now converted to integers:
   numeric_values_1  numeric_values_2 string_values
0                 3                 7            AA
1                 5                23            BB
2                 7               522           CCC
3                15              4275            DD
4               225                22          EEEE
numeric_values_1     int32
numeric_values_2     int32
string_values       object
dtype: object
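When there are many float columns, listing them by hand gets tedious. One possible variation, not part of the original guide, is to let select_dtypes() find the float64 columns and pass them to astype() as a dict. The sketch below uses a shortened version of the data and casts to int64 explicitly.

```python
import pandas as pd

# Shortened version of the mixed DataFrame used above
df = pd.DataFrame({
    'numeric_values_1': [3.2, 5.9, 7.0],
    'numeric_values_2': [7.7, 23.0, 522.0],
    'string_values': ['AA', 'BB', 'CCC'],
})

# Find the float64 columns instead of naming them manually
float_cols = df.select_dtypes(include='float64').columns

# Build a {column: dtype} mapping so only those columns are converted
df = df.astype({col: 'int64' for col in float_cols})
print(df.dtypes)
```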
(4) Convert a DataFrame that contains NaN values
In the final scenario, you’ll see how to convert a column that includes a mixture of floats and NaN values.
The goal is to convert the float values to integers, as well as replace the NaN values with zeros.
Here is the code to create the DataFrame:
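import pandas as pd
import numpy as np

# Values match the printed output shown below
data = {'numeric_values': [3.0, 5.0, np.nan, 15.0, np.nan]}
df = pd.DataFrame(data, columns=['numeric_values'])

print(df)
print(df.dtypes)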
You’ll get this DataFrame that contains both floats and NaNs:
   numeric_values
0             3.0
1             5.0
2             NaN
3            15.0
4             NaN
numeric_values    float64
dtype: object
You can then replace the NaN values with zeros by adding fillna(0), and then perform the conversion to integers using astype(int):
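import pandas as pd
import numpy as np

data = {'numeric_values': [3.0, 5.0, np.nan, 15.0, np.nan]}
df = pd.DataFrame(data, columns=['numeric_values'])

# Replace NaN with 0, then convert the floats to integers
df['numeric_values'] = df['numeric_values'].fillna(0).astype(int)

print(df)
print(df.dtypes)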
Here is the newly converted DataFrame:
   numeric_values
0               3
1               5
2               0
3              15
4               0
numeric_values    int32
dtype: object
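If replacing missing values with zero would distort the data, a different option (not covered in the original guide) is pandas' nullable integer dtype, written 'Int64' with a capital I. The sketch below assumes the non-missing values are whole numbers, as they are in this example; floats with a fractional part would need to be truncated or rounded first.

```python
import numpy as np
import pandas as pd

# Same values as the NaN example above
s = pd.Series([3.0, 5.0, np.nan, 15.0, np.nan], name='numeric_values')

# Capital "I": nullable integer dtype that keeps missing values as <NA>
nullable = s.astype('Int64')
print(nullable)
```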
Additional Resources
You can check the Pandas Documentation to read more about astype.