Python dataframe column types

How to get & check data types of Dataframe columns in Python Pandas

In this article, we will discuss different ways to fetch the data type of a single column or of multiple columns. We will also see how to compare the data types of columns and how to fetch column names based on their data types.

Use Dataframe.dtypes to get Data types of columns in Dataframe

In Python's pandas module, the DataFrame class provides an attribute, Dataframe.dtypes, that holds the data type information of each column.

It returns a Series object containing the data type of each column. Let's use this to find and check the data types of columns.

Suppose we have the following DataFrame:


```python
import pandas as pd

# List of Tuples
employees = [('jack', 34, 'Sydney', 155),
             ('Riti', 31, 'Delhi', 177.5),
             ('Aadi', 16, 'Mumbai', 81),
             ('Mohit', 31, 'Delhi', 167),
             ('Veena', 12, 'Delhi', 144),
             ('Shaunak', 35, 'Mumbai', 135),
             ('Shaun', 35, 'Colombo', 111)
             ]

# Create a DataFrame object
empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
print(empDfObj)
```

Contents of the dataframe are,

```
      Name  Age     City  Marks
0     jack   34   Sydney  155.0
1     Riti   31    Delhi  177.5
2     Aadi   16   Mumbai   81.0
3    Mohit   31    Delhi  167.0
4    Veena   12    Delhi  144.0
5  Shaunak   35   Mumbai  135.0
6    Shaun   35  Colombo  111.0
```

Let’s fetch the Data type of each column in Dataframe as a Series object,

```python
# Get a Series object containing the data type objects of each column of Dataframe.
# Index of series is column name.
dataTypeSeries = empDfObj.dtypes
print('Data type of each column of Dataframe :')
print(dataTypeSeries)
```

```
Data type of each column of Dataframe :
Name      object
Age        int64
City      object
Marks    float64
dtype: object
```

The index of the returned Series holds the column names, and its values are the data types of the respective columns.
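Because Dataframe.dtypes is an ordinary Series, it can also be iterated pair by pair. A minimal sketch, using a small hypothetical frame `df` in place of empDfObj:

```python
import pandas as pd

# Small hypothetical frame with the same kinds of columns as empDfObj
df = pd.DataFrame({'Name': ['jack', 'Riti'], 'Age': [34, 31], 'Marks': [155.0, 177.5]})

# Series.items() yields (index label, value) pairs: here (column name, dtype)
column_types = []
for column_name, column_dtype in df.dtypes.items():
    column_types.append((column_name, str(column_dtype)))
    print(column_name, column_dtype)
```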


Get Data types of Dataframe columns as dictionary

We can also convert the Series object returned by Dataframe.dtypes to a dictionary:

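```python
# Get a Dictionary containing the pairs of column names & data type objects.
dataTypeDict = dict(empDfObj.dtypes)
print('Data type of each column of Dataframe :')
print(dataTypeDict)
```

```
Data type of each column of Dataframe :
{'Name': dtype('O'), 'Age': dtype('int64'), 'City': dtype('O'), 'Marks': dtype('float64')}
```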
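A Series also has a to_dict() method that produces the same mapping as dict(). If plain strings are more convenient (for logging, say), each dtype can be passed through str(). A sketch with a hypothetical two-column frame:

```python
import pandas as pd

df = pd.DataFrame({'Age': [34, 31], 'Marks': [155.0, 177.5]})

# Mapping of column name -> numpy dtype object
dtype_map = df.dtypes.to_dict()

# The same mapping with each dtype rendered as a string
dtype_names = {name: str(dt) for name, dt in dtype_map.items()}
print(dtype_names)  # {'Age': 'int64', 'Marks': 'float64'}
```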

Get the Data type of a single column in Dataframe

We can also fetch the data type of a single column from the Series returned by Dataframe.dtypes:

```python
# get data type of column 'Age'
dataTypeObj = empDfObj.dtypes['Age']
print('Data type of each column Age in the Dataframe :')
print(dataTypeObj)
```

```
Data type of each column Age in the Dataframe :
int64
```
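A single column's type is also available directly from the column itself as Series.dtype, without going through the whole dtypes Series. A sketch with a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti'], 'Age': [34, 31]})

# Series.dtype returns the dtype of that one column
age_dtype = df['Age'].dtype
print(age_dtype)  # int64

# It matches the entry looked up in df.dtypes
assert age_dtype == df.dtypes['Age']
```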

Check if data type of a column is int64 or object etc.

Using Dataframe.dtypes, we can fetch the data type of a single column and compare it against a specific type:

Check if Data type of a column is int64 in Dataframe

```python
import numpy as np

# Check the type of column 'Age' is int64
if dataTypeObj == np.int64:
    print("Data type of column 'Age' is int64")
```

```
Data type of column 'Age' is int64
```
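Note that the comparison with np.int64 is exact, so it is False for an int32 or uint8 column. If any integer width should pass, the predicates in pandas.api.types are more forgiving. A sketch, where the int32 column is created deliberately for illustration:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

df = pd.DataFrame({
    'Age': np.array([34, 31], dtype=np.int32),  # deliberately not int64
    'Marks': [155.0, 177.5],
})

exact_match = df.dtypes['Age'] == np.int64     # False: int32 is not int64
any_integer = is_integer_dtype(df['Age'])      # True for every integer width
marks_is_float = is_float_dtype(df['Marks'])   # True
print(exact_match, any_integer, marks_is_float)
```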

Check if Data type of a column is object i.e. string in Dataframe

```python
# Check the type of column 'Name' is object i.e string
# (np.object was removed in NumPy 1.24, so compare against the built-in object)
if empDfObj.dtypes['Name'] == object:
    print("Data type of column 'Name' is object")
```

```
Data type of column 'Name' is object
```
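An object column can hold arbitrary Python objects, not only strings, so the check above really tests for "object" rather than "string". The pandas.api.types module also provides is_object_dtype for this test. A sketch (dtype=object is passed explicitly so the example does not depend on how a particular pandas version infers string columns):

```python
import pandas as pd
from pandas.api.types import is_object_dtype

# Explicit dtype=object keeps the Name column as object on any pandas version
df = pd.DataFrame({'Name': pd.Series(['jack', 'Riti'], dtype=object), 'Age': [34, 31]})

name_dtype = df.dtypes['Name']
print(name_dtype == object)         # True: the built-in object replaces np.object
print(is_object_dtype(name_dtype))  # True
print(is_object_dtype(df['Age']))   # False
```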

Get list of pandas dataframe column names based on data type

Suppose we want a list of the column names whose data type is object, i.e. string. Let's see how to do that:

```python
# Get columns whose data type is object i.e. string
filteredColumns = empDfObj.dtypes[empDfObj.dtypes == object]
# list of columns whose data type is object i.e. string
listOfColumnNames = list(filteredColumns.index)
print(listOfColumnNames)
```

```
['Name', 'City']
```

We filtered the Series returned by Dataframe.dtypes by value and then took the index labels, i.e. the column names, of the filtered Series.
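pandas also has a built-in method for this, DataFrame.select_dtypes, which selects columns by dtype without filtering the dtypes Series by hand. A sketch on a hypothetical frame with the same columns as empDfObj:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': pd.Series(['jack', 'Riti'], dtype=object),
    'Age': [34, 31],
    'City': pd.Series(['Sydney', 'Delhi'], dtype=object),
    'Marks': [155.0, 177.5],
})

# select_dtypes keeps only the columns whose dtype matches `include`
object_columns = df.select_dtypes(include='object').columns.tolist()
numeric_columns = df.select_dtypes(include='number').columns.tolist()
print(object_columns)   # ['Name', 'City']
print(numeric_columns)  # ['Age', 'Marks']
```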

Get data types of a dataframe using Dataframe.info()

Dataframe.info() prints a detailed summary of the dataframe. It includes information like

  • Names of columns
  • Data types of columns
  • Number of rows in the dataframe
  • Number of non-null entries in each column
```python
# Print complete details about the data frame, it will also print column count, names and data types.
empDfObj.info()
```

```
RangeIndex: 7 entries, 0 to 6
Data columns (total 4 columns):
Name     7 non-null object
Age      7 non-null int64
City     7 non-null object
Marks    7 non-null float64
dtypes: float64(1), int64(1), object(2)
memory usage: 208.0+ bytes
```

Among other things, this shows the data type of each column in our dataframe.
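info() prints to standard output by default. Its buf parameter accepts any writable text buffer, which is handy if the summary should be captured as a string (for a log, for instance). A minimal sketch:

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti'], 'Age': [34, 31], 'Marks': [155.0, 177.5]})

# Redirect the info() summary into an in-memory text buffer
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```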

The complete example is as follows:

```python
import pandas as pd
import numpy as np


def main():
    # List of Tuples
    employees = [('jack', 34, 'Sydney', 155),
                 ('Riti', 31, 'Delhi', 177.5),
                 ('Aadi', 16, 'Mumbai', 81),
                 ('Mohit', 31, 'Delhi', 167),
                 ('Veena', 12, 'Delhi', 144),
                 ('Shaunak', 35, 'Mumbai', 135),
                 ('Shaun', 35, 'Colombo', 111)
                 ]

    # Create a DataFrame object
    empDfObj = pd.DataFrame(employees, columns=['Name', 'Age', 'City', 'Marks'])
    print("Contents of the Dataframe : ")
    print(empDfObj)

    print('*** Get the Data type of each column in Dataframe ***')
    # Get a Series object containing the data type objects of each column of Dataframe.
    # Index of series is column name.
    dataTypeSeries = empDfObj.dtypes
    print('Data type of each column of Dataframe :')
    print(dataTypeSeries)

    # Get a Dictionary containing the pairs of column names & data type objects.
    dataTypeDict = dict(empDfObj.dtypes)
    print('Data type of each column of Dataframe :')
    print(dataTypeDict)

    print('*** Get the Data type of a single column in Dataframe ***')
    # get data type of column 'Age'
    dataTypeObj = empDfObj.dtypes['Age']
    print('Data type of each column Age in the Dataframe :')
    print(dataTypeObj)

    print('*** Check if Data type of a column is int64 or object etc in Dataframe ***')
    # Check the type of column 'Age' is int64
    if dataTypeObj == np.int64:
        print("Data type of column 'Age' is int64")

    # Check the type of column 'Name' is object i.e string
    # (np.object was removed in NumPy 1.24, so compare against the built-in object)
    if empDfObj.dtypes['Name'] == object:
        print("Data type of column 'Name' is object")

    print('** Get list of pandas dataframe columns based on data type **')
    # Get columns whose data type is object i.e. string
    filteredColumns = empDfObj.dtypes[empDfObj.dtypes == object]
    # list of columns whose data type is object i.e. string
    listOfColumnNames = list(filteredColumns.index)
    print(listOfColumnNames)

    print('*** Get the Data type of each column in Dataframe using info() ***')
    # Print complete details about the data frame, it will also print column count, names and data types.
    empDfObj.info()


if __name__ == '__main__':
    main()
```
```
Contents of the Dataframe : 
      Name  Age     City  Marks
0     jack   34   Sydney  155.0
1     Riti   31    Delhi  177.5
2     Aadi   16   Mumbai   81.0
3    Mohit   31    Delhi  167.0
4    Veena   12    Delhi  144.0
5  Shaunak   35   Mumbai  135.0
6    Shaun   35  Colombo  111.0
*** Get the Data type of each column in Dataframe ***
Data type of each column of Dataframe :
Name      object
Age        int64
City      object
Marks    float64
dtype: object
Data type of each column of Dataframe :
{'Name': dtype('O'), 'Age': dtype('int64'), 'City': dtype('O'), 'Marks': dtype('float64')}
*** Get the Data type of a single column in Dataframe ***
Data type of each column Age in the Dataframe :
int64
*** Check if Data type of a column is int64 or object etc in Dataframe ***
Data type of column 'Age' is int64
Data type of column 'Name' is object
** Get list of pandas dataframe columns based on data type **
['Name', 'City']
*** Get the Data type of each column in Dataframe using info() ***
RangeIndex: 7 entries, 0 to 6
Data columns (total 4 columns):
Name     7 non-null object
Age      7 non-null int64
City     7 non-null object
Marks    7 non-null float64
dtypes: float64(1), int64(1), object(2)
memory usage: 208.0+ bytes
```


Assign pandas dataframe column dtypes

This would perhaps make a good bug report or feature request; currently I'm not sure what the dtype argument is doing (you can pass it a scalar, but it's not strict).


Since 0.17, you have to use the explicit conversions:

pd.to_datetime, pd.to_timedelta and pd.to_numeric 

(As mentioned below, there is no more "magic": convert_objects was deprecated in 0.17.)

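```python
>>> import pandas as pd
>>> df = pd.DataFrame({'x': ['a', 'b'], 'y': ['1', '2'], 'z': ['2018-05-01', '2018-05-02']})
>>> df.dtypes
x    object
y    object
z    object
dtype: object
>>> df
   x  y           z
0  a  1  2018-05-01
1  b  2  2018-05-02
```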

You can apply these to each column you want to convert:

```python
>>> df["y"] = pd.to_numeric(df["y"])
>>> df["z"] = pd.to_datetime(df["z"])
>>> df
   x  y           z
0  a  1  2018-05-01
1  b  2  2018-05-02
>>> df.dtypes
x            object
y             int64
z    datetime64[ns]
dtype: object
```

and confirm the dtype is updated.
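pd.to_numeric also takes a downcast argument that picks the smallest numeric type able to hold the values. A short sketch on a hypothetical column like y above:

```python
import pandas as pd

# Strings stored as object, like column y before conversion
df = pd.DataFrame({'y': pd.Series(['1', '2'], dtype=object)})

# Plain conversion picks int64
print(pd.to_numeric(df['y']).dtype)   # int64

# downcast='integer' picks the smallest integer type that fits the values
df['y'] = pd.to_numeric(df['y'], downcast='integer')
print(df['y'].dtype)                  # int8
```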

OLD/DEPRECATED ANSWER for pandas 0.12 to 0.16: you can use convert_objects to infer better dtypes:

```
In [21]: df
Out[21]:
   x  y
0  a  1
1  b  2

In [22]: df.dtypes
Out[22]:
x    object
y    object
dtype: object

In [23]: df.convert_objects(convert_numeric=True)
Out[23]:
   x  y
0  a  1
1  b  2

In [24]: df.convert_objects(convert_numeric=True).dtypes
Out[24]:
x    object
y     int64
dtype: object
```

Magic! (Sad to see it deprecated.)

This is a little like type.convert in R; nice, but it does leave one wishing for explicit specifications in some cases.

Be careful if you have a column that needs to remain a string but contains at least one value that could be converted to an int. All it takes is one such value and the entire field is converted to float64.
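The same trap exists with the explicit converters when errors='coerce' is used: every value that cannot be parsed silently becomes NaN, and the column becomes float64. A sketch of the problem, followed by one possible guard (the guard is just a policy chosen for illustration, not a pandas feature):

```python
import pandas as pd

# A column of codes that must stay text, although most values look numeric
codes = pd.Series(['001', '002', 'A17'], dtype=object)

coerced = pd.to_numeric(codes, errors='coerce')
print(coerced.dtype)     # float64
print(coerced.tolist())  # [1.0, 2.0, nan]: leading zeros and 'A17' are lost

# One possible guard: keep the strings if the conversion introduced new NaNs
if coerced.isna().sum() > codes.isna().sum():
    result = codes
else:
    result = coerced
```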

@smci Okay, I've edited it. There are a bunch of deprecated answers; I need to work out a way to find them all.

You can set the types explicitly with pandas DataFrame.astype(dtype, copy=True, raise_on_error=True, **kwargs), passing in a dictionary that maps each column name to the dtype you want. (In pandas 0.20 and later, raise_on_error was replaced by errors='raise'.)

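```python
import pandas as pd

wheel_number = 5
car_name = 'jeep'
minutes_spent = 4.5

# set the columns
data_columns = ['wheel_number', 'car_name', 'minutes_spent']

# create an empty dataframe
data_df = pd.DataFrame(columns=data_columns)
df_temp = pd.DataFrame([[wheel_number, car_name, minutes_spent]], columns=data_columns)

# The answer originally used data_df.append(df_temp, ignore_index=True);
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
data_df = pd.concat([data_df, df_temp], ignore_index=True)
```

```
In [11]: data_df.dtypes
Out[11]:
wheel_number     float64
car_name          object
minutes_spent    float64
dtype: object
```

(These dtypes come from the pandas version the answer was written with; newer releases may infer different types here.)

```python
data_df = data_df.astype(dtype={'wheel_number': 'int64', 'car_name': 'object', 'minutes_spent': 'float64'})
```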

Now you can see that it has changed:

```
In [18]: data_df.dtypes
Out[18]:
wheel_number       int64
car_name          object
minutes_spent    float64
```

That's the best way to pass the entire dictionary defined by the "dtypes" of another dataframe to the new one. Thanks!
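For reference, a sketch of that pattern: astype accepts a {column: dtype} mapping, so the dtypes of one frame (`template`, hypothetical) can be applied to another (`incoming`, also hypothetical):

```python
import pandas as pd

# Frame whose column types we want to reuse
template = pd.DataFrame({'wheel_number': pd.Series([4], dtype='int64'),
                         'minutes_spent': pd.Series([1.5], dtype='float64')})

# Frame with the same columns but a float where an int is wanted
incoming = pd.DataFrame({'wheel_number': [5.0], 'minutes_spent': [4.5]})

# Reuse the template's dtypes as the astype mapping
aligned = incoming.astype(template.dtypes.to_dict())
print(aligned.dtypes)
```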

For those coming from Google (etc.) such as myself:

convert_objects has been deprecated since 0.17. If you use it, you get a warning like this one:

FutureWarning: convert_objects is deprecated. Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric. 

You should do something like the following:

If you threw in some examples of pd.to_datetime, to_timedelta, to_numeric this should be the accepted answer.
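For completeness, a sketch using all three converters on a hypothetical frame whose columns all start out as strings:

```python
import pandas as pd

# Every column starts as strings stored with object dtype
df = pd.DataFrame({
    'amount': pd.Series(['10', '20'], dtype=object),
    'day': pd.Series(['2018-05-01', '2018-05-02'], dtype=object),
    'duration': pd.Series(['1 days', '2 hours'], dtype=object),
})

df['amount'] = pd.to_numeric(df['amount'])        # integer column
df['day'] = pd.to_datetime(df['day'])             # datetime64 column
df['duration'] = pd.to_timedelta(df['duration'])  # timedelta64 column
print(df.dtypes)
```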

Another way to set the column types is to first construct a numpy record array with your desired types, fill it out and then pass it to a DataFrame constructor.

```python
import pandas as pd
import numpy as np

x = np.empty((10,), dtype=[('x', np.uint8), ('y', np.float64)])
df = pd.DataFrame(x)
df.dtypes
```

```
x      uint8
y    float64
```
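np.empty allocates memory without initializing it, so the values in that frame are arbitrary and only the dtypes are meaningful. To start from real data with the same structured dtype, the record array can be built with np.array instead. A sketch:

```python
import numpy as np
import pandas as pd

# Structured dtype: (field name, numpy type) pairs, as in the answer above
record_type = [('x', np.uint8), ('y', np.float64)]

# np.array fills the fields from a list of tuples
records = np.array([(1, 0.5), (2, 1.5)], dtype=record_type)
df = pd.DataFrame(records)
print(df.dtypes)
```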

You’re better off using typed np.arrays, and then pass the data and column names as a dictionary.

```python
import numpy as np
import pandas as pd

# Feature: np arrays are 1: efficient, 2: can be pre-sized
x = np.array(['a', 'b'], dtype=object)
y = np.array([1, 2], dtype=np.int32)
df = pd.DataFrame({
    'x': x,  # Feature: column name is near data array
    'y': y,
})
```
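```python
import pandas as pd

df = pd.DataFrame([['a', '1'], ['b', '2']], columns=['x', 'y'])

# Cast a pandas object to a specified dtype
# (the mapping in this answer was lost; {'y': 'int64'} is an assumed example)
df = df.astype({'y': 'int64'})

# Check
print(df.dtypes)
```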

Your answer only contains code. I recommend that you don’t post only code as answer, but also provide an explanation what your code does and how it solves the problem of the question. Answers with an explanation are usually more helpful and of better quality, and are more likely to attract upvotes.

I'm facing a similar problem. In my case I have thousands of files from Cisco logs that I need to parse manually.

In order to be flexible with fields and types, I have successfully tested using StringIO + read_csv, which does accept a dict for the dtype specification.

I usually read each file (5k to 20k lines) into a buffer and create the dtype dictionaries dynamically.

Eventually I concatenate these dataframes (with categoricals, thanks to 0.19) into one large dataframe that I dump into HDF5.

Something along these lines

```python
import pandas as pd
import io

output = io.StringIO()
output.write('A,1,20,31\n')
output.write('B,2,21,32\n')
output.write('C,3,22,33\n')
output.write('D,4,23,34\n')
output.seek(0)

df = pd.read_csv(output, header=None, names=["A", "B", "C", "D"],
                 dtype={"A": "category", "B": "float32", "C": "int32", "D": "float64"},
                 sep=",")
df.info()
```

```
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
A    5 non-null category
B    5 non-null float32
C    5 non-null int32
D    5 non-null float64
dtypes: category(1), float32(1), float64(1), int32(1)
memory usage: 205.0 bytes
None
```

Not very Pythonic, but it does the job.
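The buffer does not need to be filled line by line: io.StringIO accepts the text directly, and the dtype mapping can be assembled with zip. A sketch of the same idea, where `column_names` and `column_types` are hypothetical stand-ins for whatever drives the per-file specification:

```python
import io

import pandas as pd

csv_text = "A,1,20,31\nB,2,21,32\nC,3,22,33\nD,4,23,34\n"

# Hypothetical per-file specification, built at runtime
column_names = ['A', 'B', 'C', 'D']
column_types = ['category', 'float32', 'int32', 'float64']
dtype_spec = dict(zip(column_names, column_types))

df = pd.read_csv(io.StringIO(csv_text), header=None, names=column_names, dtype=dtype_spec)
print(df.dtypes)
```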

