- Create Pandas DataFrame from a Numpy Array
- Using the pandas.DataFrame() function
- Examples
- 1. 2D numpy array to a pandas dataframe
- 2. 1D numpy array to a pandas dataframe
- Additional Note
- Author
- How to Convert a NumPy Array to a Pandas DataFrame
- Example: Convert a NumPy Array to a Pandas DataFrame
- Specify Row and Column Names for the Pandas DataFrame
- How to Convert NumPy Array to Pandas DataFrame
- Steps to Convert a NumPy Array to Pandas DataFrame
- Step 1: Create a NumPy Array
- Step 2: Convert the NumPy Array to Pandas DataFrame
- Step 3 (optional): Add an Index to the DataFrame
- Array Contains a Mix of Strings and Numeric Data
Create Pandas DataFrame from a Numpy Array
Pandas dataframes are quite versatile when it comes to manipulating 2D tabular data in Python, so it is often useful to convert a numpy array to a pandas dataframe before manipulating or transforming the data. In this tutorial, we'll look at how to create a pandas dataframe from a numpy array.
Using the pandas.DataFrame() function
To create a pandas dataframe from a numpy array, pass the numpy array as an argument to the pandas.DataFrame() function. You can also pass the index and column labels for the dataframe. The following is the syntax:
df = pandas.DataFrame(data=arr, index=None, columns=None)
Examples
Let’s look at a few examples to better understand the usage of the pandas.DataFrame() function for creating dataframes from numpy arrays.
1. 2D numpy array to a pandas dataframe
Let’s create a dataframe by passing a numpy array to the pandas.DataFrame() function and keeping other parameters as default.
import numpy as np
import pandas as pd

# sample numpy array
arr = np.array([[70, 90, 80], [68, 80, 93]])

# convert to pandas dataframe with default parameters
df = pd.DataFrame(arr)

# print
print("Numpy array:\n", arr)
print("\nPandas dataframe:\n", df)
Numpy array:
 [[70 90 80]
 [68 80 93]]

Pandas dataframe:
     0   1   2
0  70  90  80
1  68  80  93
In the above example, the dataframe df is created from the numpy array arr. Note that since we did not pass the index and column labels, the created dataframe used the default RangeIndex for both.
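To confirm which labels pandas picked, you can inspect the index and columns attributes. A minimal check, reusing the same sample array:

```python
import numpy as np
import pandas as pd

arr = np.array([[70, 90, 80], [68, 80, 93]])
df = pd.DataFrame(arr)

# Both axes fall back to a RangeIndex when no labels are passed
print(type(df.index).__name__)           # RangeIndex
print(type(df.columns).__name__)         # RangeIndex
print(list(df.index), list(df.columns))  # [0, 1] [0, 1, 2]
```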
Let’s pass custom index and column labels to the dataframe being created.
import numpy as np
import pandas as pd

# sample numpy array
arr = np.array([[70, 90, 80], [68, 80, 93]])

# convert to pandas dataframe with custom index and column names
df = pd.DataFrame(arr, columns=['History', 'Physics', 'Math'], index=['Sam', 'Emma'])

# print
print("Numpy array:\n", arr)
print("\nPandas dataframe:\n", df)
Numpy array:
 [[70 90 80]
 [68 80 93]]

Pandas dataframe:
       History  Physics  Math
Sam        70       90    80
Emma       68       80    93
Here, the index labels and column names are passed to the arguments index and columns, respectively. From the labels, we can assume that the dataframe stores the test scores of students Sam and Emma in the subjects History, Physics and Math.
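The number of labels has to match the shape of the array. As a quick sketch (the two-label list below is a deliberately wrong, illustrative input), passing only two column labels for a three-column array raises a ValueError. The exact wording of the message differs between pandas versions, so it is not reproduced here:

```python
import numpy as np
import pandas as pd

arr = np.array([[70, 90, 80], [68, 80, 93]])

error_message = None
try:
    # arr has 3 columns, but only 2 column labels are supplied
    pd.DataFrame(arr, columns=['History', 'Physics'], index=['Sam', 'Emma'])
except ValueError as err:
    error_message = str(err)

print("ValueError:", error_message)
```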
2. 1D numpy array to a pandas dataframe
Passing a one-dimensional numpy array to the pandas.DataFrame() function will result in a pandas dataframe with one column.
import numpy as np
import pandas as pd

# sample numpy array
arr = np.array([10, 20, 30, 40])

# convert to pandas dataframe
df = pd.DataFrame(arr)

# print
print("Numpy array:\n", arr)
print("\nPandas dataframe:\n", df)
Numpy array:
 [10 20 30 40]

Pandas dataframe:
     0
0  10
1  20
2  30
3  40
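You can still name the single column by passing columns as before. If you would rather lay the values out as one row, one option is to reshape the array to two dimensions first. A small sketch (the label 'value' is just an illustrative name):

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30, 40])

# One column, with a custom column label
df_col = pd.DataFrame(arr, columns=['value'])

# One row instead: reshape the 1D array to shape (1, 4) first
df_row = pd.DataFrame(arr.reshape(1, -1))

print(df_col.shape)  # (4, 1)
print(df_row.shape)  # (1, 4)
```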
For more on the pandas.DataFrame() function, refer to its official documentation.
Additional Note
Pandas dataframes are objects used to store two-dimensional tabular data. If you try to create a pandas dataframe from a numpy array with more than 2 dimensions, you'll get a ValueError. See the example below.
import numpy as np
import pandas as pd

# sample numpy array
arr = np.random.randint(1,5,(3,3,2))
print("Numpy array:\n", arr)
# convert to pandas dataframe
df = pd.DataFrame(arr)
print("\nPandas dataframe:\n", df)
Numpy array:
 [[[4 4]
  [2 2]
  [2 2]]

 [[2 2]
  [4 4]
  [1 3]]

 [[3 4]
  [4 1]
  [2 2]]]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
      6 print("Numpy array:\n", arr)
      7 # convert to pandas dataframe
----> 8 df = pd.DataFrame(arr)
      9 print("\nPandas dataframe:\n", df)

~\anaconda3\lib\site-packages\pandas\core\internals\construction.py in prep_ndarray(values, copy)
    293         values = values.reshape((values.shape[0], 1))
    294     elif values.ndim != 2:
--> 295         raise ValueError("Must pass 2-d input")
    296 
    297     return values

ValueError: Must pass 2-d input
* Some lines in the above error message have been skipped to shorten the output shown.
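One way around this, assuming it makes sense for your data to merge the first two axes into rows, is to reshape the array to 2D before creating the dataframe. A minimal sketch:

```python
import numpy as np
import pandas as pd

# sample 3D numpy array of shape (3, 3, 2)
arr = np.random.randint(1, 5, (3, 3, 2))

# merge the first two axes: 3 * 3 = 9 rows, keep the last axis as 2 columns
flat = arr.reshape(-1, arr.shape[-1])
df = pd.DataFrame(flat)

print(df.shape)  # (9, 2)
```

Whether flattening is the right choice depends on what the extra dimension means; the reshape only changes the layout, not the values.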
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial have been implemented in a Jupyter Notebook with a Python (version 3.8.3) kernel having numpy version 1.18.5 and pandas version 1.0.5.
Author
Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.
Data Science Parichay is an educational website offering easy-to-understand tutorials on topics in Data Science with the help of clear and fun examples.
How to Convert a NumPy Array to a Pandas DataFrame
You can use the following syntax to convert a NumPy array to a pandas DataFrame:
import numpy as np
import pandas as pd

#create NumPy array
data = np.array([[1, 7, 6, 5, 6], [4, 4, 4, 3, 1]])

#convert NumPy array to pandas DataFrame
df = pd.DataFrame(data=data)
The following example shows how to use this syntax in practice.
Example: Convert a NumPy Array to a Pandas DataFrame
Suppose we have the following NumPy array:
import numpy as np

#create NumPy array
data = np.array([[1, 7, 6, 5, 6], [4, 4, 4, 3, 1]])

#print class of NumPy array
type(data)

numpy.ndarray
We can use the following syntax to convert the NumPy array to a pandas DataFrame:
import pandas as pd

#convert NumPy array to pandas DataFrame
df = pd.DataFrame(data=data)

#print DataFrame
print(df)

   0  1  2  3  4
0  1  7  6  5  6
1  4  4  4  3  1

#print class of DataFrame
type(df)

pandas.core.frame.DataFrame
Specify Row and Column Names for the Pandas DataFrame
We can also specify row names and column names for the DataFrame using the index and columns arguments, respectively.
#convert array to DataFrame and specify rows & columns
df = pd.DataFrame(data=data, index=["r1", "r2"], columns=["A", "B", "C", "D", "E"])

#print the DataFrame
print(df)

    A  B  C  D  E
r1  1  7  6  5  6
r2  4  4  4  3  1
How to Convert NumPy Array to Pandas DataFrame
In this short guide, you’ll see how to convert a NumPy array to Pandas DataFrame.
Here are the complete steps.
Steps to Convert a NumPy Array to Pandas DataFrame
Step 1: Create a NumPy Array
For example, let’s create the following NumPy array that contains only numeric data (i.e., integers):
import numpy as np

my_array = np.array([[11, 22, 33], [44, 55, 66]])

print(my_array)
print(type(my_array))
Run the code in Python, and you'll get the following NumPy array:

[[11 22 33]
 [44 55 66]]
<class 'numpy.ndarray'>
Step 2: Convert the NumPy Array to Pandas DataFrame
You can now convert the NumPy array to Pandas DataFrame using the following syntax:
import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

df = pd.DataFrame(my_array, columns=['Column_A', 'Column_B', 'Column_C'])

print(df)
print(type(df))
You’ll now get a DataFrame with 3 columns:
   Column_A  Column_B  Column_C
0        11        22        33
1        44        55        66
Step 3 (optional): Add an Index to the DataFrame
What if you’d like to add an index to the DataFrame?
For instance, let's add the index labels Item_1 and Item_2 to the DataFrame.
So here is the complete code to convert the array to a DataFrame with an index:
import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

df = pd.DataFrame(my_array, columns=['Column_A', 'Column_B', 'Column_C'], index=['Item_1', 'Item_2'])

print(df)
print(type(df))
You’ll now see the index on the left side of the DataFrame:
        Column_A  Column_B  Column_C
Item_1        11        22        33
Item_2        44        55        66
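Alternatively, you can assign the index after the DataFrame has been created by setting its index attribute; the list needs one label per row. A short sketch using the same array:

```python
import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])
df = pd.DataFrame(my_array, columns=['Column_A', 'Column_B', 'Column_C'])

# Assign index labels after the DataFrame has been created
df.index = ['Item_1', 'Item_2']

# Rows can now be looked up by their label
print(df.loc['Item_2', 'Column_B'])  # 55
```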
Array Contains a Mix of Strings and Numeric Data
Let’s now create a new NumPy array that will contain a mixture of strings and numeric data (where the dtype for this array will be set to object):
import numpy as np

my_array = np.array([['Jon', 25, 1995, 2016], ['Maria', 47, 1973, 2000], ['Bill', 38, 1982, 2005]], dtype=object)

print(my_array)
print(type(my_array))
print(my_array.dtype)
Here is the new array with an object dtype:
[['Jon' 25 1995 2016]
 ['Maria' 47 1973 2000]
 ['Bill' 38 1982 2005]]
object
You can then use the following syntax to convert the NumPy array to a DataFrame:
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016], ['Maria', 47, 1973, 2000], ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

print(df)
print(type(df))
Here is the new DataFrame:
    Name  Age  Birth Year  Graduation Year
0    Jon   25        1995             2016
1  Maria   47        1973             2000
2   Bill   38        1982             2005
Let’s check the data types of all the columns in the new DataFrame by adding df.dtypes to the code:
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016], ['Maria', 47, 1973, 2000], ['Bill', 38, 1982, 2005]], dtype=object)

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

print(df)
print(type(df))
print(df.dtypes)
Currently, all the columns in the DataFrame have the object dtype:
    Name  Age  Birth Year  Graduation Year
0    Jon   25        1995             2016
1  Maria   47        1973             2000
2   Bill   38        1982             2005
Name               object
Age                object
Birth Year         object
Graduation Year    object
dtype: object
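Because this array was created with dtype=object, the Age and year columns hold Python int objects rather than strings. In that situation, one option is DataFrame.infer_objects(), which asks pandas to infer better dtypes for object columns. A sketch, assuming this kind of soft inference is all you need (it does not parse numeric strings such as '25'):

```python
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)
df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

# The numeric columns already contain Python ints, so pandas can infer int64 for them
df = df.infer_objects()
print(df.dtypes)
```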
What if you'd like to convert some of the columns in the DataFrame from the object dtype to integers?
For example, suppose that you’d like to convert the last 3 columns in the DataFrame to integers.
To achieve this goal, you can use astype(int) as captured below:
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016], ['Maria', 47, 1973, 2000], ['Bill', 38, 1982, 2005]])

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

# convert the last 3 columns to integers
df['Age'] = df['Age'].astype(int)
df['Birth Year'] = df['Birth Year'].astype(int)
df['Graduation Year'] = df['Graduation Year'].astype(int)

print(df)
print(type(df))
print(df.dtypes)
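The width of int depends on the platform: on Windows with NumPy versions before 2.0, astype(int) gives you int32 for those 3 columns, as shown below, while on Linux and macOS (and on Windows with NumPy 2.0 or later) it gives int64: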
    Name  Age  Birth Year  Graduation Year
0    Jon   25        1995             2016
1  Maria   47        1973             2000
2   Bill   38        1982             2005
Name               object
Age                 int32
Birth Year          int32
Graduation Year     int32
dtype: object
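If you want the same result on every platform, you can name the integer type explicitly. The sketch below passes a dict that maps column names to dtypes to astype(), converting all three columns in one call with int64 chosen explicitly:

```python
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016],
                     ['Maria', 47, 1973, 2000],
                     ['Bill', 38, 1982, 2005]], dtype=object)
df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

# Request 64-bit integers explicitly so the result does not depend on the platform's default int
df = df.astype({'Age': 'int64', 'Birth Year': 'int64', 'Graduation Year': 'int64'})
print(df.dtypes)
```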
Alternatively, you can use apply(int), which calls Python's int() on each value; pandas stores the resulting integers as int64 for those last 3 columns:
import numpy as np
import pandas as pd

my_array = np.array([['Jon', 25, 1995, 2016], ['Maria', 47, 1973, 2000], ['Bill', 38, 1982, 2005]])

df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])

# convert the last 3 columns to integers
df['Age'] = df['Age'].apply(int)
df['Birth Year'] = df['Birth Year'].apply(int)
df['Graduation Year'] = df['Graduation Year'].apply(int)

print(df)
print(type(df))
print(df.dtypes)
As you can see, the last 3 columns in the DataFrame are now int64:
    Name  Age  Birth Year  Graduation Year
0    Jon   25        1995             2016
1  Maria   47        1973             2000
2   Bill   38        1982             2005
Name               object
Age                 int64
Birth Year          int64
Graduation Year     int64
dtype: object
You can read more about Pandas DataFrames by visiting the Pandas Documentation.