Convert a pandas DataFrame to a NumPy array in Python

pandas.DataFrame.to_numpy

By default, the dtype of the returned array will be the common NumPy dtype of all types in the DataFrame. For example, if the dtypes are float16 and float32, the result's dtype will be float32. This may require copying data and coercing values, which may be expensive.
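The float16/float32 case described above can be checked directly. This is a minimal sketch; the column names `a` and `b` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two columns with different float widths (hypothetical column names).
df = pd.DataFrame({
    "a": np.array([1.0, 2.0], dtype="float16"),
    "b": np.array([3.0, 4.0], dtype="float32"),
})

arr = df.to_numpy()

print(df.dtypes.tolist())  # per-column dtypes before conversion
print(arr.dtype)           # float32, the common dtype of both columns
```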

Parameters

dtype : str or numpy.dtype, optional

The dtype to pass to numpy.asarray().

copy : bool, default False

Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.

na_value : Any, optional

The value to use for missing values. The default value depends on dtype and the dtypes of the DataFrame columns.
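The effect of copy=True can be illustrated with a short sketch. The variable names are illustrative only, and whether the default call returns a view depends on the DataFrame's internal layout and the pandas version, so the example only relies on the guarantee that copy=True gives an independent array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Default call: may or may not be a view on the DataFrame's data.
maybe_view = df.to_numpy()

# copy=True: guaranteed to be an independent array.
independent = df.to_numpy(copy=True)
independent[0, 0] = 99

print(df.loc[0, "A"])                               # 1, the DataFrame is unchanged
print(np.shares_memory(maybe_view, independent))    # False
```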

See also: Series.to_numpy, the similar method for Series.
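As a brief illustration of the Series counterpart, the sketch below converts a single Series; the result is one-dimensional rather than two-dimensional. The Series name `A` is arbitrary.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="A")

arr = s.to_numpy()

print(arr.shape)  # (3,), a 1-D array with no column axis
```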

>>> pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
array([[1, 3],
       [2, 4]])

With heterogeneous data, the lowest common type will have to be used.

>>> df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
>>> df.to_numpy()
array([[1. , 3. ],
       [2. , 4.5]])

For a mix of numeric and non-numeric types, the output array will have object dtype.

>>> df['C'] = pd.date_range('2000', periods=2)
>>> df.to_numpy()
array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],
       [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)
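When an object array is not what you want, one possible approach (not part of the excerpt above) is to keep only the numeric columns before converting. The sketch below uses DataFrame.select_dtypes for that and rebuilds the same three-column frame.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2],
    "B": [3.0, 4.5],
    "C": pd.date_range("2000", periods=2),
})

# Converting everything falls back to object dtype.
full = df.to_numpy()

# Keeping only numeric columns yields a regular float array.
numeric = df.select_dtypes(include="number").to_numpy()

print(full.dtype)     # object
print(numeric.dtype)  # float64
```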


Converting a Pandas DataFrame to a NumPy Array (With Examples)

You can use the following syntax to convert a pandas DataFrame to a NumPy array:

df.to_numpy()

The following examples show how to use this syntax in practice.

Example 1: Convert a DataFrame with the Same Data Types

The following code shows how to convert a pandas DataFrame to a NumPy array when each column in the DataFrame has the same data type:

import pandas as pd

#create data frame
df1 = pd.DataFrame({'rebounds': [7, 7, 8, 13, 7, 4],
                    'points': [5, 7, 7, 9, 12, 9],
                    'assists': [11, 8, 10, 6, 6, 5]})

#view data frame
print(df1)

   rebounds  points  assists
0         7       5       11
1         7       7        8
2         8       7       10
3        13       9        6
4         7      12        6
5         4       9        5

#convert DataFrame to NumPy array
new = df1.to_numpy()

#view NumPy array
print(new)

[[ 7  5 11]
 [ 7  7  8]
 [ 8  7 10]
 [13  9  6]
 [ 7 12  6]
 [ 4  9  5]]

#confirm that new is a NumPy array
print(type(new))

<class 'numpy.ndarray'>

#view data type
print(new.dtype)

int64

The NumPy array has a data type of int64 because every column in the original pandas DataFrame was an integer.

Example 2: Convert a DataFrame with Mixed Data Types

The following code shows how to convert a pandas DataFrame to a NumPy array when the columns in the DataFrame do not all have the same data type:

import pandas as pd

#create data frame
df2 = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F'],
                    'points': [5, 7, 7, 9, 12, 9],
                    'assists': [11, 8, 10, 6, 6, 5]})

#view data frame
print(df2)

  player  points  assists
0      A       5       11
1      B       7        8
2      C       7       10
3      D       9        6
4      E      12        6
5      F       9        5

#convert DataFrame to NumPy array
new = df2.to_numpy()

#view NumPy array
print(new)

[['A' 5 11]
 ['B' 7 8]
 ['C' 7 10]
 ['D' 9 6]
 ['E' 12 6]
 ['F' 9 5]]

#confirm that new is a NumPy array
print(type(new))

<class 'numpy.ndarray'>

#view data type
print(new.dtype)

object

The NumPy array has a data type of object because not every column in the original pandas DataFrame has the same data type.

Example 3: Convert a DataFrame and Set NA Values

The following code shows how to convert a pandas DataFrame to a NumPy array and specify the value to use for any NA values in the original DataFrame:

import pandas as pd

#create data frame
df3 = pd.DataFrame({'player': ['A', 'B', pd.NA, 'D', 'E', 'F'],
                    'points': [5, 7, pd.NA, 9, pd.NA, 9],
                    'assists': [11, 8, 10, 6, 6, 5]})

#view data frame
print(df3)

  player points  assists
0      A      5       11
1      B      7        8
2   <NA>   <NA>       10
3      D      9        6
4      E   <NA>        6
5      F      9        5

#convert DataFrame to NumPy array
new = df3.to_numpy(na_value='none')

#view NumPy array
print(new)

[['A' 5 11]
 ['B' 7 8]
 ['none' 'none' 10]
 ['D' 9 6]
 ['E' 'none' 6]
 ['F' 9 5]]

#confirm that new is a NumPy array
print(type(new))

<class 'numpy.ndarray'>

#view data type
print(new.dtype)

object


How to Convert Pandas Dataframe to Numpy Array – With Examples

Stack Vidhya

NumPy arrays provide fast and versatile ways to normalize data, which you can use to clean and scale data when training machine learning models.

You can convert a pandas dataframe to a NumPy array using the df.to_numpy() method.

In this tutorial, you'll learn how to convert a pandas dataframe to a NumPy array, with examples covering different conditions.

If You're in a Hurry

You can use the code snippet below to convert a pandas dataframe into a NumPy array.

numpy_array = df.to_numpy()
print(type(numpy_array))

If You Want to Understand Details, Read on…

Let's create a sample dataframe and convert it into a NumPy array.

Sample Dataframe

Create a sample dataframe that you'll use to convert to a NumPy array. It contains two columns and four rows. One cell contains NaN, which denotes a missing value.

import pandas as pd
import numpy as np

data = {'Age': [15, 25, 35, 45],
        'Birth Year': [2006, 1996, 1986, np.nan]}

df = pd.DataFrame(data, columns = ['Age','Birth Year'])

df

Dataframe Will Look Like

Age Birth Year
0 15 2006.0
1 25 1996.0
2 35 1986.0
3 45 NaN

Now you'll convert this dataframe into a NumPy array.

Using to_numpy()

You can convert a pandas dataframe to a NumPy array using the to_numpy() method.

It accepts three optional parameters.

  • dtype – specifies the data type of the values in the array.
  • copy – copy=True guarantees that the returned array is a new copy. copy=False, the default, lets pandas return a view on existing data where possible, but it does not guarantee that no copy is made.
  • na_value – specifies the value to use for any missing value in the array. You can pass any value here.
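As a small sketch of the dtype parameter listed above, the snippet below rebuilds the sample dataframe and requests a float32 array instead of the default float64. The variable name `arr32` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [15, 25, 35, 45],
    "Birth Year": [2006, 1996, 1986, np.nan],
})

# Ask for a specific dtype instead of the inferred common dtype.
arr32 = df.to_numpy(dtype="float32")

print(arr32.dtype)  # float32
```

The missing value is still NaN in the result, because float32 can represent it.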

This is an officially recommended method to convert a pandas dataframe into a NumPy array.

When you execute the snippet below, the dataframe is converted into a NumPy array.

  • The missing value is kept as NaN because na_value is not specified.

numpy_array = df.to_numpy()
print(numpy_array)
print(type(numpy_array))

[[  15. 2006.]
 [  25. 1996.]
 [  35. 1986.]
 [  45.   nan]]
<class 'numpy.ndarray'>

This is how you can convert a pandas dataframe into a numpy array.

Using dataframe.values

In this section, you'll convert the dataframe into a NumPy array using df.values. The values attribute returns the NumPy array representation of the dataframe.

Only the cell values in the dataframe are returned as an array. The row and column axis labels are removed.

Use the following code to convert the dataframe into a NumPy array.

values_array = df.values
print(values_array)
print(type(values_array))

[[  15. 2006.]
 [  25. 1996.]
 [  35. 1986.]
 [  45.   nan]]
<class 'numpy.ndarray'>

This is how you can convert a dataframe into a NumPy array using the values attribute of the dataframe.
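To confirm that both approaches give the same result for the sample dataframe, you could compare the two arrays. This is a minimal sketch; it passes equal_nan=True because NaN does not compare equal to itself by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [15, 25, 35, 45],
    "Birth Year": [2006, 1996, 1986, np.nan],
})

from_values = df.values
from_to_numpy = df.to_numpy()

# equal_nan=True treats NaN entries in the same position as equal.
same = np.array_equal(from_values, from_to_numpy, equal_nan=True)
print(same)  # True
```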

Convert Select Columns into Numpy Array

You can convert selected columns of a dataframe into a NumPy array by calling the to_numpy() method on a column subset of the dataframe.

For example, df[['Age']] returns a dataframe containing just the Age column. When you invoke the to_numpy() method on the resulting dataframe, you get a NumPy array of the Age column.

age_array = df[['Age']].to_numpy()
print(age_array)

You'll see the Age column as a NumPy array.

This is how you can convert a select column of a pandas dataframe into a numpy array.
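The kind of brackets you use changes the shape of the result. The sketch below, with illustrative variable names, contrasts selecting a single column as a Series with selecting it as a one-column dataframe.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [15, 25, 35, 45],
    "Birth Year": [2006, 1996, 1986, np.nan],
})

as_1d = df["Age"].to_numpy()      # Series -> 1-D array
as_2d = df[["Age"]].to_numpy()    # one-column DataFrame -> 2-D array
both = df[["Age", "Birth Year"]].to_numpy()

print(as_1d.shape)  # (4,)
print(as_2d.shape)  # (4, 1)
print(both.shape)   # (4, 2)
```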

Handle Missing Values while converting Dataframe to Numpy Array

In this section, you’ll learn how to handle missing values while converting a pandas dataframe to a numpy array.

You can replace missing values by passing the replacement value through the na_value parameter.

If you use na_value=0, the missing values will be replaced with 0.

The sample dataframe you created earlier has one missing value, for the birth year. When you execute the snippet below on the sample dataframe, the missing year is replaced with 1950.

array = df.to_numpy(na_value=1950)
print(array)

[[  15. 2006.]
 [  25. 1996.]
 [  35. 1986.]
 [  45. 1950.]]

This is how you can replace a missing value with a value while converting a dataframe into a numpy array.
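na_value applies one replacement to the whole array. If different columns need different replacements, one alternative (not covered in the tutorial above) is to call fillna() with a per-column mapping before converting. A minimal sketch under that assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [15, 25, 35, 45],
    "Birth Year": [2006, 1996, 1986, np.nan],
})

# Fill missing values column by column, then convert.
filled = df.fillna({"Birth Year": 1950}).to_numpy()

print(filled)
```

fillna() returns a new dataframe here, so the original df still contains the NaN.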

Handling Index While Converting Pandas Dataframe to Numpy Array

You may need to include or exclude the index column of the dataframe while converting it into an array.

You can control this by using the method to_records().

to_records() will convert the dataframe into a numpy record array. It accepts three optional parameters.

  • index – flag denoting whether the index column is included in the resultant record array. It is True by default, so the index column is included.
  • column_dtypes – data types of the columns in the resultant record array.
  • index_dtypes – data type to use for the index columns, if they are included in the record array. It applies only when index=True.

For the sample dataframe, print(df.to_records()) displays the following, where the first field of each record is the index:

[(0, 15, 2006.) (1, 25, 1996.) (2, 35, 1986.) (3, 45, nan)]
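The effect of the index flag can be seen by comparing the field names of the two record arrays. This sketch rebuilds the sample dataframe; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [15, 25, 35, 45],
    "Birth Year": [2006, 1996, 1986, np.nan],
})

with_index = df.to_records()                # index=True is the default
without_index = df.to_records(index=False)  # drop the index field

print(with_index.dtype.names)     # ('index', 'Age', 'Birth Year')
print(without_index.dtype.names)  # ('Age', 'Birth Year')

# Fields of a record array can be accessed by name.
print(without_index["Age"])
```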
