Python create dataframe name

Create Pandas DataFrame With Examples

One of the simplest ways to create a pandas DataFrame is to use its constructor. There are many other ways as well: for example, creating a DataFrame from a list, from a Series, by reading a CSV file, or creating an empty DataFrame.

pandas is widely used for data science, data analysis, and machine learning applications. It is built on top of NumPy, another popular package that provides scientific computing in Python. A pandas DataFrame is a 2-dimensional labeled data structure with rows and columns, where the columns can hold different types (integers, strings, floats, None, Python objects, etc.). You can think of it as an Excel spreadsheet or a SQL table.

1. Create pandas DataFrame

One of the easiest ways to create a pandas DataFrame is to use its constructor. The DataFrame constructor takes several optional parameters that specify the characteristics of the DataFrame.

Below is the syntax of the DataFrame constructor.

    # DataFrame constructor syntax
    pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

Now, let’s create a DataFrame from a list of lists (with a few rows and columns).

    # Create pandas DataFrame from a list of lists
    import pandas as pd

    technologies = [
        ["Spark", 20000, "30days"],
        ["Pandas", 25000, "40days"],
    ]
    df = pd.DataFrame(technologies)
    print(df)

Since we have not given index and column labels, DataFrame by default assigns incremental sequence numbers as labels to both rows and columns.

    # Output:
            0      1       2
    0   Spark  20000  30days
    1  Pandas  25000  40days

Sequence numbers make poor column names because they do not tell you what data each column holds, so it is best practice to provide descriptive column names. Use the columns parameter and the index parameter to supply column labels and a custom index, respectively.

    # Add column and row labels to the DataFrame
    column_names = ["Courses", "Fee", "Duration"]
    row_label = ["a", "b"]
    df = pd.DataFrame(technologies, columns=column_names, index=row_label)
    print(df)

This yields the output below. Alternatively, you can add column labels to an existing DataFrame.

    # Output:
      Courses    Fee Duration
    a   Spark  20000   30days
    b  Pandas  25000   40days
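
Labels can also be attached after construction. The following is a minimal sketch, assuming the same `technologies` list as above, that assigns to the `columns` and `index` attributes of an existing DataFrame:

```python
import pandas as pd

technologies = [
    ["Spark", 20000, "30days"],
    ["Pandas", 25000, "40days"],
]

# Build the DataFrame with the default integer labels first
df = pd.DataFrame(technologies)

# Assign column and row labels to the existing DataFrame
df.columns = ["Courses", "Fee", "Duration"]
df.index = ["a", "b"]
print(df)
```

The assigned lists must have exactly as many entries as there are columns (or rows); otherwise pandas raises a ValueError.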

By default, pandas infers the data types from the data and assigns them to the DataFrame. df.dtypes returns the data type of each column.

    # Output:
    Courses     object
    Fee          int64
    Duration    object
    dtype: object

You can also assign custom data types to columns.

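    # Set custom types on the DataFrame.
    # The mapping below is an assumed example (column name -> target type).
    types = {"Courses": str, "Fee": float, "Duration": str}
    df = df.astype(types)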
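
To confirm that a conversion took effect, you can compare the column's dtype before and after the call. This is a small sketch, assuming we only want to change the `Fee` column to a float type:

```python
import pandas as pd

df = pd.DataFrame(
    [["Spark", 20000, "30days"], ["Pandas", 25000, "40days"]],
    columns=["Courses", "Fee", "Duration"],
)

# dtype inferred by pandas from the integer values
print(df["Fee"].dtype)

# astype() returns a new DataFrame; columns not listed keep their dtype
df = df.astype({"Fee": "float64"})
print(df["Fee"].dtype)
```

Because astype() does not modify the DataFrame in place, the result has to be assigned back (or to a new variable).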

2. Create DataFrame from a Dict (dictionary)

Another commonly used way to create a pandas DataFrame is from a Python dict (dictionary) object. This comes in handy when you want to convert a dictionary into a DataFrame. Each key of the dict becomes a column, and each value (a list) supplies that column's rows.

    # Create DataFrame from Dict
    technologies = {
        'Courses': ["Spark", "Pandas"],
        'Fee': [20000, 25000],
        'Duration': ['30days', '40days']
    }
    df = pd.DataFrame(technologies)
    print(df)

3. Create DataFrame with Index

By default, a DataFrame gets a numeric index starting from zero. You can replace it with a custom index when creating the DataFrame.

    # Create DataFrame with Index.
    technologies = {
        'Courses': ["Spark", "Pandas"],
        'Fee': [20000, 25000],
        'Duration': ['30days', '40days']
    }
    index_label = ["r1", "r2"]
    df = pd.DataFrame(technologies, index=index_label)
    print(df)

4. Creating DataFrame from a list of dicts

Sometimes data arrives as a JSON string, which parses into a structure similar to a list of dicts. You can convert a list of dicts to a DataFrame as shown below.

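    # Create DataFrame from a list of dicts.
    # The dict contents are assumed to be the same sample data used above.
    technologies = [
        {'Courses': 'Spark', 'Fee': 20000, 'Duration': '30days'},
        {'Courses': 'Pandas', 'Fee': 25000, 'Duration': '40days'},
    ]
    df = pd.DataFrame(technologies)
    print(df)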
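
When the data really does arrive as a JSON string, it first has to be parsed into Python objects. The sketch below uses the standard-library `json` module; the `json_text` payload is a hypothetical example, not data from a real service:

```python
import json
import pandas as pd

# Hypothetical JSON payload: an array of objects that share the same keys
json_text = (
    '[{"Courses": "Spark", "Fee": 20000, "Duration": "30days"},'
    ' {"Courses": "Pandas", "Fee": 25000, "Duration": "40days"}]'
)

# json.loads turns the JSON array into a Python list of dicts
records = json.loads(json_text)

# Each dict becomes a row; the dict keys become the column labels
df = pd.DataFrame(records)
print(df)
```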

5. Creating DataFrame From Series

By using the concat() function you can create a DataFrame from multiple Series. It takes several parameters; here we pass a list of the Series to combine and axis=1 so that they are joined as columns instead of rows.

    # Create pandas Series
    courses = pd.Series(["Spark", "Pandas"])
    fees = pd.Series([20000, 25000])
    duration = pd.Series(['30days', '40days'])

    # Create DataFrame from Series objects.
    df = pd.concat([courses, fees, duration], axis=1)
    print(df)

    # Outputs:
    #         0      1       2
    # 0   Spark  20000  30days
    # 1  Pandas  25000  40days

6. Add Column Labels

As you see above, by default concat() method doesn’t add column labels. You can do so as below.

    # Assign an index to each Series
    index_labels = ['r1', 'r2']
    courses.index = index_labels
    fees.index = index_labels
    duration.index = index_labels

    # Concat the Series, naming the columns through the dict keys
    df = pd.concat({'Courses': courses,
                    'Course_Fee': fees,
                    'Course_Duration': duration}, axis=1)
    print(df)

    # Outputs:
    #    Courses Course_Fee Course_Duration
    # r1   Spark      20000          30days
    # r2  Pandas      25000          40days
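
Another option is to name each Series when it is created, since concat() along axis=1 uses the Series names as column labels. This is a minimal sketch; the column names are simply the ones from the output above:

```python
import pandas as pd

# Name each Series at creation time
courses = pd.Series(["Spark", "Pandas"], name="Courses")
fees = pd.Series([20000, 25000], name="Course_Fee")
duration = pd.Series(["30days", "40days"], name="Course_Duration")

# With axis=1, each Series becomes a column labeled by its name
df = pd.concat([courses, fees, duration], axis=1)
print(df.columns.tolist())
```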

7. Creating DataFrame using zip() function

Multiple lists can be merged with the built-in zip() function, and the result is used to create a DataFrame.

    # Create lists
    Courses = ['Spark', 'Pandas']
    Fee = [20000, 25000]
    Duration = ['30days', '40days']

    # Merge lists by using zip().
    tuples_list = list(zip(Courses, Fee, Duration))
    df = pd.DataFrame(tuples_list, columns=['Courses', 'Fee', 'Duration'])
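
One thing to watch with zip() is that it stops at the shortest input, so lists of unequal length silently lose rows. The sketch below illustrates this with made-up lists of different lengths:

```python
import pandas as pd

# Illustrative lists of unequal length (three courses, two fees)
courses = ["Spark", "Pandas", "Hadoop"]
fees = [20000, 25000]

# zip() stops at the shortest list, so "Hadoop" is dropped
rows = list(zip(courses, fees))
df = pd.DataFrame(rows, columns=["Courses", "Fee"])
print(len(df))
```

On Python 3.10 and later, zip(courses, fees, strict=True) raises a ValueError instead of truncating, which may be preferable when the lists are expected to line up.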

8. Create an empty DataFrame in pandas

Sometimes you need to create an empty pandas DataFrame, with or without columns. One common case is described below.

When working with files, we sometimes do not receive a file for processing, yet we still need to create a DataFrame with the column names we expect. If the columns differ, later operations and transformations (such as unions) fail because they refer to columns that may not be present.

To handle situations like this, always create the DataFrame with the expected schema, meaning the same column names and data types, whether the file is missing or empty.

    # Create Empty DataFrame
    df = pd.DataFrame()
    print(df)

    # Outputs:
    # Empty DataFrame
    # Columns: []
    # Index: []

To create an empty DataFrame with column names but no data:

    # Create Empty DataFrame with Column Labels
    df = pd.DataFrame(columns=["Courses", "Fee", "Duration"])
    print(df)

    # Outputs:
    # Empty DataFrame
    # Columns: [Courses, Fee, Duration]
    # Index: []
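
The text above also calls for matching data types, which the columns argument alone does not set. One possible approach, sketched here with a hypothetical `schema` mapping, is to build each column from an empty Series with an explicit dtype:

```python
import pandas as pd

# Hypothetical expected schema: column name -> dtype
schema = {"Courses": "object", "Fee": "int64", "Duration": "object"}

# An empty Series per column carries the dtype into the DataFrame
df = pd.DataFrame(
    {name: pd.Series(dtype=dtype) for name, dtype in schema.items()}
)
print(df.dtypes)
print(len(df))
```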

9. Create DataFrame From CSV File

In real-world work, we often need to read the contents of a CSV file into a DataFrame. In pandas, this is done with the pandas.read_csv() function, which returns a DataFrame with the contents of the CSV file.

    # Create DataFrame from CSV file
    df = pd.read_csv('data_file.csv')
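
To try read_csv() without a file on disk, you can pass it any file-like object. The sketch below wraps an in-memory string in `io.StringIO`; the CSV text is made-up sample data standing in for `data_file.csv`:

```python
import io
import pandas as pd

# Made-up CSV content; the first line is used as the header row
csv_text = (
    "Courses,Fee,Duration\n"
    "Spark,20000,30days\n"
    "Pandas,25000,40days\n"
)

# read_csv accepts a path or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```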

10. Create From Another DataFrame

Finally, you can create a DataFrame from another DataFrame by using the copy() method.

    # Copy DataFrame to another
    df2 = df.copy()
    print(df2)
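
By default copy() makes a deep copy, so changes to the copy do not show up in the original. A short sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "Pandas"], "Fee": [20000, 25000]})

# copy() is deep by default: data and index are duplicated
df2 = df.copy()
df2.loc[0, "Fee"] = 0

print(df.loc[0, "Fee"])   # value in the original
print(df2.loc[0, "Fee"])  # value in the modified copy
```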

Conclusion

In this article, you have learned different ways to create a pandas DataFrame with examples. It can be created from a constructor, list, dictionary, series, CSV file, and many more.


How to Create Pandas DataFrames in Python: 7 Methods

A data frame is a two-dimensional data structure that stores data in tabular form. The data is arranged in rows and columns, and a single data frame can hold several data sets. We can perform various operations on a data frame, such as selecting rows or columns and adding rows or columns.

We can import DataFrames from external storage such as a SQL database, a CSV file, or an Excel file. We can also build them from lists, a dictionary, a list of dictionaries, and so on.

In this tutorial, we will learn to create a data frame in several ways. Let's look at how to create Pandas DataFrames in Python. First, we need to install the pandas library in the Python environment.

Method - 1: Empty data frame

We can create a basic empty data frame. To create a DataFrame, call the DataFrame constructor.

    # Import pandas
    import pandas as pd

    # Call the DataFrame constructor
    df = pd.DataFrame()
    print(df)

Output:

    Empty DataFrame
    Columns: []
    Index: []

Method - 2: Create a data frame from a list

We can create a data frame from a simple list or a list of lists. Let's look at the following example.

    # Import the pandas library
    import pandas as pd

    # String values in the list
    lst = ['Java', 'Python', 'C', 'C++', 'JavaScript', 'Swift', 'Go']

    # Call the DataFrame constructor on the list
    dframe = pd.DataFrame(lst)
    print(dframe)

Output:

                0
    0        Java
    1      Python
    2           C
    3         C++
    4  JavaScript
    5       Swift
    6          Go

Method - 3: Data frame from a dict of ndarrays / lists

A dict of ndarrays or lists can be used to create a data frame; all the ndarrays must have the same length. By default, the index will be range(n), where n is the array length.

    import pandas as pd

    # Assign data as a dict of lists (values taken from the output below).
    data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'],
            'Age': [20, 21, 19, 18]}

    # Create the DataFrame
    df = pd.DataFrame(data)

    # Print the output.
    print(df)

Output:

         Name  Age
    0     Tom   20
    1  Joseph   21
    2   Krish   19
    3    John   18
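
The same-length requirement can be checked directly. The sketch below, which assumes NumPy is installed alongside pandas, builds one data frame from equal-length ndarrays and then shows that unequal lengths raise a ValueError:

```python
import numpy as np
import pandas as pd

# Equal-length ndarrays: the default index is 0..n-1
data = {
    "Name": np.array(["Tom", "Joseph", "Krish", "John"]),
    "Age": np.array([20, 21, 19, 18]),
}
df = pd.DataFrame(data)
print(list(df.index))

# Unequal lengths are rejected by the constructor
bad = {"Name": ["Tom", "Joseph"], "Age": [20, 21, 19]}
try:
    pd.DataFrame(bad)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```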

Method - 4: Creating an indexed data frame using arrays

Let's look at an example of creating a data frame with a custom index using arrays.

    # DataFrame using arrays.
    import pandas as pd

    # Assign data as a dict of lists.
    data = {'Name': ['Renault', 'Duster', 'Maruti', 'Honda City'],
            'Ratings': [9.0, 8.0, 5.0, 3.0]}

    # Create the pandas DataFrame with a custom index.
    df = pd.DataFrame(data, index=['position1', 'position2', 'position3', 'position4'])

    # Print the data
    print(df)

Output:

                     Name  Ratings
    position1     Renault      9.0
    position2      Duster      8.0
    position3      Maruti      5.0
    position4  Honda City      3.0

In the code above, we defined a Name column with various car names and a Ratings column with their ratings. We used an array (a list of labels) to create the index.

Method - 5: Data frame from a list of dicts

We can pass a list of dictionaries as input to create a Pandas data frame. By default, the dictionary keys are used as the column names.

    # Create a pandas DataFrame from a list of dicts.
    import pandas as pd

    # Assign values (taken from the output below).
    data = [{'A': 10, 'B': 20, 'C': 30},
            {'x': 100, 'y': 200, 'z': 300}]

    # Create the DataFrame.
    df = pd.DataFrame(data)

    # Print the data
    print(df)

Output:

          A     B     C      x      y      z
    0  10.0  20.0  30.0    NaN    NaN    NaN
    1   NaN   NaN   NaN  100.0  200.0  300.0

Let's look at another example of creating a pandas data frame from a list of dictionaries, this time with both a row index and column labels.

    import pandas as pd

    # Assign values. The first dict follows from the output below; the keys
    # of the second dict are an assumption (any keys other than 'x' and 'y'
    # produce the same output).
    data = [{'x': 1, 'y': 2}, {'A': 5, 'B': 10, 'C': 20}]

    # Two column labels that match the dictionary keys
    dframe1 = pd.DataFrame(data, index=['first', 'second'], columns=['x', 'y'])

    # Two column labels, one of which matches no dictionary key
    dframe2 = pd.DataFrame(data, index=['first', 'second'], columns=['x', 'y1'])

    # Print the first data frame
    print(dframe1, "\n")

    # Print the second data frame
    print(dframe2)

Output:

              x    y
    first   1.0  2.0
    second  NaN  NaN

              x   y1
    first   1.0  NaN
    second  NaN  NaN

Consider an example of creating a data frame by passing a list of dictionaries together with row indices.

    # Create a pandas DataFrame by passing a list of
    # dictionaries and row indices.
    import pandas as pd

    # Assign values (taken from the output below).
    data = [{'x': 2, 'z': 3}, {'x': 10, 'y': 20, 'z': 30}]

    # Create the pandas DataFrame from the list of dictionaries
    # and the row index.
    dframe = pd.DataFrame(data, index=['first', 'second'])

    # Print the data frame
    print(dframe)

Output (recent pandas versions order the columns by first appearance of each key):

             x   z     y
    first    2   3   NaN
    second  10  30  20.0

We have discussed three ways to create a data frame from a list of dictionaries.

Method - 6: Using the zip() function

The zip() function is used to merge two lists. Let's look at the following example.

    # Create a pandas DataFrame from lists using zip.
    import pandas as pd

    # List 1
    Name = ['tom', 'krish', 'arun', 'juli']

    # List 2
    Marks = [95, 63, 54, 47]

    # Merge the two lists by using zip().
    list_tuples = list(zip(Name, Marks))

    # Print the list of tuples.
    print(list_tuples)

    # Convert the list of tuples into a pandas DataFrame.
    dframe = pd.DataFrame(list_tuples, columns=['Name', 'Marks'])

    # Print the data.
    print(dframe)

Output:

    [('tom', 95), ('krish', 63), ('arun', 54), ('juli', 47)]
        Name  Marks
    0    tom     95
    1  krish     63
    2   arun     54
    3   juli     47

Method - 7: From a dict of Series

A dictionary can be passed to create a data frame. We can use a dict of Series, where the resulting index is the union of the indexes of all the Series passed. Let's look at an example.

    # Pandas DataFrame from a dict of Series.
    import pandas as pd

    # Initialize the data as a dict of Series (values taken from the output below).
    names = ['John', 'Abhinay', 'Peter', 'Andrew']
    d = {'Electronics': pd.Series([97, 56, 87, 45], index=names),
         'Civil': pd.Series([97, 88, 44, 96], index=names)}

    # Create the DataFrame.
    dframe = pd.DataFrame(d)

    # Print the data.
    print(dframe)

Output:

             Electronics  Civil
    John              97     97
    Abhinay           56     88
    Peter             87     44
    Andrew            45     96
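
The union behavior is easier to see when the Series do not share the same labels. In this sketch, with made-up partial data, each Series covers only two of the three students, and the missing cells are filled with NaN:

```python
import pandas as pd

# Made-up Series whose indexes only partly overlap
d = {
    "Electronics": pd.Series([97, 56], index=["John", "Abhinay"]),
    "Civil": pd.Series([88, 44], index=["Abhinay", "Peter"]),
}

# The resulting index is the union of both indexes
dframe = pd.DataFrame(d)
print(dframe)
```

Because NaN is a floating-point value, columns with missing entries are stored as floats rather than integers.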

In this tutorial, we have discussed different ways to create DataFrames.
