Python set object index

Pandas set_index() – Set Index to DataFrame

pandas.DataFrame.set_index() sets the index of a pandas DataFrame. With set_index() you can use an existing DataFrame column, a Series, or an Index as the row index, and you can pass several columns to build a multi-level index. To use a plain list of values as the index, assign it to DataFrame.index. Use pandas.DataFrame.reset_index() to restore the default integer index.
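As a minimal sketch of the round trip described above, the snippet below sets a column as the index and then restores the default integer index with reset_index(). The two-column frame is made up for illustration and is not the article's sample data.

```python
import pandas as pd

# Hypothetical frame, used only for this illustration
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [20000, 25000, 26000],
})

# Move the Courses column into the index
indexed = df.set_index("Courses")
print(list(indexed.index))    # ['Spark', 'PySpark', 'Hadoop']
print(list(indexed.columns))  # ['Fee']

# reset_index() moves the index back into a column and
# restores the default 0..n-1 integer index
restored = indexed.reset_index()
print(list(restored.columns))  # ['Courses', 'Fee']
print(list(restored.index))    # [0, 1, 2]
```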

An index is a set of labels that identifies rows or columns in a DataFrame or Series. Both axes have one: the row labels are usually just called the index, while the column labels are the column names.
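To make the two kinds of labels concrete, here is a small sketch that reads the row index and the column labels from a DataFrame. The frame itself is a hypothetical two-row example.

```python
import pandas as pd

# Hypothetical two-row frame, used only for this illustration
df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})

row_labels = list(df.index)       # labels of the row axis (default integers)
column_labels = list(df.columns)  # labels of the column axis

print(row_labels)     # [0, 1]
print(column_labels)  # ['Courses', 'Fee']
```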

pandas.DataFrame.set_index() Key Points

  • The index can be set while creating a pandas DataFrame; use the set_index() method to set it on an existing DataFrame.
  • You can also set the index from a list, a Series, or one or more DataFrame columns, so a DataFrame can have multiple index levels.
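The two key points can be sketched as follows: the first frame receives its index at creation time through the index argument, the second takes it from a separate Series passed to set_index(). The variable names and the labels Series are assumptions made for this example.

```python
import pandas as pd

data = {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]}

# Index supplied while creating the DataFrame
df_a = pd.DataFrame(data, index=["r1", "r2", "r3"])
print(list(df_a.index))  # ['r1', 'r2', 'r3']

# Index taken from a separate Series of the same length (hypothetical labels)
labels = pd.Series(["a", "b", "c"], name="label")
df_b = pd.DataFrame(data).set_index(labels)
print(list(df_b.index))  # ['a', 'b', 'c']
print(df_b.index.name)   # label
```

Note that a plain Python list passed to set_index() is read as a list of column names, which is why a list of values is assigned through DataFrame.index instead.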

1. Quick Examples of pandas Set Index

Below are quick examples and usage of pandas.DataFrame.set_index() method.

    # Below are the quick examples.

    # Set list to index
    index_labels = ['r1','r2','r3']
    df.index = index_labels

    # Set single column as index
    df2 = df.set_index('Courses')

    # Append index
    df2 = df.set_index('Courses', append=True)

    # Set multiple columns as Index
    df2 = df.set_index(['Courses','Duration'])

    # Set date time as index
    df2 = df.set_index(pd.DatetimeIndex(pd.to_datetime(df['Start_Date'])))

2. pandas.DataFrame.set_index() Syntax

Below is the syntax of the set_index() method.

    # Pandas DataFrame set_index() syntax
    DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

  • keys – Accepts a single column name as a string, a list of column names, etc.
  • drop – Removes the column from the DataFrame's columns after it becomes the index. Defaults to True.
  • append – Appends the new index to the existing index instead of replacing it. Defaults to False.
  • inplace – Modifies the existing DataFrame object in place instead of returning a new one. Defaults to False.
  • verify_integrity – Checks the new index for duplicates. Defaults to False. Setting it to True can slow the method down.
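The parameters above can be tried out with a short sketch. The frame below is hypothetical and deliberately repeats a course name so that verify_integrity=True has a duplicate to detect; the variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame with a duplicated value in Courses
df = pd.DataFrame({
    "Courses": ["Spark", "Spark", "Hadoop"],
    "Fee": [20000, 25000, 26000],
})

# drop=False keeps Courses as a regular column as well as the index
kept = df.set_index("Courses", drop=False)
print(list(kept.columns))  # ['Courses', 'Fee']

# inplace=True modifies the frame itself and returns None
df_copy = df.copy()
result = df_copy.set_index("Fee", inplace=True)
print(result)               # None
print(list(df_copy.index))  # [20000, 25000, 26000]

# verify_integrity=True raises ValueError when the new index has duplicates
try:
    df.set_index("Courses", verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)
print(duplicate_error is not None)  # True
```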

Let’s create a pandas DataFrame, run the above examples, and validate results.

    # Create DataFrame
    import pandas as pd
    import numpy as np

    technologies = {
        'Courses':["Spark","PySpark","Hadoop"],
        'Fee' :[20000,25000,26000],
        'Duration':['30day','40days','35days'],
        'Discount':[1000,np.nan,1200],
        'Start_Date' : ['2021-02-04 05:30:00','01-09-2021 06:30:00',
                        '2021-03-06 07:30:00']
    }
    df = pd.DataFrame(technologies)
    print(df)

    # Output:
    #    Courses    Fee Duration  Discount           Start_Date
    # 0    Spark  20000    30day    1000.0  2021-02-04 05:30:00
    # 1  PySpark  25000   40days       NaN  01-09-2021 06:30:00
    # 2   Hadoop  26000   35days    1200.0  2021-03-06 07:30:00

3. pandas Set Index Example

Because no index was provided when creating the DataFrame above, pandas assigns a default index of incremental integers, starting at 0, as the row labels. You can change the index by assigning a list of values to the DataFrame.index attribute.

    # Set list to index
    index_labels = ['r1','r2','r3']
    df.index = index_labels
    print(df)

    # Output:
    #     Courses    Fee Duration  Discount           Start_Date
    # r1    Spark  20000    30day    1000.0  2021-02-04 05:30:00
    # r2  PySpark  25000   40days       NaN  01-09-2021 06:30:00
    # r3   Hadoop  26000   35days    1200.0  2021-03-06 07:30:00
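One detail worth checking when assigning to DataFrame.index is that the list must have exactly one label per row. The sketch below, on a hypothetical single-column frame, shows a valid assignment followed by one that is too short and is rejected with a ValueError.

```python
import pandas as pd

# Hypothetical three-row frame, used only for this illustration
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"]})

# One label per row: accepted
df.index = ["r1", "r2", "r3"]

# Too few labels: pandas raises ValueError and leaves the index unchanged
try:
    df.index = ["r1", "r2"]
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(length_error is not None)  # True
print(list(df.index))            # ['r1', 'r2', 'r3']
```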

4. Setting Single Column as Index by using set_index()

Sometimes you need to set one of the existing DataFrame columns as the index; the set_index() method does this. By default, the column is removed from the DataFrame's columns once it becomes the index. To retain it as a column, pass drop=False.

    # Set single column as index
    df2 = df.set_index('Courses')
    print(df2)

    # Output:
    #            Fee Duration  Discount           Start_Date
    # Courses
    # Spark    20000    30day    1000.0  2021-02-04 05:30:00
    # PySpark  25000   40days       NaN  01-09-2021 06:30:00
    # Hadoop   26000   35days    1200.0  2021-03-06 07:30:00

Note that set_index() replaces the existing index of the DataFrame. To keep the existing index and add the new one as an additional level, use append=True.

    # Append index
    df2 = df.set_index('Courses', append=True)
    print(df2)

    # Output:
    #               Fee Duration  Discount           Start_Date
    #    Courses
    # r1 Spark    20000    30day    1000.0  2021-02-04 05:30:00
    # r2 PySpark  25000   40days       NaN  01-09-2021 06:30:00
    # r3 Hadoop   26000   35days    1200.0  2021-03-06 07:30:00
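With append=True the result carries a two-level index: the original row labels plus the Courses values. The sketch below inspects that structure on a reduced, hypothetical version of the sample frame and looks up one row by its pair of labels.

```python
import pandas as pd

# Reduced, hypothetical version of the sample frame
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

df2 = df.set_index("Courses", append=True)

n_levels = df2.index.nlevels          # existing level + appended level
level_names = list(df2.index.names)   # the original level has no name
fee_r2 = df2.loc[("r2", "PySpark"), "Fee"]  # look up by both labels

print(n_levels)     # 2
print(level_names)  # [None, 'Courses']
print(fee_r2)       # 25000
```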

5. pandas Set Index with Multiple Columns

You can also set multiple columns as the index in pandas. To do so, pass the column names as a list to the DataFrame.set_index() method.

    # Set multiple columns as Index
    df2 = df.set_index(['Courses','Duration'])
    print(df2)

    # Output:
    #                     Fee  Discount           Start_Date
    # Courses Duration
    # Spark   30day     20000    1000.0  2021-02-04 05:30:00
    # PySpark 40days    25000       NaN  01-09-2021 06:30:00
    # Hadoop  35days    26000    1200.0  2021-03-06 07:30:00
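Once several columns form the index, rows can be selected by one or both levels with .loc. The following is a minimal sketch on a reduced, hypothetical version of the sample frame; the variable names are assumptions for illustration.

```python
import pandas as pd

# Reduced, hypothetical version of the sample frame
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Duration": ["30day", "40days", "35days"],
    "Fee": [20000, 25000, 26000],
})

df2 = df.set_index(["Courses", "Duration"])

# Select a single value by giving a label for every index level
fee = df2.loc[("Spark", "30day"), "Fee"]

# Select by the first level only; the result is indexed by Duration
spark_rows = df2.loc["Spark"]

print(fee)                     # 20000
print(list(spark_rows.index))  # ['30day']
```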

6. pandas Set Index to datetime

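When you work with dates and times and want to filter on them, it is common practice to set the datetime column as the index. Before you do this, make sure the column is in datetime format: use pandas.to_datetime() to parse the values and pandas.DatetimeIndex() to turn them into an index. Note that the sample Start_Date values mix two date formats; pandas 2.0 and later parse with a single inferred format by default, so on those versions you may need pd.to_datetime(..., format='mixed') for this data.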

    # Set date time as index
    df2 = df.set_index(pd.DatetimeIndex(pd.to_datetime(df['Start_Date'])))
    print(df2)

    # Output:
    #                      Courses    Fee Duration  Discount           Start_Date
    # Start_Date
    # 2021-02-04 05:30:00    Spark  20000    30day    1000.0  2021-02-04 05:30:00
    # 2021-01-09 06:30:00  PySpark  25000   40days       NaN  01-09-2021 06:30:00
    # 2021-03-06 07:30:00   Hadoop  26000   35days    1200.0  2021-03-06 07:30:00

Running print(df2.info()) produces the output below; the trailing None is the return value of info().

    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 3 entries, 2021-02-04 05:30:00 to 2021-03-06 07:30:00
    Data columns (total 5 columns):
     #   Column      Non-Null Count  Dtype
    ---  ------      --------------  -----
     0   Courses     3 non-null      object
     1   Fee         3 non-null      int64
     2   Duration    3 non-null      object
     3   Discount    2 non-null      float64
     4   Start_Date  3 non-null      object
    dtypes: float64(1), int64(1), object(3)
    memory usage: 144.0+ bytes
    None
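The filtering that motivates a datetime index can be sketched as follows. This example is an illustration under stated assumptions: it uses a reduced frame whose Start_Date strings all share one ISO format (unlike the mixed formats in the sample data), and it sorts the index before slicing.

```python
import pandas as pd

# Hypothetical frame; all dates use the same ISO format on purpose
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Start_Date": [
        "2021-02-04 05:30:00",
        "2021-01-09 06:30:00",
        "2021-03-06 07:30:00",
    ],
})

# Parse the strings, then use the parsed column as a sorted index
df["Start_Date"] = pd.to_datetime(df["Start_Date"])
df2 = df.set_index("Start_Date").sort_index()

# Label-based slicing by date string: everything from 1 Feb 2021 onward
feb_onward = df2.loc["2021-02-01":]

# Partial-string selection: all rows in January 2021
january = df2.loc["2021-01"]

print(list(feb_onward["Courses"]))  # ['Spark', 'Hadoop']
print(list(january["Courses"]))     # ['PySpark']
```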

7. Complete Example of pandas Set Index

    import pandas as pd
    import numpy as np

    technologies = {
        'Courses':["Spark","PySpark","Hadoop"],
        'Fee' :[20000,25000,26000],
        'Duration':['30day','40days','35days'],
        'Discount':[1000,np.nan,1200],
        'Start_Date' : ['2021-02-04 05:01:21','01-09-2021 06:03:41',
                        '2021-03-06 07:06:21']
    }
    df = pd.DataFrame(technologies)
    print(df)

    # Set list to index
    index_labels = ['r1','r2','r3']
    df.index = index_labels
    print(df)

    # Set single column as index
    df2 = df.set_index('Courses')
    print(df2)

    # Append index
    df2 = df.set_index('Courses', append=True)
    print(df2)

    # Set multiple columns as Index
    df2 = df.set_index(['Courses','Duration'])
    print(df2)

    # Set date time as index
    df2 = df.set_index(pd.DatetimeIndex(pd.to_datetime(df['Start_Date'])))
    print(df2)
    print(df2.info())

8. Conclusion

In this article, you have learned the syntax and usage of pandas.DataFrame.set_index(), with examples of setting a list or a DataFrame column as the index. You have also learned how to set multiple columns and a datetime column as the index of a DataFrame.


