Pandas set_index() – Set Index to DataFrame
pandas.DataFrame.set_index() sets the index of a pandas DataFrame. With the set_index() method you can use an existing DataFrame column, a Series, an Index, or a NumPy array as the index, and you can pass several columns to build a multi-level index. Use pandas.DataFrame.reset_index() to reset the index to the default integer values.
An index is a set of labels that identifies the rows or columns of a DataFrame or Series. Both rows and columns have one: row labels are called the index, and column labels are usually called column names.
pandas.DataFrame.set_index() Key Points
- An index can be set while creating a pandas DataFrame; use the set_index() method to set the index of an existing DataFrame.
- You can also set the index from a Series, Index, or NumPy array, or from a list combining several such keys, in which case the DataFrame gets a MultiIndex.
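The key points above can be sketched as follows. This is a minimal illustration, not the article's own example: the small data dictionary and the row_ids Series are assumptions made up for the sketch.

```python
import pandas as pd

# Hypothetical sample data, used only for this sketch
data = {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]}

# 1) Index supplied while creating the DataFrame
df_created = pd.DataFrame(data, index=["r1", "r2", "r3"])

# 2) Index set afterwards on an existing DataFrame, from a Series
df_plain = pd.DataFrame(data)
row_ids = pd.Series(["r1", "r2", "r3"], name="row_id")  # hypothetical labels
df_from_series = df_plain.set_index(row_ids)

# 3) Several keys (a Series plus a column name) give a two-level MultiIndex
df_multi = df_plain.set_index([row_ids, "Courses"])

print(df_created)
print(df_from_series)
print(df_multi)
```

The Series name becomes the name of the resulting index, which is why the sketch names it explicitly.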
1. Quick Examples of pandas Set Index
Below are quick examples and usage of pandas.DataFrame.set_index() method.
# Below are the quick examples.

# Set list to index
index_labels = ['r1', 'r2', 'r3']
df.index = index_labels

# Set single column as index
df2 = df.set_index('Courses')

# Append index
df2 = df.set_index('Courses', append=True)

# Set multiple columns as Index
df2 = df.set_index(['Courses', 'Duration'])

# Set date time as index
df2 = df.set_index(pd.DatetimeIndex(pd.to_datetime(df['Start_Date'])))
2. pandas.DataFrame.set_index() Syntax
Below is the syntax of the set_index() method.
# Pandas DataFrame set_index() syntax
DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
- keys – Accepts a single column name as a string, a list of column names, or array-like keys such as a Series, Index, or NumPy array.
- drop – Removes the column(s) used as the new index from the DataFrame. Defaults to True.
- append – Appends the new index to the existing index instead of replacing it. Defaults to False.
- inplace – Modifies the existing DataFrame object in place instead of returning a new one. Defaults to False.
- verify_integrity – Checks the new index for duplicates. Defaults to False. Setting it to True adds a check that slows the method down.
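The effect of the drop, verify_integrity, and inplace parameters listed above can be sketched as follows. The sample data is an assumption chosen for illustration; it deliberately repeats "Spark" so that the duplicate check has something to find.

```python
import pandas as pd

# Hypothetical data with a duplicated course name
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark"],
    "Fee": [20000, 25000, 22000],
})

# drop=False keeps 'Courses' as a regular column as well as the index
kept = df.set_index("Courses", drop=False)

# verify_integrity=True raises ValueError when the new index has duplicates
try:
    df.set_index("Courses", verify_integrity=True)
    duplicate_message = None
except ValueError as exc:
    duplicate_message = str(exc)

# inplace=True modifies the DataFrame itself and returns None
in_place = df.copy()
returned = in_place.set_index("Courses", inplace=True)

print(kept)
print(duplicate_message)
print(returned, in_place.index.name)
```

With the default verify_integrity=False the same call succeeds and simply produces an index with duplicate labels.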
Let’s create a pandas DataFrame, run the above examples, and validate results.
# Create DataFrame
import pandas as pd
import numpy as np

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop"],
    'Fee': [20000, 25000, 26000],
    'Duration': ['30day', '40days', '35days'],
    'Discount': [1000, np.nan, 1200],
    'Start_Date': ['2021-02-04 05:30:00', '01-09-2021 06:30:00', '2021-03-06 07:30:00']
}
df = pd.DataFrame(technologies)
print(df)

# Output:
#    Courses    Fee Duration  Discount           Start_Date
# 0    Spark  20000    30day    1000.0  2021-02-04 05:30:00
# 1  PySpark  25000   40days       NaN  01-09-2021 06:30:00
# 2   Hadoop  26000   35days    1200.0  2021-03-06 07:30:00
3. pandas Set Index Example
Since we did not provide an index when creating the above DataFrame, pandas assigns a default integer index (0, 1, 2, and so on) as row labels. You can change the index by assigning a list of values to the DataFrame.index attribute; the list must contain one label per row.
# Set list to index
index_labels = ['r1', 'r2', 'r3']
df.index = index_labels
print(df)

# Output:
#     Courses    Fee Duration  Discount           Start_Date
# r1    Spark  20000    30day    1000.0  2021-02-04 05:30:00
# r2  PySpark  25000   40days       NaN  01-09-2021 06:30:00
# r3   Hadoop  26000   35days    1200.0  2021-03-06 07:30:00
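Assigning to DataFrame.index changes the DataFrame in place. If you would rather get a new DataFrame back, one option is to pass the labels to set_index() wrapped in pd.Index, because a bare list of strings is interpreted as column names. The sketch below uses made-up sample data and also shows reset_index() restoring the default integer index.

```python
import pandas as pd

# Hypothetical sample data for this sketch
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [20000, 25000, 26000],
})
labels = ["r1", "r2", "r3"]

# Wrapping the labels in pd.Index makes set_index() use them as values
df2 = df.set_index(pd.Index(labels))

# A bare list of strings is looked up as column names and fails here
try:
    df.set_index(labels)
    raised_key_error = False
except KeyError:
    raised_key_error = True

# reset_index(drop=True) discards the labels and restores 0, 1, 2
restored = df2.reset_index(drop=True)

print(df2)
print(raised_key_error)
print(restored)
```

Without drop=True, reset_index() would keep the old labels as a regular column instead of discarding them.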
4. Setting Single Column as Index by using set_index()
Often you need to use one of the existing DataFrame columns as the index. You can do this with the set_index() method, which returns a new DataFrame. By default, the column is removed from the DataFrame once it becomes the index; to retain it, pass drop=False.
# Set single column as index
df2 = df.set_index('Courses')
print(df2)

# Output:
#            Fee Duration  Discount           Start_Date
# Courses
# Spark    20000    30day    1000.0  2021-02-04 05:30:00
# PySpark  25000   40days       NaN  01-09-2021 06:30:00
# Hadoop   26000   35days    1200.0  2021-03-06 07:30:00
Note that setting the index replaces the existing index of the DataFrame. If you want to keep the existing index and add the new one as an additional level, use append=True.
# Append index
df2 = df.set_index('Courses', append=True)
print(df2)

# Output:
#               Fee Duration  Discount           Start_Date
#    Courses
# r1 Spark    20000    30day    1000.0  2021-02-04 05:30:00
# r2 PySpark  25000   40days       NaN  01-09-2021 06:30:00
# r3 Hadoop   26000   35days    1200.0  2021-03-06 07:30:00
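To see what append=True does to the index itself, the following sketch inspects the result. The data is a reduced, hypothetical version of the DataFrame used in this article.

```python
import pandas as pd

# Hypothetical sample data with row labels already set
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

# Keep the existing row labels and add 'Courses' as a second level
df2 = df.set_index("Courses", append=True)

level_count = df2.index.nlevels        # number of index levels
level_names = list(df2.index.names)    # the original level has no name

# A row is now addressed by a (row label, course) tuple
pyspark_fee = df2.loc[("r2", "PySpark"), "Fee"]

print(level_count, level_names, pyspark_fee)
```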
5. pandas set Index Multiple Columns
You can also set multiple columns as the index in pandas. To do so, pass the column names as a list to the DataFrame.set_index() method; the result has a MultiIndex with one level per column.
# Set multiple columns as Index
df2 = df.set_index(['Courses', 'Duration'])
print(df2)

# Output:
#                     Fee  Discount           Start_Date
# Courses Duration
# Spark   30day     20000    1000.0  2021-02-04 05:30:00
# PySpark 40days    25000       NaN  01-09-2021 06:30:00
# Hadoop  35days    26000    1200.0  2021-03-06 07:30:00
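Once several columns form the index, rows can be selected by one or more levels, and reset_index() turns the levels back into ordinary columns. A minimal sketch, assuming a reduced version of the sample data:

```python
import pandas as pd

# Hypothetical sample data for this sketch
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Duration": ["30day", "40days", "35days"],
    "Fee": [20000, 25000, 26000],
})

df2 = df.set_index(["Courses", "Duration"])

# Select one cell by giving a label for every index level
pyspark_fee = df2.loc[("PySpark", "40days"), "Fee"]

# Select rows by a single level with xs()
by_duration = df2.xs("35days", level="Duration")

# reset_index() moves both levels back into columns
flat = df2.reset_index()

print(pyspark_fee)
print(by_duration)
print(flat)
```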
6. pandas Set Index to datetime
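When you work with dates and times and want to filter on them, it is good practice to set the datetime column as the index. Before you do this, make sure the date column is in datetime format, for example by converting it with pandas.to_datetime(). The example below wraps the converted values in pandas.DatetimeIndex() and passes the result to set_index(). Because the key is an Index rather than a column name, the original Start_Date column is left in place.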
# Set date time as index
# Note: the sample dates mix two formats; pandas 2.0+ needs
# pd.to_datetime(df['Start_Date'], format='mixed') to parse them
df2 = df.set_index(pd.DatetimeIndex(pd.to_datetime(df['Start_Date'])))
print(df2)

# Output:
#                      Courses    Fee Duration  Discount           Start_Date
# Start_Date
# 2021-02-04 05:30:00    Spark  20000    30day    1000.0  2021-02-04 05:30:00
# 2021-01-09 06:30:00  PySpark  25000   40days       NaN  01-09-2021 06:30:00
# 2021-03-06 07:30:00   Hadoop  26000   35days    1200.0  2021-03-06 07:30:00
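Running print(df2.info()) gives the summary below. The index is now a DatetimeIndex, while the retained Start_Date column is still of type object.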
DatetimeIndex: 3 entries, 2021-02-04 05:30:00 to 2021-03-06 07:30:00
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Courses     3 non-null      object
 1   Fee         3 non-null      int64
 2   Duration    3 non-null      object
 3   Discount    2 non-null      float64
 4   Start_Date  3 non-null      object
dtypes: float64(1), int64(1), object(3)
memory usage: 144.0+ bytes
None
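The reason for setting a datetime index is that rows can then be filtered by date strings. The sketch below is an illustration under assumptions: it uses made-up data with all dates in a single ISO format, converts the column first, and sorts the index so that range slicing behaves predictably.

```python
import pandas as pd

# Hypothetical sample data with dates in one consistent format
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Start_Date": [
        "2021-02-04 05:30:00",
        "2021-01-09 06:30:00",
        "2021-03-06 07:30:00",
    ],
})

# Convert to datetime, then use the column as a sorted index
df["Start_Date"] = pd.to_datetime(df["Start_Date"])
df2 = df.set_index("Start_Date").sort_index()

# Partial string indexing: every row that falls in February 2021
february = df2.loc["2021-02"]

# Slice by date range: January through the end of February
first_two_months = df2.loc["2021-01":"2021-02"]

print(february)
print(first_two_months)
```

Passing the column name here drops Start_Date from the columns, unlike the DatetimeIndex variant above, which keeps it.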
7. Complete Example of pandas Set Index
import pandas as pd
import numpy as np

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop"],
    'Fee': [20000, 25000, 26000],
    'Duration': ['30day', '40days', '35days'],
    'Discount': [1000, np.nan, 1200],
    'Start_Date': ['2021-02-04 05:01:21', '01-09-2021 06:03:41', '2021-03-06 07:06:21']
}
df = pd.DataFrame(technologies)
print(df)

# Set list to index
index_labels = ['r1', 'r2', 'r3']
df.index = index_labels
print(df)

# Set single column as index
df2 = df.set_index('Courses')
print(df2)

# Append index
df2 = df.set_index('Courses', append=True)
print(df2)

# Set multiple columns as Index
df2 = df.set_index(['Courses', 'Duration'])
print(df2)

# Set date time as index
# Note: the sample dates mix two formats; pandas 2.0+ needs
# pd.to_datetime(df['Start_Date'], format='mixed') to parse them
df2 = df.set_index(pd.DatetimeIndex(pd.to_datetime(df['Start_Date'])))
print(df2)
print(df2.info())
8. Conclusion
In this article, you have learned the pandas.DataFrame.set_index() syntax and usage, with examples such as setting a list of labels or a DataFrame column as the index. You have also learned how to set multiple columns and a datetime column as the index of a DataFrame.
Naveen (NNK)