Here’s a tutorial on how to set the index of a dataframe in the Python Pandas library.
Pandas offers a dedicated set_index method that lets you use one or more existing columns as the index of a dataframe.
import pandas as pd

my_df = pd.DataFrame({'Id': ['1001', '1002', '1003', '1007'],
                      'Column1': [2, 5, None, 46],
                      'Column2': [12, None, 5, 22]})
print(f'My dataframe before I set the index: \n{my_df}')
This is my example dataframe; next, I will set its index to the Id column.
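Before any index is set, pandas labels the rows with a default integer index. A quick way to confirm this, reusing the same example data, is to inspect the index attribute:

```python
import pandas as pd

# Same example data as above
my_df = pd.DataFrame({'Id': ['1001', '1002', '1003', '1007'],
                      'Column1': [2, 5, None, 46],
                      'Column2': [12, None, 5, 22]})

# Without set_index, rows are labelled by a default RangeIndex (0, 1, 2, ...)
print(my_df.index)        # RangeIndex(start=0, stop=4, step=1)
print(list(my_df.index))  # [0, 1, 2, 3]
```

At this point Id is still an ordinary column, so its values are not used to label the rows.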
How to set index in Pandas
To set the index, I call the set_index method on the dataframe.
import pandas as pd

my_df = pd.DataFrame({'Id': ['1001', '1002', '1003', '1007'],
                      'Column1': [2, 5, None, 46],
                      'Column2': [12, None, 5, 22]})
print(f'My dataframe before I set the index: \n{my_df}')

# Use the Id column as the index, remove it from the columns and modify my_df directly
my_df.set_index(['Id'], drop=True, inplace=True)
print(f'My dataframe with the index: \n{my_df}')
# print(my_df.index)
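As you can see, the index is now built from the “Id” column. With drop=True (which is also the default), the Id column is removed from the regular columns; otherwise its values would appear twice, once as the index and once as a column. The inplace=True parameter modifies my_df directly instead of returning a new dataframe.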
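If you prefer to keep the original dataframe unchanged, or want Id to stay available as a regular column, set_index can be called without inplace and with drop=False. A minimal sketch reusing the example data; indexed_df and kept_df are illustrative names, not part of the original tutorial:

```python
import pandas as pd

my_df = pd.DataFrame({'Id': ['1001', '1002', '1003', '1007'],
                      'Column1': [2, 5, None, 46],
                      'Column2': [12, None, 5, 22]})

# Without inplace=True, set_index returns a new dataframe and leaves my_df untouched
indexed_df = my_df.set_index('Id')

# drop=False uses Id as the index but also keeps it as a regular column
kept_df = my_df.set_index('Id', drop=False)

print(list(indexed_df.columns))  # ['Column1', 'Column2']
print(list(kept_df.columns))     # ['Id', 'Column1', 'Column2']
print(list(my_df.columns))       # ['Id', 'Column1', 'Column2']

# With Id as the index, rows can be looked up by their Id label
print(indexed_df.loc['1002', 'Column1'])  # 5.0
```

Assigning the return value is often preferred over inplace=True because it makes the change explicit and keeps the original dataframe available.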
How to verify integrity
In addition, the set_index method can check that the new index is unique. Setting its verify_integrity parameter to True makes pandas raise an error instead of creating an index with duplicate values.
my_df.set_index(['Id'], drop=True, inplace=True, verify_integrity=True)
Note: Before passing verify_integrity=True, make sure that the values in the column are unique. Otherwise, pandas raises “ValueError: Index has duplicate keys: Index(...)”, where the Index lists the duplicated values.
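To see the difference, the sketch below uses hypothetical data in which the Id '1002' appears twice (dup_df, loose_df and error_message are illustrative names). It also shows Series.is_unique as one way to check the column before setting the index:

```python
import pandas as pd

# Hypothetical data: the Id '1002' appears twice
dup_df = pd.DataFrame({'Id': ['1001', '1002', '1002', '1007'],
                       'Column1': [2, 5, None, 46]})

# Check the column for duplicates before using it as the index
print(dup_df['Id'].is_unique)  # False

# Without verify_integrity, pandas accepts the duplicate labels silently
loose_df = dup_df.set_index('Id')
print(loose_df.index.is_unique)  # False

# With verify_integrity=True, the same call raises a ValueError instead
error_message = None
try:
    dup_df.set_index('Id', verify_integrity=True)
except ValueError as err:
    error_message = str(err)

print(error_message)  # the message starts with "Index has duplicate keys"
```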
See also:
Documentation of set_index method