This article explains how to drop a level from a MultiIndex in Pandas.
A MultiIndex allows multiple levels of row or column labels. While a MultiIndex offers flexibility, it is sometimes necessary to simplify the data by dropping a level. This guide shows how to drop levels from a Pandas MultiIndex effectively.
Understanding MultiIndexes
Before dropping levels, let’s recap MultiIndexes. They are essential when dealing with hierarchical data, representing data with multiple dimensions. Imagine tracking sales data by region and product category. A MultiIndex could represent this, with the first level being the region and the second level the product category.
Creating a Sample MultiIndex
Let’s create a sample MultiIndex:
import pandas as pd
data = {
    'Sales': {
        ('North', 'Electronics'): 10, ('North', 'Clothing'): 15,
        ('South', 'Electronics'): 20, ('South', 'Clothing'): 25,
        ('East', 'Electronics'): 30, ('East', 'Clothing'): 35,
    },
    'Profit': {
        ('North', 'Electronics'): 2, ('North', 'Clothing'): 3,
        ('South', 'Electronics'): 4, ('South', 'Clothing'): 5,
        ('East', 'Electronics'): 6, ('East', 'Clothing'): 7,
    },
}
index = pd.MultiIndex.from_tuples(
    [('North', 'Electronics'), ('North', 'Clothing'),
     ('South', 'Electronics'), ('South', 'Clothing'),
     ('East', 'Electronics'), ('East', 'Clothing')],
    names=['Region', 'Category'],
)
df = pd.DataFrame(data, index=index)
print(df)
This code creates a DataFrame df with a MultiIndex representing Region and Category.
Dropping Levels: The droplevel() Method
Pandas provides the droplevel() method for removing levels from a MultiIndex. Here’s the basic syntax:
df.droplevel(level, axis=0)
- level: Specifies the level or levels to drop. This is a required argument and can be an integer (the level’s position, starting from 0), the name of the level, or a list of positions or names.
- axis: Specifies whether to drop a level from the rows (0 or ‘index’) or columns (1 or ‘columns’). Defaults to 0 (rows).
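The level argument also accepts a list, which matters once an index has more than two levels. The following is a minimal sketch; the three-level index with a Year level and the names sales, by_category and by_position are made up for illustration and are not part of the sample data above.

```python
import pandas as pd

# Hypothetical three-level index (Region, Category, Year), for illustration only
idx = pd.MultiIndex.from_tuples(
    [("North", "Electronics", 2023),
     ("North", "Clothing", 2023),
     ("South", "Electronics", 2024)],
    names=["Region", "Category", "Year"],
)
sales = pd.DataFrame({"Sales": [10, 15, 20]}, index=idx)

# Drop two levels at once by passing a list of level names
by_category = sales.droplevel(["Region", "Year"])

# Positions work as well: 0 is Region and 2 is Year
by_position = sales.droplevel([0, 2])

print(by_category)
```

Both calls leave a single-level index named Category, and sales itself is not modified.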
Examples of Dropping Levels
Dropping by Level Name
To drop the ‘Region’ level:
df_region_dropped = df.droplevel(level='Region')
print(df_region_dropped)
Dropping by Level Position
To drop the first level (Region, which is at position 0):
df_level0_dropped = df.droplevel(level=0)
print(df_level0_dropped)
Dropping Levels from Columns
If your DataFrame has a MultiIndex for columns, you can drop levels from them as well:
# Assuming df_columns has a MultiIndex for columns
df_columns_dropped = df_columns.droplevel(level=1, axis=1)  # Drops the second level of columns
print(df_columns_dropped)
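Since df_columns is only assumed in the snippet above, here is one self-contained way it might look. The column levels Metric and Unit and the values are hypothetical, chosen only to give the columns two levels.

```python
import pandas as pd

# Hypothetical column MultiIndex: metric on level 0, unit on level 1
columns = pd.MultiIndex.from_tuples(
    [("Sales", "USD"), ("Profit", "USD")],
    names=["Metric", "Unit"],
)
df_columns = pd.DataFrame(
    [[10, 2], [15, 3]],
    index=["North", "South"],
    columns=columns,
)

# axis=1 targets the column labels; level=1 is the Unit level
df_columns_dropped = df_columns.droplevel(level=1, axis=1)
print(df_columns_dropped)
```

After the call, the columns are a plain single-level index holding the Metric labels, and the row index is untouched.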
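Dropping a level changes the structure of your data, and the labels in the dropped level are discarded, so make sure you no longer need them. If the remaining levels do not uniquely identify the rows, the result has duplicate index values: in the sample data, dropping 'Category' leaves each region label twice. Pandas does not aggregate or deduplicate these rows for you; every row is kept, and a label-based lookup such as .loc['North'] then returns several rows. If you need one row per label, you have to aggregate explicitly or keep the level as a regular column.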
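The duplicate-label case can be sketched on a smaller version of the sample data. The two follow-up steps, groupby() on the remaining level and reset_index() on the original frame, are just two common options rather than the only way to handle it; the names dropped, totals and flat are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("North", "Electronics"), ("North", "Clothing"),
     ("South", "Electronics"), ("South", "Clothing")],
    names=["Region", "Category"],
)
df = pd.DataFrame({"Sales": [10, 15, 20, 25]}, index=index)

# Dropping Category keeps all four rows, so each region label appears twice
dropped = df.droplevel("Category")
print(dropped.index.is_unique)

# Option 1: aggregate the rows that now share a label
totals = dropped.groupby(level="Region").sum()
print(totals)

# Option 2: keep the level's values by moving it into a regular column instead
flat = df.reset_index(level="Category")
print(flat)
```

Option 1 collapses the data to one row per region, while option 2 keeps every row and preserves the category information as a column.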
The droplevel() method always returns a new DataFrame and leaves the original unchanged; it does not accept an inplace parameter. To update the original variable, assign the result back to it:
df = df.droplevel(level='Region')
print(df)