One of the most useful features of pandas is the ability to build pivot tables, which let you summarize data by grouping, aggregating, and reshaping it.
First, you need to have a dataset in a Pandas DataFrame. Let’s assume we have a dataset that contains information about sales made by a company in different regions:
    import pandas as pd

    # Sample sales data: one row per region/product pair
    data = {
        'Region': ['North', 'North', 'South', 'South', 'East', 'East', 'West', 'West'],
        'Product': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
        'Sales': [100, 200, 150, 50, 75, 125, 225, 175]
    }
    df = pd.DataFrame(data)
This DataFrame has three columns: Region, Product, and Sales. Region and Product are categorical variables, while Sales is a numerical variable.
To pivot this DataFrame, you can use the pivot_table() function. The pivot_table() function takes several arguments:
- data: The DataFrame to pivot.
- index: The column(s) to use as the index (i.e., the rows).
- columns: The column(s) to use as the columns.
- values: The column(s) to use as the values (i.e., the cells).
- aggfunc: The aggregation function to use when multiple values are found for a cell.
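To see why aggfunc matters, here is a minimal sketch using a small made-up DataFrame (the `dup` data and its values are assumptions for illustration, not part of the sales dataset). The same Region/Product pair appears twice, so `DataFrame.pivot()`, which does no aggregation, raises a ValueError, while `pivot_table()` combines the duplicates.

```python
import pandas as pd

# Hypothetical data: the ('North', 'A') pair appears twice.
dup = pd.DataFrame({
    'Region': ['North', 'North', 'South'],
    'Product': ['A', 'A', 'A'],
    'Sales': [100, 40, 150],
})

# pivot() only reshapes; duplicate index/column pairs make it raise ValueError.
try:
    dup.pivot(index='Region', columns='Product', values='Sales')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() resolves the duplicates by applying aggfunc to each cell.
summed = pd.pivot_table(dup, values='Sales', index='Region',
                        columns='Product', aggfunc='sum')

print(pivot_failed)
print(summed)
```

In this sketch the two North/A rows are added together into a single cell, which is the situation the aggfunc argument is designed for.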
Here’s an example of how to use the pivot_table() function to pivot the sales data:
    table = pd.pivot_table(df, values='Sales', index='Region',
                           columns='Product', aggfunc='sum')
In this example, we are pivoting the DataFrame df by using the Region column as the index and the Product column as the columns. We are using the Sales column as the values and the sum function as the aggregation function.
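One way to read this call is as a group-aggregate-reshape sequence. The sketch below rebuilds the same sales data and spells those steps out with `groupby()` and `unstack()`; the names `by_hand` and `same` are illustrative only, and this is just one equivalent formulation, not a description of how pandas implements `pivot_table()` internally.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'East', 'East', 'West', 'West'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 200, 150, 50, 75, 125, 225, 175],
})

table = pd.pivot_table(df, values='Sales', index='Region',
                       columns='Product', aggfunc='sum')

# Group by both keys, sum Sales, then move Product from the index to the columns.
by_hand = df.groupby(['Region', 'Product'])['Sales'].sum().unstack('Product')

# Compare the cell values of the two results.
same = (table.values == by_hand.values).all()
print(same)
```

The `table` object produced by the original call is unchanged by this comparison.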
The resulting table DataFrame looks like this:
    Product    A    B
    Region
    East      75  125
    North    100  200
    South    150   50
    West     225  175
This pivot table shows the total sales made by the company in each region and for each product.
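If you also want row and column totals, `pivot_table()` accepts `margins=True`. The following is a small sketch on the same sales data; the label 'Total' passed to `margins_name` and the variable name `totals` are choices made here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'East', 'East', 'West', 'West'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 200, 150, 50, 75, 125, 225, 175],
})

# margins=True appends a totals row and a totals column computed with aggfunc.
totals = pd.pivot_table(df, values='Sales', index='Region', columns='Product',
                        aggfunc='sum', margins=True, margins_name='Total')

print(totals.loc['North', 'Total'])   # North across both products: 100 + 200
print(totals.loc['Total', 'A'])       # product A across all regions
print(totals.loc['Total', 'Total'])   # grand total
```

With this data, the North row total is 300, each product totals 550, and the grand total is 1100.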
You can also use the pivot_table() function to perform more complex transformations. For example, you can group by multiple columns and calculate multiple aggregation functions:
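    df['Year'] = 2019  # the sample data has no Year column; 2019 is assumed here
    table = pd.pivot_table(df, values='Sales', index=['Region', 'Product'], columns='Year', aggfunc={'Sales': ['sum', 'count']})
In this example, we first add a Year column, because the sample data defined earlier does not contain one (the value 2019 is an assumption chosen to match the output shown). We then pivot the DataFrame df by using the Region and Product columns as the index and the Year column as the columns. We are using the Sales column as the values and both the sum and count functions as the aggregation functions.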
The resulting table DataFrame looks like this (pandas may print the header levels and the column order slightly differently):
                   Sales
                     sum count
    Year            2019  2019
    Region Product
    East   A          75     1
           B         125     1
    North  A         100     1
           B         200     1
    South  A         150     1
           B          50     1
    West   A         225     1
           B         175     1
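Multi-level column headers like these can be awkward to work with afterwards. As a hedged sketch, the code below rebuilds the example (including an assumed Year column set to 2019, since the sample data does not define one) and flattens the column levels into single strings. The name `flat` and the underscore separator are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'East', 'East', 'West', 'West'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 200, 150, 50, 75, 125, 225, 175],
})
df['Year'] = 2019  # assumption: a single year for every row

table = pd.pivot_table(df, values='Sales', index=['Region', 'Product'],
                       columns='Year', aggfunc={'Sales': ['sum', 'count']})

# Join every level of each column label into one string, whatever the depth.
flat = table.copy()
flat.columns = ['_'.join(str(part) for part in col) for col in flat.columns]

# Turn the Region/Product index levels back into ordinary columns.
flat = flat.reset_index()

print(flat.columns.tolist())
print(flat)
```

Joining the levels generically avoids depending on exactly how many header levels a given pandas version produces.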