How to calculate the IQR in Pandas

To calculate the IQR (Interquartile Range) in Pandas, you can use the quantile() function to compute the 25th percentile (Q1) and the 75th percentile (Q3) and then subtract Q1 from Q3.

What is the IQR?

    • The IQR (Interquartile Range) is a measure of variability. It is calculated by subtracting the 25th percentile (Q1) from the 75th percentile (Q3).
    • The IQR is a robust measure of variability: because it depends only on the middle 50% of the data, it is largely unaffected by outliers.
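
To illustrate the robustness claim, here is a minimal sketch that compares the IQR with the standard deviation before and after one extreme value is introduced. The helper name iqr and both sample series are made up for this illustration.

```python
import pandas as pd


def iqr(values: pd.Series) -> float:
    """Hypothetical helper: Q3 - Q1 using pandas' default quantile settings."""
    return float(values.quantile(0.75) - values.quantile(0.25))


# Two made-up samples that differ only in their largest value
clean = pd.Series([10, 12, 14, 16, 18, 20, 22])
with_outlier = pd.Series([10, 12, 14, 16, 18, 20, 1000])

# The IQR only looks at the middle half of the data, so it does not move
print("IQR:", iqr(clean), "->", iqr(with_outlier))

# The standard deviation uses every value, so the outlier inflates it
print("Std:", clean.std(), "->", with_outlier.std())
```

In this sketch the IQR is 6.0 for both series, while the standard deviation grows from about 4.3 to several hundred.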

IQR in Pandas

Here is an example:

import pandas as pd

data = {'score': [3, 8, 12, 16, 20, 25, 27, 30, 35, 40, 45]}
df = pd.DataFrame(data)

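# 25th and 75th percentiles (linear interpolation by default)
Q1 = df['score'].quantile(0.25)
Q3 = df['score'].quantile(0.75)
# The IQR is the spread of the middle 50% of the values
IQR = Q3 - Q1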

print("Q1:", Q1)
print("Q3:", Q3)
print("IQR:", IQR)

Output:

Q1: 14.0
Q3: 32.5
IQR: 18.5

In this example, we create a DataFrame with a column named ‘score’ and calculate the IQR of the values in that column. The quantile() method computes the 25th and 75th percentiles (by default it interpolates linearly between neighboring values when a percentile falls between two data points), and the IQR is the difference between them.
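
To see where the quartile values come from, the following sketch reproduces the default linear interpolation by hand and compares it with quantile(). The helper linear_quantile is hypothetical and written only for this illustration. It assumes the input is already sorted and non-empty.

```python
import math

import pandas as pd


def linear_quantile(sorted_values, q):
    """Hypothetical helper: quantile with linear interpolation between neighbors."""
    pos = (len(sorted_values) - 1) * q  # fractional index into the sorted data
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


scores = [3, 8, 12, 16, 20, 25, 27, 30, 35, 40, 45]

# Position 2.5 lies halfway between 12 and 16; position 7.5 halfway between 30 and 35
q1_manual = linear_quantile(scores, 0.25)
q3_manual = linear_quantile(scores, 0.75)

series = pd.Series(scores)
print("Manual:", q1_manual, q3_manual)
print("Pandas:", series.quantile(0.25), series.quantile(0.75))
```

With 11 values, the 25th percentile sits at index 2.5 and the 75th at index 7.5, so Q1 is 14.0 and Q3 is 32.5. Other interpolation rules (selectable through the interpolation parameter of quantile()) can give slightly different quartiles for the same data.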

IQR and boxplots

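The IQR is also used to draw boxplots. A boxplot is a graphical summary of the distribution of a dataset: the box spans Q1 to Q3 (so its length is the IQR), a line inside the box marks the median, the whiskers extend to the most extreme values within 1.5 × IQR of the box, and any points beyond the whiskers are drawn individually as outliers.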

To create a boxplot in Pandas, we can use the DataFrame.plot() method with the kind='box' argument. For example:

import pandas as pd
import matplotlib.pyplot as plt

data = {'score': [3, 8, 12, 16, 20, 25, 27, 30, 35, 40, 45]}
df = pd.DataFrame(data)

df.plot(kind='box')

plt.show()

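The boxplot shows that the median score is 25 and the IQR is 18.5 (the box runs from Q1 = 14.0 to Q3 = 32.5). This dataset has no outliers: the whiskers may reach up to 1.5 × IQR beyond the box, that is, down to -13.75 and up to 60.25, so the smallest and largest values, 3 and 45, are the whisker ends rather than separately plotted points.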
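
The same 1.5 × IQR rule that the boxplot uses by default can be applied directly to flag outliers. This is a minimal sketch of that convention, not the only way to define an outlier. The variable names lower_fence and upper_fence are chosen for this illustration.

```python
import pandas as pd

scores = pd.Series([3, 8, 12, 16, 20, 25, 27, 30, 35, 40, 45], name="score")

q1 = scores.quantile(0.25)
q3 = scores.quantile(0.75)
iqr = q3 - q1

# Values beyond these fences would be drawn as individual points on a boxplot
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers.tolist())
```

For this dataset the fences are -13.75 and 60.25, and the list of outliers is empty. Adding a value such as 100 to the data would push it past the upper fence and it would be flagged.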
