How to calculate entropy in Pandas

To calculate entropy in Pandas, you can write a custom function that takes a Series of values as input and calculates the entropy using the formula:

entropy = -sum(p * log2(p) for p in probabilities)

Here, probabilities is a list of probabilities of each unique value in the Series, calculated as the count of each value divided by the total number of values. Because the logarithm is base 2, the result is measured in bits.
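As a quick check of the formula, here is a minimal sketch that computes the probabilities with `value_counts(normalize=True)`. The function name `series_entropy` and the sample Series are made up for illustration and are not part of the original example. Note that `value_counts` skips missing values by default.

```python
import numpy as np
import pandas as pd

def series_entropy(s):
    """Shannon entropy (in bits) of the values in a Series (illustrative helper)."""
    # Relative frequency of each unique value (NaN is excluded by default)
    probabilities = s.value_counts(normalize=True)
    # Apply the formula: -sum(p * log2(p))
    return float(-(probabilities * np.log2(probabilities)).sum())

# Two equally likely values carry 1 bit of entropy
coin = pd.Series(["H", "T", "H", "T"])
print(series_entropy(coin))  # 1.0

# Four equally likely values carry 2 bits of entropy
four = pd.Series(["a", "b", "c", "d"])
print(series_entropy(four))  # 2.0
```

The more evenly the values are spread, the higher the entropy.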


How to calculate the entropy of a column

Here is an example of how you can use this function to calculate the entropy of a column in a DataFrame:

import pandas as pd
import numpy as np

def entropy(s):
    # Count how often each unique value occurs
    _, counts = np.unique(s, return_counts=True)
    # Convert the counts to probabilities
    probabilities = counts / len(s)
    # Shannon entropy in bits (log base 2)
    return -np.sum(probabilities * np.log2(probabilities))

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'B': [10, 20, 30, 40, 50, 60, 70, 80, 90]})

# Calculate the entropy of column 'A'
entropy_A = entropy(df['A'])

# Print the result
print(entropy_A)


In this example, the entropy of the values in column 'A' of the DataFrame df is calculated and stored in the variable entropy_A. The entropy is calculated using the custom entropy function, which takes a Series as input and returns the entropy as a float. Each of the three values in column 'A' appears three times, so the result is log2(3), approximately 1.585 bits.
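If you want the entropy of every column at once, one possible approach is to pass the function to `DataFrame.apply`. The sketch below uses the same sample data. The helper name `column_entropy` is an assumption for illustration, and it counts values with `value_counts` rather than `np.unique`.

```python
import numpy as np
import pandas as pd

def column_entropy(s):
    """Shannon entropy (in bits) of one column (illustrative helper)."""
    counts = s.value_counts()
    probabilities = counts / counts.sum()
    return float(-(probabilities * np.log2(probabilities)).sum())

df = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'B': [10, 20, 30, 40, 50, 60, 70, 80, 90]})

# apply() calls the function once per column and returns a Series
# indexed by column name
entropies = df.apply(column_entropy)
print(entropies)
```

Column 'B' has nine distinct values, so its entropy is log2(9), about 3.17 bits, which is higher than the log2(3) of column 'A'.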
