How to calculate entropy in Pandas

pandas has no built-in entropy method, but you can write a small custom function that takes a Series as input and calculates its Shannon entropy using the formula:

entropy = -sum(p * log2(p) for p in probabilities)

Here, probabilities holds the probability of each unique value in the Series, calculated as the count of that value divided by the total number of values. Because the logarithm is base 2, the result is measured in bits.
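To make the formula concrete, here is a minimal sketch that applies it by hand using only the Python standard library. The list `values` is made-up illustration data, not something from the example further down.

```python
from collections import Counter
from math import log2

# Hypothetical data: two distinct values, each appearing half the time
values = ['a', 'a', 'b', 'b']

# Count each unique value, then divide by the total to get probabilities
counts = Counter(values)
total = len(values)
probabilities = [count / total for count in counts.values()]

# Apply the formula: -sum(p * log2(p))
entropy_bits = -sum(p * log2(p) for p in probabilities)
print(entropy_bits)  # 1.0
```

Two equally likely values give exactly 1 bit of entropy, which is a handy sanity check for any implementation.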

How to calculate the entropy of a column

The following example defines such a function and uses it to calculate the entropy of a column in a DataFrame:

import pandas as pd
import numpy as np

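def entropy(s):
    # Count how many times each unique value occurs in the Series
    _, counts = np.unique(s, return_counts=True)
    # Convert the counts to probabilities (relative frequencies)
    probabilities = counts / len(s)
    # Shannon entropy in bits (log base 2)
    return -np.sum(probabilities * np.log2(probabilities))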

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'B': [10, 20, 30, 40, 50, 60, 70, 80, 90]})

# Calculate the entropy of column 'A'
entropy_A = entropy(df['A'])

# Print the result
print(entropy_A)

In this example, the entropy of the values in column 'A' of the DataFrame df is calculated with the custom entropy function and stored in the variable entropy_A. The function takes a Series as input and returns the entropy as a float. Column 'A' contains three distinct values that each occur with probability 1/3, so the printed result is log2(3), approximately 1.585 bits.
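As a variation on the function above, the probabilities can also be obtained with `Series.value_counts(normalize=True)`, and the function can be applied to every column at once with `DataFrame.apply`. This is only a sketch: the name `entropy_value_counts` is made up for illustration, and `value_counts` drops missing values (NaN) by default, so the probabilities are taken over the non-missing entries only.

```python
import numpy as np
import pandas as pd

def entropy_value_counts(s: pd.Series) -> float:
    # Relative frequency of each unique value (NaN is dropped by default)
    probabilities = s.value_counts(normalize=True)
    # Shannon entropy in bits (log base 2)
    return float(-(probabilities * np.log2(probabilities)).sum())

# Same sample DataFrame as in the example above
df = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'B': [10, 20, 30, 40, 50, 60, 70, 80, 90]})

# Apply the function to each column; the result is a Series indexed by column name
entropies = df.apply(entropy_value_counts)
print(entropies)
```

Column 'B' has nine distinct values that each appear once, so its entropy is log2(9), roughly 3.17 bits, which is the maximum possible for nine rows.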
