How to calculate entropy in Pandas

To calculate entropy in Pandas, you can write a small custom function that takes a Series of values as input and computes the Shannon entropy using the formula:

entropy = -sum(p * log2(p) for p in probabilities)

Here, probabilities holds the relative frequency of each unique value in the Series, that is, the count of that value divided by the total number of values. Because the logarithm is base 2, the result is measured in bits.
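
As a worked illustration of the formula, assume a hypothetical Series of four values in which one value occurs twice and two other values occur once each (these numbers are made up for the example). The probabilities are then 1/2, 1/4 and 1/4:

```latex
\begin{aligned}
H &= -\sum_i p_i \log_2 p_i \\
  &= -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4}\right) \\
  &= \tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{4}\cdot 2 = 1.5 \text{ bits}
\end{aligned}
```

A Series in which every value is identical has a single probability of 1 and therefore an entropy of 0.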

Here is an example that defines such a function and uses it to calculate the entropy of a column in a DataFrame:

import pandas as pd
import numpy as np

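def entropy(s):
    # Count how often each unique value occurs
    _, counts = np.unique(s, return_counts=True)
    # Convert the counts to probabilities
    probabilities = counts / len(s)
    # Shannon entropy in bits (base-2 logarithm)
    return -np.sum(probabilities * np.log2(probabilities))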

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'B': [10, 20, 30, 40, 50, 60, 70, 80, 90]})

# Calculate the entropy of column 'A'
entropy_A = entropy(df['A'])

# Print the result
print(entropy_A)

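In this example, the entropy of the values in column ‘A’ of the DataFrame df is calculated and stored in the variable entropy_A. The entropy is calculated using the custom entropy function, which takes a Series as input and returns the entropy as a float. Because column ‘A’ contains three distinct values that each occur three times, every value has probability 1/3 and the result is log2(3), approximately 1.585 bits.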
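
The same calculation can also be sketched with pandas' own `value_counts(normalize=True)`, which returns the relative frequencies directly and skips missing values by default. This is only one possible variant, not a replacement for the function above; the name `entropy_bits` is a hypothetical helper chosen for this illustration, and dropping NaN values is an assumption about how missing data should be treated.

```python
import numpy as np
import pandas as pd

def entropy_bits(s: pd.Series) -> float:
    """Shannon entropy of a Series in bits (hypothetical helper name)."""
    # Relative frequency of each unique value; NaN is dropped by default
    p = s.value_counts(normalize=True)
    # Guard against zero-frequency entries (e.g. unused categories),
    # since 0 * log2(0) would otherwise produce NaN
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Same sample DataFrame as in the example above
df = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'B': [10, 20, 30, 40, 50, 60, 70, 80, 90]})

# Apply the function to every column at once; the result is a Series
# indexed by column name
column_entropy = df.apply(entropy_bits)
print(column_entropy)
```

Column ‘B’ has nine distinct values, so its entropy is log2(9), roughly 3.17 bits, which is the maximum possible for nine rows.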
