Here’s how to calculate skewness in Pandas.
Skewness is a measure of the asymmetry of a distribution. In a skewed distribution, the mean, median, and mode usually differ; for example, a long right tail typically pulls the mean above the median.
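To see the gap between mean and median in practice, here is a minimal sketch on a small made-up sample (the values are illustrative and not part of the original example). One large value stretches the right tail, which pulls the mean upward while the median stays with the bulk of the data.

```python
import pandas as pd

# Illustrative sample: most values are small, one value (20) stretches the right tail
s = pd.Series([1, 2, 2, 3, 3, 3, 4, 20])

print("mean:  ", s.mean())    # pulled toward the long right tail
print("median:", s.median())  # stays near the bulk of the data
print("skew:  ", s.skew())    # positive for a right-skewed sample
```

Here the mean (4.75) sits well above the median (3.0), and skew() reports a positive value.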
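In Pandas, you can calculate skewness with the skew() method, which is available on both DataFrame and Series objects. Called on a Series, it returns a single number. Called on a DataFrame, it returns a Series with one skewness value per column (the default, axis=0) or per row (axis=1).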
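The difference in return types is easiest to see side by side. This is a small sketch with made-up data; it also shows that missing values are skipped by default (skipna=True), so a NaN does not change the result.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 4, 3, 4])
df = pd.DataFrame({"x": [1, 4, 3, 4], "y": [5, 6, 2, 2]})

# On a Series, skew() returns a single float
print(s.skew())  # about -1.414

# On a DataFrame, skew() returns a Series with one value per column
print(df.skew())

# NaN values are ignored by default (skipna=True), so this matches s.skew()
print(pd.Series([1, 4, np.nan, 3, 4]).skew())
```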
How to calculate skewness of rows
The following code calculates the skewness of each row by passing axis=1:
import pandas as pd

# skew() needs numeric data, so the values are integers rather than strings
my_df = pd.DataFrame(
    {'Column1': [1, 4, 3, 4],
     'Column2': [5, 6, 2, 2],
     'Column3': [33, 10, 43, 12]})

# axis=1 computes one skewness value per row
my_skew = my_df.skew(axis=1)
print(my_skew)
In the above code, my_df.skew(axis=1) returns a Series containing the skewness of each row of my_df.
Output:
0    1.630059
1    0.935220
2    1.728489
3    1.457863
dtype: float64
Every row has a positive skew, meaning each row's values have a longer tail on the right: the largest value lies farther above the mean than the smallest value lies below it.
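To show where these numbers come from, the sketch below recomputes row 0 (the values 1, 5, 33) by hand. It assumes pandas uses the adjusted Fisher-Pearson standardized moment coefficient; the pandas docs describe the result as an unbiased skew normalized by N-1, and the comparison at the end checks the assumption numerically. The helper adjusted_skew is hypothetical, written only for this illustration.

```python
import math
import pandas as pd

def adjusted_skew(values):
    """Hypothetical helper: adjusted Fisher-Pearson sample skewness (assumed to match pandas)."""
    n = len(values)
    if n < 3:
        return float("nan")  # the adjustment below divides by n - 2
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return float("nan")  # this sketch returns NaN for constant data
    g1 = m3 / m2 ** 1.5                            # biased (population-style) skewness
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)   # small-sample adjustment

row0 = [1, 5, 33]  # first row of my_df
print(adjusted_skew(row0))     # hand-rolled version, about 1.630059
print(pd.Series(row0).skew())  # pandas' version, for comparison
```

The cubing in m3 is what makes the sign meaningful: large deviations above the mean contribute large positive terms, and large deviations below it contribute large negative ones.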
How to calculate skewness of columns
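To calculate the skewness of each column, pass axis=0 (the default):
import pandas as pd

# skew() needs numeric data, so the values are integers rather than strings
my_df = pd.DataFrame(
    {'Column1': [1, 4, 3, 4],
     'Column2': [5, 6, 2, 2],
     'Column3': [33, 10, 43, 12]})

# axis=0 computes one skewness value per column
my_skew = my_df.skew(axis=0)
print(my_skew)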
Output:
Column1   -1.414214
Column2    0.199735
Column3    0.308539
dtype: float64
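skew() does not convert strings to numbers for you, so numbers stored as text (object dtype) need converting first. A minimal sketch using pd.to_numeric on every column:

```python
import pandas as pd

# Numbers stored as strings: the columns have object dtype
raw = pd.DataFrame({"Column1": ["1", "4", "3", "4"],
                    "Column2": ["5", "6", "2", "2"]})

# Convert each column to a numeric dtype before computing skewness
numeric = raw.apply(pd.to_numeric)
print(numeric.dtypes)
print(numeric.skew())
```

After the conversion, the skewness values match the integer version of the data.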
Interpreting the Results
The skewness of a distribution can be positive, negative, or zero. A positive skew indicates that the distribution is skewed to the right, meaning that the tail on the right side of the distribution is longer than the tail on the left side. A negative skew indicates that the distribution is skewed to the left, meaning that the tail on the left side of the distribution is longer than the tail on the right side. A skewness of zero is consistent with a symmetric distribution, although a value of zero alone does not prove symmetry.
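A quick way to check the sign convention is to mirror a right-skewed sample. This sketch uses made-up values: negating the data turns a long right tail into a long left tail, which flips the sign of the skewness, while evenly spaced values give zero.

```python
import pandas as pd

right = pd.Series([1, 2, 3, 4, 10])     # long tail toward large values
left = -right                           # mirror image: long tail toward small values
symmetric = pd.Series([1, 2, 3, 4, 5])  # evenly spaced around the mean

print(right.skew())      # positive
print(left.skew())       # negative, same magnitude
print(symmetric.skew())  # zero
```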
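In the column example above, Column2 and Column3 have small positive skewness values, indicating a slight right skew. Column1 has a negative skewness (-1.414214), indicating a left skew: its lowest value, 1, lies farther below the mean than the other values lie above it.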
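If you want a readable label for each column, you can map the skewness values to a description. In this sketch, describe_skew is a hypothetical helper, and its cutoff of 0.5 for "roughly symmetric" is an assumed rule of thumb, not a pandas setting; pick a threshold that suits your data.

```python
import pandas as pd

my_df = pd.DataFrame({'Column1': [1, 4, 3, 4],
                      'Column2': [5, 6, 2, 2],
                      'Column3': [33, 10, 43, 12]})

def describe_skew(value, tol=0.5):
    """Hypothetical helper: label a skewness value; tol is an assumed cutoff."""
    if value > tol:
        return "right-skewed"
    if value < -tol:
        return "left-skewed"
    return "roughly symmetric"

print(my_df.skew().apply(describe_skew))
```

With this cutoff, Column1 is labeled left-skewed and the two slightly positive columns are labeled roughly symmetric.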
Limitations of Skewness
Skewness is a useful measure of the asymmetry of a distribution, but it has some limitations. One limitation is that skewness is sensitive to outliers, which are data points that lie far away from the rest of the data. Because skewness is based on cubed deviations from the mean, a single outlier can dominate the result and make the skewness of the distribution misleading.
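A small made-up example shows how much one outlier can move the result: appending a single extreme value to a perfectly symmetric sample produces a large positive skewness.

```python
import pandas as pd

base = pd.Series([1, 2, 3, 4, 5])
with_outlier = pd.Series([1, 2, 3, 4, 5, 50])  # one extreme value appended

print(base.skew())          # zero: perfectly symmetric
print(with_outlier.skew())  # large and positive, driven almost entirely by 50
```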
For the same reason, skewness is not a robust measure. A robust measure, such as the median, changes little when a few extreme values are added to the data, whereas skewness can change drastically. Skewness estimates are also unreliable for small samples, such as the three- and four-value rows and columns in the examples above.
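One commonly used robust alternative is Bowley's quartile skewness, (Q3 + Q1 - 2*Q2) / (Q3 - Q1), which depends only on the quartiles. It is not part of pandas' skew(); the sketch below builds it from Series.quantile, and quartile_skew is a hypothetical helper name. The trade-off is that it ignores the tails entirely, so it cannot detect asymmetry that lives only in the extremes.

```python
import pandas as pd

def quartile_skew(s):
    """Hypothetical helper: Bowley (quartile) skewness, (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = s.quantile([0.25, 0.5, 0.75])
    return (q3 + q1 - 2 * q2) / (q3 - q1)

base = pd.Series([1, 2, 3, 4, 5])
with_outlier = pd.Series([1, 2, 3, 4, 5, 50])  # one extreme value appended

for s in (base, with_outlier):
    # skew() jumps when the outlier is added; the quartile version does not move
    print(f"skew={s.skew():.3f}  quartile_skew={quartile_skew(s):.3f}")
```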
See also:
The pandas documentation for DataFrame.skew() and Series.skew().