How to Calculate Standard Deviation in Pandas

Standard deviation is a measure of how spread out the values in a set are. A low standard deviation indicates that the values are close to the mean, while a high standard deviation indicates that the values are spread out over a wider range.

Here’s how to calculate standard deviation in Pandas.
How to calculate standard deviation in Pandas

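In Pandas, you can calculate the standard deviation using the std() method. Called on a DataFrame, it returns the standard deviation of each column; called on a single column (a Series), it returns a single number. The method does not take a column name as an argument. To work with one column, select it first, for example my_df["my_column1"].std().

For example, the following code calculates the standard deviation of every column in the my_df DataFrame: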

import pandas as pd

my_df = pd.DataFrame({"my_column1": [9, 2, 3, 5],
                      "my_column2": [3, 7, 6, 4],
                      "my_column3": [4, 8, 8, 8]})

print(f'The standard deviation of columns:\n{my_df.std()}')

This code outputs the following:

The standard deviation of columns:
my_column1    3.095696
my_column2    1.825742
my_column3    2.000000
dtype: float64

By default, the std() method calculates the sample standard deviation. This means that the sum of squared deviations from the mean is divided by the number of values minus 1, and the standard deviation is the square root of that result.
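To make the sample formula concrete, here is a minimal sketch that computes it by hand for the values of my_column1 and compares the result with std(). The variable names are illustrative only and are not part of Pandas.

```python
import math

import pandas as pd

values = [9, 2, 3, 5]  # same values as my_column1

# Mean of the values
mean = sum(values) / len(values)

# Sum of squared deviations from the mean
squared_deviations = sum((x - mean) ** 2 for x in values)

# Sample standard deviation: divide by n - 1, then take the square root
sample_std = math.sqrt(squared_deviations / (len(values) - 1))

# Pandas uses the same n - 1 divisor by default
pandas_std = pd.Series(values).std()

print(sample_std)
print(pandas_std)
```

Both printed values should match the my_column1 result shown in the output above, up to floating-point rounding.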

If you want to calculate the population standard deviation, you can set the ddof parameter to 0. The ddof parameter stands for “delta degrees of freedom”: the divisor used in the calculation is N - ddof, where N is the number of values. The default, ddof=1, gives the sample standard deviation, which adjusts the calculation to account for the fact that we are estimating the population standard deviation from a sample.

For example, the following code calculates the population standard deviation of the my_column1 column in the my_df DataFrame:

standard_deviation = my_df["my_column1"].std(ddof=0)
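As a rough illustration of how ddof changes the result, the sketch below computes both versions for my_column1 and compares the population value with NumPy's np.std, which divides by N by default. The comparison with NumPy is an addition for illustration and is not required for the calculation.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({"my_column1": [9, 2, 3, 5]})

# Sample standard deviation (Pandas default, ddof=1): divisor is N - 1
sample_std = my_df["my_column1"].std()

# Population standard deviation (ddof=0): divisor is N
population_std = my_df["my_column1"].std(ddof=0)

# NumPy's np.std uses ddof=0 by default, so it matches the population value
numpy_std = np.std(my_df["my_column1"].to_numpy())

print(sample_std, population_std, numpy_std)
```

The population value is always smaller than or equal to the sample value, because it divides the same sum of squared deviations by a larger number.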

The std() method can also be used to calculate the standard deviation of multiple columns. To do this, select the columns by passing a list of column names to the DataFrame, and then call std() on the result.

For example, the following code calculates the standard deviation of the my_column1 and my_column2 columns in the my_df DataFrame:

standard_deviations = my_df[["my_column1", "my_column2"]].std()

The std() method is a versatile tool that can be used to calculate the standard deviation of one or multiple columns in a Pandas DataFrame.

Using the describe() Method

In addition to the std() method, you can also use the describe() method to get the standard deviation of a column. Called on a single column, describe() returns a Series that contains summary statistics for the values in the column, including the count, mean, standard deviation, minimum, maximum, and quartiles. The standard deviation it reports is the sample standard deviation.

For example, the following code calculates the standard deviation of the my_column1 column in the my_df DataFrame using the describe() method:

statistics = my_df["my_column1"].describe()

standard_deviation = statistics["std"]
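The same idea extends to the whole DataFrame. As a small sketch, calling describe() on my_df returns a DataFrame with one row per statistic, and the "std" row can be selected with .loc. The variable names summary and std_per_column are illustrative.

```python
import pandas as pd

my_df = pd.DataFrame({"my_column1": [9, 2, 3, 5],
                      "my_column2": [3, 7, 6, 4],
                      "my_column3": [4, 8, 8, 8]})

# describe() on a DataFrame returns a DataFrame: rows are statistics,
# columns are the original column names
summary = my_df.describe()

# Select the row labelled "std" to get one standard deviation per column
std_per_column = summary.loc["std"]

print(std_per_column)
```

The values in std_per_column should agree with my_df.std(), since both use the sample standard deviation.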

The describe() method is convenient when you want several summary statistics at once. However, if you only need the standard deviation, std() is the more direct choice: it computes just that one statistic, and it lets you set ddof, whereas describe() always reports the sample standard deviation.

For more details, see the Pandas documentation for the std() method.
