Pandas is a popular Python library for data analysis and manipulation. A common task when working with Pandas is dealing with missing values, which are usually represented as NaN. NaN stands for "not a number" and marks a value that is undefined or missing. NaN values can arise from many sources, such as reading data from a file, performing calculations, or applying transformations.
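As a rough illustration of the sources mentioned above, the sketch below produces NaN values in three ways: an empty field in a CSV file, a 0/0 division, and arithmetic between two series whose indexes only partly overlap. The data and variable names are made up for the example.

```python
import io

import pandas as pd

# 1. Reading data: an empty field in a CSV becomes NaN.
csv_df = pd.read_csv(io.StringIO("A,B\n1,\n2,3\n"))

# 2. Calculations: 0.0 / 0.0 is undefined, so the result is NaN.
s = pd.Series([1.0, 0.0, 3.0])
ratio = s / s

# 3. Transformations: labels present in only one operand yield NaN.
left = pd.Series([1, 2], index=["a", "b"])
right = pd.Series([10, 20], index=["b", "c"])
aligned = left + right

# isna() marks the missing entries; sum() counts them.
print(csv_df["B"].isna().sum())
print(ratio.isna().sum())
print(aligned.isna().sum())
```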
NaN values can cause problems for some operations, such as sorting, aggregating, or plotting, so you will often want to get rid of them. There are two main ways to do this: dropping them with the dropna() method or replacing them with the fillna() method.
The dropna() method removes the rows or columns that contain NaN values from your data frame or series. You can control how missing values are handled with the following parameters:
- axis: 0 (the default) to drop rows, 1 to drop columns
- how: 'any' (the default) to drop rows or columns that contain at least one NaN value, 'all' to drop only those in which every value is NaN
- thresh: the minimum number of non-NaN values a row or column needs in order to be kept; recent pandas versions do not allow combining it with how
- subset: a list of labels along the other axis to check for NaN values, for example column names when dropping rows
- inplace: True to modify the original data frame or series, False (the default) to return a new one
For example, if you have a data frame called df with four columns A, B, C, and D, and you want to drop any rows that have nan values in column B, you can use:
df.dropna(axis=0, how='any', subset=['B'], inplace=True)
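The following minimal sketch shows how the dropna() parameters described above behave on a small data frame. The sample values are invented for illustration, and each call returns a new data frame instead of using inplace=True so that the results can be compared side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column D is entirely NaN.
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0],
    "B": [np.nan, 2.0, 3.0, 4.0],
    "C": [1.0, np.nan, np.nan, 4.0],
    "D": [np.nan, np.nan, np.nan, np.nan],
})

# Drop rows that have NaN in column B only (row 0 is removed).
rows_with_b = df.dropna(subset=["B"])

# Drop columns in which every value is NaN (column D is removed).
no_empty_cols = df.dropna(axis=1, how="all")

# Keep rows that have at least two non-NaN values (row 2 is removed).
at_least_two = df.dropna(thresh=2)

# Drop the empty column first, then keep only fully populated rows.
complete = df.dropna(axis=1, how="all").dropna()

print(rows_with_b)
print(no_empty_cols)
print(at_least_two)
print(complete)
```

Because none of these calls uses inplace=True, the original df is left unchanged.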
The fillna() method replaces the NaN values in your data frame or series with a specified value. You can use the following parameters:
- value: a scalar, a dictionary, a series, or a data frame to fill the NaN values with
- method: 'ffill' (alias 'pad') for forward filling or 'bfill' (alias 'backfill') for backward filling. Interpolation is not a fillna() option; it is done with the separate interpolate() method. The method parameter is deprecated as of pandas 2.1 in favor of the ffill() and bfill() methods
- axis: 0 to fill along the index (down each column), 1 to fill along the columns (across each row)
- limit: the maximum number of consecutive NaN values to fill
- inplace: True to modify the original data frame or series, False (the default) to return a new one
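For example, if you have a data frame called df with four columns A, B, C, and D, and you want to fill any NaN values in column C with the mean of column C, you can assign the filled column back to the data frame. This is more reliable than calling fillna() with inplace=True on a selected column, which may modify a temporary copy instead of df:
df['C'] = df['C'].fillna(df['C'].mean())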
Alternatively, if you want to fill any nan values in column D with the previous valid value in the same column, you can use:
df['D'] = df['D'].ffill()
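The sketch below puts the fillna() options together on a small, made-up data frame: filling with the column mean, forward filling, filling with a different value per column through a dictionary, and capping a forward fill with limit. It uses the ffill() method, which recent pandas versions recommend over passing method='ffill' to fillna().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps in both columns.
df = pd.DataFrame({
    "C": [1.0, np.nan, 3.0, np.nan],
    "D": [10.0, np.nan, np.nan, 40.0],
})

filled = df.copy()

# Replace NaN in C with the column mean (NaN is skipped when averaging).
filled["C"] = filled["C"].fillna(filled["C"].mean())

# Replace NaN in D with the previous valid value.
filled["D"] = filled["D"].ffill()

# A dictionary gives each column its own fill value.
by_column = df.fillna({"C": 0.0, "D": -1.0})

# limit=1 fills at most one consecutive NaN after each valid value.
limited = df["D"].ffill(limit=1)

print(filled)
print(by_column)
print(limited)
```

Note that a forward fill cannot fill a NaN in the first row, because there is no earlier value to copy; a backward fill with bfill() has the same limitation for the last row.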
These are some of the ways to handle NaN values in Pandas. You can find more information and examples in the official documentation.