How to handle numerical data in Pandas

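This article walks through common tasks for working with numerical data in Pandas: loading it, summarizing it, handling missing values, scaling, applying mathematical operations, grouping, and plotting.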

Loading Numerical Data

Use the pandas.read_csv() function to load numerical data from a CSV file into a DataFrame:


import pandas as pd

# Load data from a CSV file
data = pd.read_csv("data.csv")
print(data.head())
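Numeric columns sometimes load as text because of stray placeholder strings. The sketch below shows two ways to deal with that. It reads from an in-memory string instead of data.csv, and the column names and the "missing" placeholder are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """id,price,quantity
1,9.99,3
2,14.50,missing
3,7.25,5
"""

# Option 1: tell read_csv which strings mean "no value"
data = pd.read_csv(io.StringIO(csv_text), na_values=["missing"])
print(data.dtypes)

# Option 2: load as-is, then coerce unparseable entries to NaN
raw = pd.read_csv(io.StringIO(csv_text))
coerced = pd.to_numeric(raw["quantity"], errors="coerce")
print(coerced)
```

Either way, the placeholder becomes NaN and the column can be used in numerical operations.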

Exploring and Summarizing Data

Pandas provides methods to quickly explore and summarize numerical data:

  • data.describe(): Generates summary statistics (count, mean, standard deviation, min, quartiles, max) for numerical columns.
  • data.info(): Prints an overview of the DataFrame, including column data types and non-null counts.
  • data.dtypes: Returns the data type of each column.

# Summary statistics
print(data.describe())

# Overview of the DataFrame (info() prints its report itself and returns None)
data.info()
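When a DataFrame mixes text and numbers, it can help to isolate the numerical columns before summarizing. Here is a minimal sketch using select_dtypes. The sample data and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with one text column and two numeric columns
data = pd.DataFrame({
    "city": ["Oslo", "Lima", "Pune", "Kyiv"],
    "temp_c": [4.0, 19.5, 31.0, 9.5],
    "visits": [120, 340, 560, 80],
})

# Keep only the numeric columns
numeric = data.select_dtypes(include="number")

# describe() returns a DataFrame, so individual statistics can be selected
summary = numeric.describe()
print(summary.loc[["mean", "min", "max"]])
```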

Handling Missing Values

Missing values (NaN) can skew statistics or cause errors in later steps. Pandas offers two basic ways to handle them:

  • data.fillna(value): Fill missing values with a specified value.
  • data.dropna(): Remove rows (the default) or columns (axis=1) that contain missing values.

# Fill missing values with the column mean
data['column_name'] = data['column_name'].fillna(data['column_name'].mean())

# Drop rows with missing values
data = data.dropna()
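It is usually worth counting the missing values before deciding how to treat them. The sketch below uses a small made-up DataFrame to show counting, filling every column with its own mean, and dropping rows based on all columns or on a chosen subset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both columns
data = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
})

# Count missing values per column
missing_per_column = data.isna().sum()
print(missing_per_column)

# Fill each column with that column's mean (NaN is skipped when computing the mean)
filled = data.fillna(data.mean())

# Drop rows that have a missing value in any column
rows_dropped = data.dropna()

# Drop rows only when column "a" is missing
rows_dropped_a = data.dropna(subset=["a"])
```

Note that dropping on all columns keeps a single row here, while restricting the check to one column keeps two, so the choice can change the size of the dataset considerably.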

Scaling and Normalizing Data

Many machine learning algorithms, especially those based on distances or gradient descent, are sensitive to the scale of their input features. The scalers in scikit-learn (sklearn) work directly with Pandas columns:


from sklearn.preprocessing import MinMaxScaler

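# Rescale the column to the [0, 1] range
scaler = MinMaxScaler()
# fit_transform returns a 2D array by default; flatten it to assign a single column
data['scaled_column'] = scaler.fit_transform(data[['column_name']]).ravel()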
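For a single column, the same rescaling can also be written with plain Pandas arithmetic, which makes the formulas explicit. The following is a sketch with made-up values, not a replacement for sklearn's scalers. It assumes the column is not constant, since a constant column would cause division by zero.

```python
import pandas as pd

# Hypothetical sample column
data = pd.DataFrame({"column_name": [10.0, 20.0, 30.0, 40.0]})
col = data["column_name"]

# Min-max scaling: maps the smallest value to 0 and the largest to 1
data["minmax"] = (col - col.min()) / (col.max() - col.min())

# Z-score standardization: mean 0, unit variance
# ddof=0 uses the population standard deviation, as sklearn's StandardScaler does
data["zscore"] = (col - col.mean()) / col.std(ddof=0)

print(data)
```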

Applying Mathematical Operations

Pandas supports element-wise operations and functions:

  • data['column'] + 10: Add 10 to each value in the column.
  • data['column'].apply(function): Apply a custom function to each value.

# Add 10 to each value in a column
data['new_column'] = data['column_name'] + 10

# Apply a custom function
data['transformed'] = data['column_name'].apply(lambda x: x ** 2)
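Simple arithmetic can usually be written as a vectorized expression instead of apply, which tends to be faster on large columns. The sketch below compares the two on made-up values and shows a NumPy function and the clip method applied to a whole column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column
data = pd.DataFrame({"column_name": [1.0, 4.0, 9.0]})

# Element-by-element with apply
squared_apply = data["column_name"].apply(lambda x: x ** 2)

# The same result as a vectorized expression
squared_vec = data["column_name"] ** 2

# NumPy functions accept a Series and return a Series
roots = np.sqrt(data["column_name"])

# Limit values to the range [2, 8]
clipped = data["column_name"].clip(lower=2, upper=8)
```

apply remains useful when the per-value logic cannot be expressed with built-in operators or NumPy functions.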

Aggregating and Grouping Data

Use groupby to split the data by category and compute an aggregate for each group:


# Group by a category and calculate the mean
grouped_data = data.groupby('category_column')['numerical_column'].mean()
print(grouped_data)
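More than one statistic per group can be computed in a single pass by passing a list of function names to agg. The sketch below reuses the column names from the snippet above with made-up values.

```python
import pandas as pd

# Hypothetical data: two categories with a few values each
data = pd.DataFrame({
    "category_column": ["a", "a", "b", "b", "b"],
    "numerical_column": [1.0, 3.0, 10.0, 20.0, 30.0],
})

# One row per category, one column per statistic
summary = (
    data.groupby("category_column")["numerical_column"]
    .agg(["mean", "min", "max", "count"])
)
print(summary)
```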

Visualizing Numerical Data

Use Pandas’ built-in plotting capabilities or libraries like Matplotlib and Seaborn for visualization:


import matplotlib.pyplot as plt

# Plot histogram
data['numerical_column'].hist()
plt.show()

# Box plot
data.boxplot(column='numerical_column', by='category_column')
plt.show()
