This article will show you how to handle numerical data in Pandas.
Loading Numerical Data
Use the pandas.read_csv() function to load numerical data from a CSV file into a DataFrame:
import pandas as pd
# Load data from a CSV file
data = pd.read_csv("data.csv")
print(data.head())
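Columns that should be numeric are sometimes loaded as text when the file contains stray entries. The following is a minimal sketch of one way to detect and repair that, assuming a small in-memory CSV in place of data.csv; the price and qty columns and the "unknown" entry are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "data.csv"
csv_text = "price,qty\n10.5,3\n12.0,unknown\n9.75,7\n"

df = pd.read_csv(io.StringIO(csv_text))

# The stray "unknown" entry prevents qty from being parsed as a number
qty_was_numeric = pd.api.types.is_numeric_dtype(df["qty"])
print(df.dtypes)

# Coerce the column to numbers; entries that cannot be parsed become NaN
df["qty"] = pd.to_numeric(df["qty"], errors="coerce")
print(df)
```

Checking dtypes right after loading is a cheap way to catch this before any statistics are computed.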
Exploring and Summarizing Data
Pandas provides methods to quickly explore and summarize numerical data:
- data.describe(): Generates summary statistics for numerical columns.
- data.info(): Provides an overview of the DataFrame, including column data types and non-null counts.
- data.dtypes: Shows the data type of each column.
# Summary statistics
print(data.describe())
# Overview of the DataFrame (info() prints its report directly and returns None)
data.info()
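To make the summary methods concrete, here is a small sketch on a made-up DataFrame (the height_cm and label columns are illustrative only). It shows that describe() skips non-numeric columns by default and that individual statistics can be read from its result.

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column
df = pd.DataFrame(
    {
        "height_cm": [150.0, 160.0, 170.0, 180.0],
        "label": ["a", "b", "c", "d"],
    }
)

# By default, describe() summarizes only the numeric columns
summary = df.describe()
print(summary)

# The result is itself a DataFrame, so single statistics can be looked up
mean_height = summary.loc["mean", "height_cm"]
print(mean_height)
```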
Handling Missing Values
Missing values can skew summary statistics, and many modeling libraries do not accept them. Pandas offers two main ways to handle them:
- data.fillna(value): Fill missing values with a specified value.
- data.dropna(): Remove rows or columns with missing values.
# Fill missing values with the column mean
data['column_name'] = data['column_name'].fillna(data['column_name'].mean())
# Drop rows with missing values
data = data.dropna()
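Before choosing a strategy, it usually helps to count how many values are missing in each column. The sketch below uses made-up score and age columns; it fills one column with its median (less sensitive to outliers than the mean) and drops rows only where a specific column is missing, using the subset parameter of dropna().

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column
df = pd.DataFrame(
    {
        "score": [1.0, np.nan, 3.0, 100.0],
        "age": [20.0, 30.0, np.nan, 40.0],
    }
)

# Count missing values per column
missing = df.isna().sum()
print(missing)

# Fill score with its median, which the outlier 100.0 barely affects
df["score"] = df["score"].fillna(df["score"].median())

# Drop rows only where age is missing, leaving other columns out of the check
cleaned = df.dropna(subset=["age"])
print(cleaned)
```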
Scaling and Normalizing Data
Many machine learning algorithms are sensitive to the scale of their input features, so numerical columns are often scaled or normalized first. Libraries such as scikit-learn (sklearn) work directly with Pandas:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
data[['scaled_column']] = scaler.fit_transform(data[['column_name']])
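For a single column, the same transformations can be written with plain Pandas arithmetic, which makes the formulas explicit. This is a sketch on a made-up value column, not a replacement for the scikit-learn scalers: min-max scaling maps the minimum to 0 and the maximum to 1, and z-score standardization subtracts the mean and divides by the standard deviation.

```python
import pandas as pd

# Hypothetical numeric column
df = pd.DataFrame({"value": [2.0, 4.0, 6.0, 10.0]})
col = df["value"]

# Min-max scaling: (x - min) / (max - min)
df["value_minmax"] = (col - col.min()) / (col.max() - col.min())

# Z-score standardization; ddof=0 uses the population standard deviation,
# which is what scikit-learn's StandardScaler uses
df["value_zscore"] = (col - col.mean()) / col.std(ddof=0)

print(df)
```

Both formulas divide by zero when every value in the column is identical, so a constant column needs separate handling.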
Applying Mathematical Operations
Pandas supports element-wise operations and functions:
- data['column'] + 10: Add 10 to each value in the column.
- data['column'].apply(function): Apply a custom function to each value.
# Add 10 to each value in a column
data['new_column'] = data['column_name'] + 10
# Apply a custom function
data['transformed'] = data['column_name'].apply(lambda x: x ** 2)
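When an operation can be expressed with arithmetic operators, NumPy functions, or built-in Series methods, the vectorized form is generally faster than apply() and gives the same result. A short sketch with a made-up column x:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})

# Squaring with apply() calls the lambda once per value
squared_apply = df["x"].apply(lambda v: v ** 2)

# The vectorized form operates on the whole column at once
squared_vec = df["x"] ** 2

# NumPy functions accept a Series and return a Series
df["log_x"] = np.log(df["x"])

# Series methods such as clip() cover many common transformations
df["clipped"] = df["x"].clip(lower=1.5, upper=2.5)

print(df)
```

apply() remains useful for logic that has no vectorized equivalent.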
Aggregating and Grouping Data
Use groupby to split the data by a category and aggregate a numerical column within each group:
# Group by a category and calculate the mean
grouped_data = data.groupby('category_column')['numerical_column'].mean()
print(grouped_data)
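Several statistics can be computed per group in one pass by passing a list of function names to agg(). The sketch below assumes made-up team and points columns.

```python
import pandas as pd

# Hypothetical data: a category column and a numeric column
df = pd.DataFrame(
    {
        "team": ["a", "a", "b", "b", "b"],
        "points": [10, 20, 5, 15, 25],
    }
)

# One row per team, one column per requested statistic
stats = df.groupby("team")["points"].agg(["mean", "min", "max", "count"])
print(stats)
```

The result is indexed by the grouping column, so a single value can be read with, for example, stats.loc["a", "mean"].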
Visualizing Numerical Data
Use Pandas’ built-in plotting capabilities or libraries like Matplotlib and Seaborn for visualization:
import matplotlib.pyplot as plt
# Plot histogram
data['numerical_column'].hist()
plt.show()
# Box plot
data.boxplot(column='numerical_column', by='category_column')
plt.show()
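In scripts that run without a display, the plots can be drawn on explicit axes and written to a file instead of shown. This is a minimal sketch with made-up group and value columns; the output filename numeric_overview.png is arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: a category column and a numeric column
df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b", "b"],
        "value": [1.0, 2.0, 2.5, 3.0, 10.0],
    }
)

# Two side-by-side axes: histogram on the left, box plot on the right
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
df["value"].plot.hist(bins=5, ax=axes[0], title="Distribution of value")
df.boxplot(column="value", by="group", ax=axes[1])

fig.suptitle("")  # boxplot(by=...) sets a figure-level title; clear it
fig.tight_layout()
fig.savefig("numeric_overview.png")
```

Passing ax= keeps both plots on one figure, which makes it easier to compare the overall distribution with the per-group spread.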