Cracking time series forecasting with pandas is like finding a map to hidden treasures in your data. Let’s chart the course.
Time Series Basics in Pandas
Pandas shines with time series data, thanks to its DatetimeIndex. Here’s how you start:
    import pandas as pd

    # Create a time series DataFrame indexed by six consecutive days
    dates = pd.date_range('20230101', periods=6)
    df = pd.DataFrame({'Sales': [200, 250, 300, 275, 225, 305]}, index=dates)
    print(df)
This snippet sets you up with a simple sales dataset, indexed by date.
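To see what the date index buys you, here is a minimal sketch of date-based selection on the same dataset. The variable name `subset` and the chosen date range are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Rebuild the sample sales data so this snippet runs on its own
dates = pd.date_range('20230101', periods=6)
df = pd.DataFrame({'Sales': [200, 250, 300, 275, 225, 305]}, index=dates)

# Label-based slicing on a DatetimeIndex accepts date strings
# and includes both endpoints
subset = df.loc['2023-01-02':'2023-01-04']
print(subset)

# date_range defaults to a daily frequency, which the index remembers
print(df.index.freqstr)
```

Unlike positional slicing, the end date is part of the result, so the slice above returns three rows.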
Rolling Windows for Smoothing
Smoothing out the noise with rolling windows helps you see the bigger picture:
    # Calculate a 3-day rolling average
    rolling_avg = df.rolling(window=3).mean()
    print(rolling_avg)
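One thing to watch for: with `window=3`, the first two rows have no complete window and come back as NaN. The sketch below shows that, along with one possible way around it using the `min_periods` parameter. The name `rolling_partial` is an illustrative assumption.

```python
import pandas as pd

# Rebuild the sample sales data so this snippet runs on its own
dates = pd.date_range('20230101', periods=6)
df = pd.DataFrame({'Sales': [200, 250, 300, 275, 225, 305]}, index=dates)

# Default behavior: rows without a full 3-day window are NaN
rolling_avg = df.rolling(window=3).mean()
print(rolling_avg)

# min_periods=1 averages over whatever observations are available so far
rolling_partial = df.rolling(window=3, min_periods=1).mean()
print(rolling_partial)
```

Whether partial windows are acceptable depends on the analysis: they fill the gaps, but the early values are averaged over fewer points and so are noisier.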
Resampling for Frequency Conversion
Need monthly data instead of daily? Resampling’s got your back:
    # Resample to month-end frequency and sum each month
    # ('ME' requires pandas 2.2 or later; on older versions use 'M')
    monthly_sum = df.resample('ME').sum()
    print(monthly_sum)
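Since the sample data covers only six days in one month, a monthly resample collapses it to a single row. As a rough sketch of the same idea at a finer grain, the example below resamples into 2-day bins instead. The `'2D'` frequency and the name `two_day_sum` are assumptions chosen for illustration.

```python
import pandas as pd

# Rebuild the sample sales data so this snippet runs on its own
dates = pd.date_range('20230101', periods=6)
df = pd.DataFrame({'Sales': [200, 250, 300, 275, 225, 305]}, index=dates)

# Group the daily rows into consecutive 2-day bins and sum each bin
two_day_sum = df.resample('2D').sum()
print(two_day_sum)
```

Each bin is labeled by its first day, so the result has three rows starting on January 1, 3, and 5.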
Forecasting with ARIMA
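For the actual forecasting, you’ll often leave pandas land and use statsmodels, particularly ARIMA, which accepts pandas objects directly. With only six observations, treat the fit below as an illustration of the workflow rather than a usable model: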
    from statsmodels.tsa.arima.model import ARIMA

    # Fit an ARIMA(1, 1, 1) model to the sales data
    model = ARIMA(df, order=(1, 1, 1))
    model_fit = model.fit()

    # Forecast the next three periods
    forecast = model_fit.forecast(steps=3)
    print(forecast)
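A model’s forecast is easier to judge next to a trivial reference point. The following is a minimal sketch, in pure pandas, of a naive last-value baseline scored on a hold-out split. This is not ARIMA, and the two-day hold-out size and the names `train`, `test`, `naive_forecast`, and `mae` are assumptions made for illustration.

```python
import pandas as pd

# Rebuild the sample sales data so this snippet runs on its own
dates = pd.date_range('20230101', periods=6)
df = pd.DataFrame({'Sales': [200, 250, 300, 275, 225, 305]}, index=dates)

# Hold out the last two days as a test set (an arbitrary split size)
train, test = df.iloc[:-2], df.iloc[-2:]

# Naive forecast: repeat the last training value over the test dates
naive_forecast = pd.Series(train['Sales'].iloc[-1], index=test.index, name='Sales')

# Mean absolute error of the baseline on the held-out days
mae = (test['Sales'] - naive_forecast).abs().mean()
print(naive_forecast)
print(mae)
```

If a fitted model cannot beat this kind of baseline on held-out data, the extra complexity is probably not paying off.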
Pandas doesn’t do the forecasting itself, but it sets you up perfectly to feed your data into powerful forecasting models like ARIMA.