Pandas fillna: Complete Guide to Handling Missing Values

What is fillna?

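The fillna() method is one of the most frequently used pandas functions for data cleaning. It replaces missing values (NaN, None, and NaT) with a scalar or with per-column values given as a dictionary or Series. Its companions ffill() and bfill() fill gaps from neighboring rows instead.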
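To see which values count as missing, here is a minimal sketch on illustrative data. fillna() treats None and np.nan alike and leaves ordinary values alone:

```python
import numpy as np
import pandas as pd

# None and np.nan both count as missing in a float Series
s = pd.Series([1.5, None, np.nan, 4.0])
print(s.isna().tolist())  # [False, True, True, False]

# fillna replaces both kinds of missing value
filled = s.fillna(0)
print(filled.tolist())  # [1.5, 0.0, 0.0, 4.0]
```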

Why is this important?

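  • Some operations fail outright on missing values (for example, converting a float column that contains NaN to int)
  • Many machine learning libraries, including many scikit-learn estimators, reject inputs that contain NaN
  • Aggregates such as mean() silently skip NaN, which can make results look more complete than they are
  • fillna() is the standard pandas tool for simple imputation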
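The first and third points are easy to check. In this short sketch with made-up numbers, mean() skips NaN without complaint, while converting to an integer dtype raises an error until the gap is filled:

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 30.0])

# Aggregates skip NaN: this is the mean of 10 and 30 only
print(s.mean())  # 20.0

# NaN has no integer representation, so the cast raises an error
# (recent pandas raises IntCastingNaNError, a ValueError subclass)
try:
    s.astype(int)
except ValueError as err:
    print("astype(int) failed:", err)

# After filling, the cast succeeds
as_int = s.fillna(0).astype(int)
print(as_int.tolist())  # [10, 0, 30]
```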

Common use cases:

  • Fill missing ages with mean age
  • Fill missing values with previous observation (forward fill)
  • Fill missing values with next observation (backward fill)
  • Fill missing values with interpolated values (for time series, using the related interpolate() method)
  • Fill different columns with different values

Basic Syntax & Examples

Simple fillna with Scalar Value

The simplest way to fill missing values is with a single value:

import pandas as pd
import numpy as np

# Create sample data with missing values
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 30, np.nan],
    'Salary': [50000, 60000, np.nan, 70000]
})

print("Original DataFrame:")
print(df)

# Fill missing values with 0
df_filled = df.fillna(0)
print("\nAfter fillna(0):")
print(df_filled)

Output:

Original DataFrame:
Name Age Salary
0 Alice 25.0 50000.0
1 Bob NaN 60000.0
2 Charlie 30.0 NaN
3 David NaN 70000.0

After fillna(0):
Name Age Salary
0 Alice 25.0 50000.0
1 Bob 0.0 60000.0
2 Charlie 30.0 0.0
3 David 0.0 70000.0
💡 Tip: fillna() returns a new DataFrame by default and leaves the original unchanged, so assign the result (df = df.fillna(0)). Passing inplace=True modifies the original instead.

Filling with Scalar Values

Fill a Column with a Single Value

# Fill missing ages with the mean age
mean_age = df['Age'].mean()
df['Age'] = df['Age'].fillna(mean_age)

# Fill a text column with a placeholder string
df['Name'] = df['Name'].fillna('Unknown')

Fill with Different Values per Column

# Fill different columns with different values
fill_values = {
    'Age': df['Age'].mean(),          # Mean age
    'Salary': df['Salary'].median(),  # Median salary
    'Name': 'Unknown'
}
df_filled = df.fillna(fill_values)
print(df_filled)

Output:

Name Age Salary
0 Alice 25.0 50000.0
1 Bob 27.5 60000.0
2 Charlie 30.0 60000.0
3 David 27.5 70000.0
⚠️ Important: When using a dictionary, only columns in the dict are filled. Other columns remain unchanged.
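A quick way to confirm this behavior, as a sketch with illustrative columns: only the keys present in the dictionary are filled, and the NaN in the unlisted column survives.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [25, np.nan, 30],
    'Salary': [50000, np.nan, 70000],
})

# Only 'Age' is listed, so 'Salary' keeps its NaN
partial = df.fillna({'Age': 0})
print(partial)
print(partial.isna().sum().to_dict())  # {'Age': 0, 'Salary': 1}
```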

Filling with Methods (ffill & bfill)

Forward Fill (ffill) – Propagate Last Value

Forward fill takes the last valid observation and propagates it forward:

df = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04'],
    'Status': ['Active', np.nan, np.nan, 'Inactive']
})

# Forward fill
df_ffill = df.ffill()
print(df_ffill)

# Older spelling with the same result: df.fillna(method='ffill')
# (the method argument has been deprecated since pandas 2.1)

Output:

Date Status
0 2024-01-01 Active
1 2024-01-02 Active # Filled from previous
2 2024-01-03 Active # Filled from previous
3 2024-01-04 Inactive
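Forward fill needs a previous value to copy. This minimal sketch on illustrative data shows that a NaN in the first row has nothing to inherit and stays missing:

```python
import numpy as np
import pandas as pd

status = pd.Series([np.nan, 'Active', np.nan, 'Inactive'])

# Row 0 has no earlier value, so ffill leaves it missing
filled = status.ffill()
print(filled.isna().tolist())     # [True, False, False, False]
print(filled.iloc[1:].tolist())   # ['Active', 'Active', 'Inactive']
```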

Backward Fill (bfill) – Propagate Next Value

Backward fill takes the next valid observation and propagates it backward:

# Backward fill
df_bfill = df.bfill()
print(df_bfill)

# Older spelling with the same result: df.fillna(method='bfill')
# (deprecated since pandas 2.1)

Output:

Date Status
0 2024-01-01 Active
1 2024-01-02 Inactive # Filled from next
2 2024-01-03 Inactive # Filled from next
3 2024-01-04 Inactive
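💡 When to use: Forward fill suits time series where a missing value should inherit the previous state. Backward fill suits cases where the next known value is the better estimate, such as gaps at the start of a series. Use it with care in forecasting or modeling, because it copies future information into the past.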
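Forward fill cannot reach leading NaNs, and backward fill cannot reach trailing ones, so the two are often chained. In this sketch on illustrative data, ffill() runs first and bfill() only handles what is still missing at the start. Keep in mind that the bfill() step copies a later value backward.

```python
import numpy as np
import pandas as pd

readings = pd.Series([np.nan, 12.0, np.nan, 15.0, np.nan])

# ffill covers the interior and trailing gaps; bfill then covers the leading gap
filled = readings.ffill().bfill()
print(filled.tolist())  # [12.0, 12.0, 12.0, 15.0, 15.0]
```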

Filling Column-Specific Values with Dictionary

Use a dictionary to fill different columns with different values:

df = pd.DataFrame({
    'Product': ['A', np.nan, 'C', np.nan],
    'Price': [100, np.nan, 300, 400],
    'Quantity': [5, 10, np.nan, 20]
})

# Fill with specific values per column
fill_dict = {
    'Product': 'Unknown Product',
    'Price': df['Price'].mean(),
    'Quantity': 0
}
df_filled = df.fillna(fill_dict)
print(df_filled)

Output:

Product Price Quantity
0 A 100.000000 5.0
1 Unknown Product 266.666667 10.0
2 C 300.000000 0.0
3 Unknown Product 400.000000 20.0
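fillna() also accepts a Series as the value. For a DataFrame, the Series index is matched against column names. This sketch reuses the example's columns and fills every numeric column with its own mean. It passes numeric_only=True so the text column is skipped when the means are computed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', np.nan, 'C', np.nan],
    'Price': [100, np.nan, 300, 400],
    'Quantity': [5, 10, np.nan, 20],
})

# One mean per numeric column; the result is a Series indexed by column name
means = df.mean(numeric_only=True)

# Columns missing from the Series ('Product') are left unchanged
df_filled = df.fillna(means)
print(df_filled)
```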

Advanced: Limit Fill with limit Parameter

The limit parameter sets the maximum number of consecutive NaN values to fill in each gap:

df = pd.DataFrame({
    'Value': [1, np.nan, np.nan, np.nan, 5, np.nan, np.nan]
})

# Forward fill at most 2 consecutive NaN values per gap
df_limited = df.ffill(limit=2)
print(df_limited)

Output:

Value
0 1.0
1 1.0 # Filled (limit count: 1)
2 1.0 # Filled (limit count: 2)
3 NaN # Not filled (limit exceeded)
4 5.0
5 5.0 # Filled (limit count: 1)
6 5.0 # Filled (limit count: 2)
💡 Use case: limit keeps a value from being carried too far. It caps each gap rather than skipping it: in a gap longer than limit, the first limit values are still filled and the rest stay NaN.
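limit alone will not fill short gaps completely while leaving long gaps untouched. One possible approach is to measure the length of each NaN run and forward-fill only runs of at most max_gap values. The helper name ffill_short_gaps is hypothetical, not a pandas function.

```python
import numpy as np
import pandas as pd

def ffill_short_gaps(s: pd.Series, max_gap: int) -> pd.Series:
    """Forward-fill NaN runs of length <= max_gap; leave longer runs as NaN.

    Hypothetical helper for illustration, not part of pandas.
    """
    is_na = s.isna()
    # A new run starts wherever the missing/not-missing state changes
    run_id = is_na.astype(int).diff().ne(0).cumsum()
    # Length of the NaN run each position belongs to (0 for non-NaN runs)
    run_len = is_na.groupby(run_id).transform('sum')
    # Keep the original value where it is present or the gap is too long
    keep = ~is_na | (run_len > max_gap)
    return s.where(keep, s.ffill())

values = pd.Series([1, np.nan, np.nan, np.nan, 5, np.nan, np.nan])
print(ffill_short_gaps(values, max_gap=2).tolist())
```

Unlike ffill(limit=2), this leaves the three-value gap entirely empty instead of partially filling it.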

Interpolation for Time Series Data

For numeric data that changes smoothly, the related interpolate() method estimates each missing value from the known values around it:

df = pd.DataFrame({
    'Day': [1, 2, 3, 4, 5],
    'Temperature': [20, np.nan, np.nan, 35, 40]
})

# Linear interpolation
df['Temperature'] = df['Temperature'].interpolate(method='linear')
print(df)

Output:

Day Temperature
0 1 20.0
1 2 25.0 # Interpolated
2 3 30.0 # Interpolated
3 4 35.0
4 5 40.0
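method='linear' treats rows as evenly spaced even when the index is not. When the index holds real positions, method='index' uses those positions instead (method='time' does the same for a DatetimeIndex). This sketch uses an illustrative unevenly spaced index:

```python
import numpy as np
import pandas as pd

# Readings at positions 0, 1 and 4: the gap is not evenly spaced
s = pd.Series([0.0, np.nan, 40.0], index=[0, 1, 4])

# 'linear' ignores the index and puts the missing value halfway
print(s.interpolate(method='linear').tolist())  # [0.0, 20.0, 40.0]

# 'index' uses the index values: position 1 is a quarter of the way to 4
print(s.interpolate(method='index').tolist())   # [0.0, 10.0, 40.0]
```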

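Available interpolation methods (a selection):

Method | Description | Use case
linear | Straight line between neighboring points; treats rows as evenly spaced | Most common, good default
polynomial | Polynomial fit; needs an order argument (e.g. order=2) and SciPy | Non-linear trends
nearest | Copies the closest known value; needs SciPy | Step-like numeric data
quadratic | Second-order interpolation; needs SciPy | Smooth curves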

Inplace vs Copy

By default, fillna() returns a new DataFrame:

# Default: returns a new DataFrame (original unchanged)
df_filled = df.fillna(0)
print(df)  # Still has NaN values

# Inplace: modifies the original DataFrame
df.fillna(0, inplace=True)
print(df)  # NaN values are now 0

✅ Best Practices

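  • Prefer the default inplace=False and assign the result (df = df.fillna(0)): it is explicit and lets you compare before and after
  • Don't count on inplace=True to save memory: pandas often makes an internal copy anyway
  • Don't combine inplace=True with a column selection such as df['Age'].fillna(0, inplace=True): with Copy-on-Write it may leave df unchanged, so write df['Age'] = df['Age'].fillna(0) instead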

Real-World Examples

Example 1: Customer Age and Income Data

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Customer': ['John', 'Jane', 'Bob', 'Alice'],
    'Age': [25, np.nan, 35, np.nan],
    'Income': [50000, 60000, np.nan, 80000]
})

# Strategy: use the mean for age and the median for income
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Income'] = df['Income'].fillna(df['Income'].median())
print(df)

Example 2: Stock Price Time Series

df = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=7),
    'Price': [100, np.nan, np.nan, 110, np.nan, 115, 120]
})

# Forward fill stock prices (assume the price stays the same until new data arrives)
df['Price'] = df['Price'].ffill()
print(df)

Example 3: Sensor Data with Interpolation

df = pd.DataFrame({
    'Hour': range(6),
    'Humidity': [60, np.nan, np.nan, 75, np.nan, 85]
})

# Interpolate humidity values
df['Humidity'] = df['Humidity'].interpolate(method='linear')
print(df)

Performance Tips & Best Practices

🚀 Performance Optimization

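1. Use one vectorized call instead of a column loop

# ❌ Verbose: a Python loop with one fillna() call per numeric column
for col in df.select_dtypes(include='number').columns:
    df[col] = df[col].fillna(df[col].mean())

# ✅ Concise: one call; fillna() matches the Series of means to columns by name
# (numeric_only=True skips text columns, which make df.mean() raise in pandas 2.0+)
df = df.fillna(df.mean(numeric_only=True))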
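This sketch checks that the two versions agree on illustrative data with one text column. Both fill only the numeric columns, and the text column keeps its NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', np.nan, 'Charlie'],
    'Age': [25, np.nan, 35],
    'Salary': [50000, 60000, np.nan],
})

# Loop version: one fillna call per numeric column
looped = df.copy()
for col in looped.select_dtypes(include='number').columns:
    looped[col] = looped[col].fillna(looped[col].mean())

# Vectorized version: a single call with a Series of column means
vectorized = df.fillna(df.mean(numeric_only=True))

print(looped.equals(vectorized))  # True
```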

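2. Use the most direct fill method for the job

# ✅ Time series: carry the last known value forward
df = df.ffill()

# ✅ Known values per column
df = df.fillna({'col1': 0, 'col2': 'N/A'})

# ⚠️ Custom rules: apply() calls a Python function once per column, so keep it
# for logic the built-in options can't express (custom_logic is a placeholder
# for your own function that returns one fill value per column)
df = df.fillna(df.apply(custom_logic))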
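As one illustration of the apply route, this sketch uses a hypothetical custom_logic function that picks the median for numeric columns and the most frequent value (mode) for everything else. apply() calls it once per column and collects the results into a Series indexed by column name, which fillna() then matches to columns.

```python
import numpy as np
import pandas as pd

def custom_logic(col: pd.Series):
    """Hypothetical fill rule: median for numbers, most frequent value otherwise."""
    if pd.api.types.is_numeric_dtype(col):
        return col.median()
    mode = col.mode()  # mode() ignores NaN by default
    return mode.iloc[0] if not mode.empty else 'Unknown'

df = pd.DataFrame({
    'Score': [1.0, np.nan, 3.0, 10.0],
    'City': ['Paris', None, 'Paris', 'Rome'],
})

fill_values = df.apply(custom_logic)  # one value per column
df = df.fillna(fill_values)
print(df)
```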

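3. Fill numeric columns before a catch-all text fill

# Fill numeric columns with their means first
df = df.fillna(df.mean(numeric_only=True))

# Then fill what is still missing (the text columns) with a placeholder;
# in the opposite order, 'Unknown' would also land in the numeric columns
df = df.fillna('Unknown')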
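Relying on call order works, but selecting columns by dtype makes the intent explicit. This sketch on illustrative data assumes every non-numeric column holds text. Datetime or categorical columns would need their own handling.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', None, 'Charlie'],
    'Age': [25, np.nan, 35],
})

num_cols = df.select_dtypes(include='number').columns
other_cols = df.select_dtypes(exclude='number').columns  # assumed to be text

df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
df[other_cols] = df[other_cols].fillna('Unknown')
print(df)
```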

Common Mistakes to Avoid

⚠️ Mistake #1: Forgetting to Assign Result

# ❌ WRONG: the filled copy is discarded and df is unchanged
df.fillna(0)

# ✅ CORRECT: assign the result
df = df.fillna(0)

# OR modify df in place
df.fillna(0, inplace=True)
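A related trap is mixing the two styles. With inplace=True, fillna() modifies the DataFrame and returns None, so assigning its result throws the data away. A short sketch on illustrative data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [25, np.nan, 30]})

result = df.fillna(0, inplace=True)
print(result)              # None: inplace=True returns nothing
print(df['Age'].tolist())  # [25.0, 0.0, 30.0]

# So `df = df.fillna(0, inplace=True)` would leave df set to None
```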

⚠️ Mistake #2: Filling with Inappropriate Values

# ❌ WRONG: an age of 0 is not plausible and distorts statistics
df['Age'] = df['Age'].fillna(0)

# ✅ BETTER: use a representative statistic such as the mean or median
df['Age'] = df['Age'].fillna(df['Age'].mean())

⚠️ Mistake #3: Not Checking Fill Results

# Always verify the fill
print(df.isnull().sum())  # Remaining NaN per column

# Or keep the original (inplace=False) and compare
df_filled = df.fillna(0)
print(f"Original NaN count: {df.isnull().sum().sum()}")
print(f"Filled NaN count: {df_filled.isnull().sum().sum()}")

Key Takeaways

fillna() is essential for data cleaning. Here’s what you now know:

  • Scalar Fill: Replace all NaN with a single value
  • Dictionary Fill: Fill different columns with different values
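  • Forward/Backward Fill: ffill() and bfill() propagate values, mainly for time series
  • Interpolation: Estimate values from neighboring points with interpolate()
  • Limit Parameter: Cap how many consecutive NaN values are filled in each gap
  • Inplace: Modify the original DataFrame directly (assigning the result is usually clearer)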
  • Performance: Use vectorized operations, not loops

Next steps: Practice with your own datasets and choose the fill method that makes sense for your data type and analysis goals.
