Pandas Fillna: Complete Guide To Handling Missing Values

What is fillna?

The fillna() method is one of the most critical pandas functions for data cleaning. It replaces NaN (Not a Number) and missing values with specified values, methods, or strategies.

Why is this important?

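Some operations propagate NaN or raise errors when values are missing, and aggregations that silently skip NaN can hide gaps
Many machine learning estimators reject input that contains NaN
Analysis built on incomplete data can be biased or unreliable
fillna() is the primary pandas tool for simple data imputation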
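
To see how missing values affect a result before reaching for fillna(), here is a minimal sketch. The `ages` Series is made up for illustration; the point is that pandas reductions skip NaN by default, so a gap can go unnoticed unless you count it.

```python
import numpy as np
import pandas as pd

# Illustrative data: two of the four ages are missing
ages = pd.Series([25, np.nan, 30, np.nan], name="Age")

# Count the missing entries explicitly
missing_count = ages.isna().sum()

# Reductions skip NaN by default, so this is the mean of 25 and 30
mean_skipping_nan = ages.mean()

# With skipna=False the NaN propagates into the result
mean_with_nan = ages.mean(skipna=False)

print(missing_count, mean_skipping_nan, mean_with_nan)
```

Counting with isna().sum() first makes it easier to judge whether filling is appropriate at all.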

Common use cases:

Fill missing ages with mean age
Fill missing values with previous observation (forward fill)
Fill missing values with next observation (backward fill)
Fill missing values with interpolated values (for time series)
Fill different columns with different values

Basic Syntax & Examples

Simple fillna with Scalar Value

The simplest way to fill missing values is with a single value:

import pandas as pd
import numpy as np

# Create sample data with missing values
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 30, np.nan],
    'Salary': [50000, 60000, np.nan, 70000]
})

print("Original DataFrame:")
print(df)

# Fill missing values with 0
df_filled = df.fillna(0)
print("\nAfter fillna(0):")
print(df_filled)

Output:

Original DataFrame:
      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob   NaN  60000.0
2  Charlie  30.0      NaN
3    David   NaN  70000.0

After fillna(0):
      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob   0.0  60000.0
2  Charlie  30.0      0.0
3    David   0.0  70000.0

💡 Tip: fillna() returns a new DataFrame by default. Use inplace=True to modify the original.

Filling with Scalar Values

Fill All NaN with Same Value

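# Fill missing ages with the mean age
# (assign the result back; calling fillna(..., inplace=True) on a selected
# column may not update the parent DataFrame in recent pandas versions)
mean_age = df['Age'].mean()
df['Age'] = df['Age'].fillna(mean_age)

# Fill missing names with a string value
df['Name'] = df['Name'].fillna('Unknown')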

Fill with Different Values per Column

# Fill different columns with different values
fill_values = {
    'Age': df['Age'].mean(),        # Mean age
    'Salary': df['Salary'].median(),  # Median salary
    'Name': 'Unknown'
}

df_filled = df.fillna(fill_values)
print(df_filled)

Output:

      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob  27.5  60000.0
2  Charlie  30.0  60000.0
3    David  27.5  70000.0

⚠️ Important: When using a dictionary, only columns in the dict are filled. Other columns remain unchanged.
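
A minimal sketch of that behavior, using a made-up two-column frame: only the column named in the dictionary is filled, and the other column keeps its NaN.

```python
import numpy as np
import pandas as pd

# Illustrative data: one missing value in each column
df_partial = pd.DataFrame({
    "Age": [25.0, np.nan],
    "Salary": [np.nan, 60000.0],
})

# Only 'Age' appears in the dictionary, so only 'Age' is filled
filled_partial = df_partial.fillna({"Age": 0})

# Remaining NaN count per column
remaining = filled_partial.isna().sum()
print(remaining)
```

If every column should be covered, check the remaining counts and extend the dictionary accordingly.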

Filling with Methods (ffill & bfill)

Forward Fill (ffill) – Propagate Last Value

Forward fill takes the last valid observation and propagates it forward:

df = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04'],
    'Status': ['Active', np.nan, np.nan, 'Inactive']
})

# Forward fill
df_ffill = df.ffill()
print(df_ffill)

# Older code spells this df.fillna(method='ffill'); the method argument
# is deprecated since pandas 2.1, so prefer df.ffill()

Output:

         Date    Status
0  2024-01-01    Active
1  2024-01-02    Active  # Filled from previous
2  2024-01-03    Active  # Filled from previous
3  2024-01-04  Inactive

Backward Fill (bfill) – Propagate Next Value

Backward fill takes the next valid observation and propagates it backward:

# Backward fill
df_bfill = df.bfill()
print(df_bfill)

# Older code spells this df.fillna(method='bfill'); the method argument
# is deprecated since pandas 2.1, so prefer df.bfill()

Output:

         Date    Status
0  2024-01-01    Active
1  2024-01-02  Inactive  # Filled from next
2  2024-01-03  Inactive  # Filled from next
3  2024-01-04  Inactive

💡 When to use: Forward fill is ideal for time series data where missing values should inherit the previous state. Backward fill is useful when you want to use future values.
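
One edge case worth knowing: forward fill cannot fill a gap at the start of the data, and backward fill cannot fill one at the end, because there is no value to propagate from. The sketch below uses a made-up numeric Series to show this, and chains the two fills as one possible way to cover both ends.

```python
import numpy as np
import pandas as pd

# Illustrative data with a gap at the start, in the middle, and at the end
s = pd.Series([np.nan, 1.0, np.nan, 4.0, np.nan])

forward = s.ffill()        # leading NaN stays: nothing earlier to copy
backward = s.bfill()       # trailing NaN stays: nothing later to copy
both = s.ffill().bfill()   # forward first, then backward for the leading gap

print(forward.tolist())
print(backward.tolist())
print(both.tolist())
```

Whether chaining is appropriate depends on the data; filling the leading gap from a later value uses information from the future.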

Filling Column-Specific Values with Dictionary

Use a dictionary to fill different columns with different values:

df = pd.DataFrame({
    'Product': ['A', np.nan, 'C', np.nan],
    'Price': [100, np.nan, 300, 400],
    'Quantity': [5, 10, np.nan, 20]
})

# Fill with specific values per column
fill_dict = {
    'Product': 'Unknown Product',
    'Price': df['Price'].mean(),
    'Quantity': 0
}

df_filled = df.fillna(fill_dict)
print(df_filled)

Output:

           Product       Price  Quantity
0                A  100.000000       5.0
1  Unknown Product  266.666667      10.0
2                C  300.000000       0.0
3  Unknown Product  400.000000      20.0

Advanced: Limit Fill with limit Parameter

The limit parameter controls how many consecutive NaN values to fill:

df = pd.DataFrame({
    'Value': [1, np.nan, np.nan, np.nan, 5, np.nan, np.nan]
})

# Forward fill at most 2 consecutive NaN in each gap
df_limited = df.ffill(limit=2)
print(df_limited)

Output:

   Value
0    1.0
1    1.0  # Filled (limit count: 1)
2    1.0  # Filled (limit count: 2)
3    NaN  # Not filled (limit exceeded)
4    5.0
5    5.0  # Filled (limit count: 1)
6    5.0  # Filled (limit count: 2)

💡 Use case: Limit is useful when you want to fill only small gaps but not large stretches of missing data.
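
Note that limit still fills the first part of a long gap. If the goal is to leave long gaps completely untouched, one possible approach is to measure each run of NaN first. The helper below, `fill_short_gaps`, is a hypothetical function written for this sketch and is not part of pandas.

```python
import numpy as np
import pandas as pd

def fill_short_gaps(s: pd.Series, max_gap: int) -> pd.Series:
    """Forward fill only the NaN runs whose total length is <= max_gap."""
    is_na = s.isna()
    # Give every run of consecutive equal flags (NaN / not NaN) its own id
    run_id = (is_na != is_na.shift()).cumsum()
    # Length of the run each element belongs to
    run_len = is_na.groupby(run_id).transform("size")
    # Fill a position only if it is NaN and sits in a short enough run
    fillable = is_na & (run_len <= max_gap)
    return s.where(~fillable, s.ffill())

s = pd.Series([1, np.nan, np.nan, np.nan, 5, np.nan, np.nan])
short_only = fill_short_gaps(s, max_gap=2)
print(short_only.tolist())
```

Here the three-value gap is left entirely as NaN, while the two-value gap is filled with 5.0.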

Interpolation for Time Series Data

For numeric data with a logical progression, interpolation fills missing values based on a pattern:

df = pd.DataFrame({
    'Day': [1, 2, 3, 4, 5],
    'Temperature': [20, np.nan, np.nan, 35, 40]
})

# Linear interpolation
df['Temperature'] = df['Temperature'].interpolate(method='linear')
print(df)

Output:

   Day  Temperature
0    1         20.0
1    2         25.0  # Interpolated
2    3         30.0  # Interpolated
3    4         35.0
4    5         40.0

Available interpolation methods:

Method	Description	Use Case
linear	Straight line between neighboring points, treating rows as equally spaced	Most common, good default
polynomial	Polynomial curve fitting (requires order= and SciPy)	Non-linear relationships
nearest	Use the nearest valid value (requires SciPy)	Step-like numeric data
quadratic	Second-order spline (requires SciPy)	Smooth curves
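
For time series with unevenly spaced timestamps, pandas also offers method='time', which is not listed in the table above. It weights the interpolation by the actual time between observations and needs a DatetimeIndex. A minimal sketch with made-up readings:

```python
import numpy as np
import pandas as pd

# Illustrative readings: the gap sits 1 day after the first point
# and 2 days before the last one
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"])
readings = pd.Series([10.0, np.nan, 40.0], index=idx)

# 'linear' ignores the index and treats the rows as equally spaced
by_position = readings.interpolate(method="linear")

# 'time' uses the timestamps: one third of the way from 10 to 40
by_time = readings.interpolate(method="time")

print(by_position.iloc[1], by_time.iloc[1])
```

With evenly spaced timestamps the two methods agree, so the choice only matters when the spacing is irregular.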

Inplace vs Copy

By default, fillna() returns a new DataFrame:

# Default: returns new DataFrame (original unchanged)
df_filled = df.fillna(0)

# Original unchanged
print(df)  # Still has NaN values

# Inplace: modifies original DataFrame
df.fillna(0, inplace=True)
print(df)  # NaN values are now 0

✅ Best Practices

Use inplace=False (default) – Safer, allows comparison before/after
Use inplace=True sparingly – It modifies the object you call it on, and it is not guaranteed to use less memory
Always assign the result – With inplace=False, the filled data exists only in the returned object

Real-World Examples

Example 1: Customer Age and Income Data

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Customer': ['John', 'Jane', 'Bob', 'Alice'],
    'Age': [25, np.nan, 35, np.nan],
    'Income': [50000, 60000, np.nan, 80000]
})

# Strategy: Use mean for age, median for income
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Income'] = df['Income'].fillna(df['Income'].median())

print(df)

Example 2: Stock Price Time Series

df = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=7),
    'Price': [100, np.nan, np.nan, 110, np.nan, 115, 120]
})

# Forward fill for stock prices (assume price stays same until new data)
df['Price'] = df['Price'].ffill()

print(df)

Example 3: Sensor Data with Interpolation

df = pd.DataFrame({
    'Hour': range(6),
    'Humidity': [60, np.nan, np.nan, 75, np.nan, 85]
})

# Interpolate humidity values
df['Humidity'] = df['Humidity'].interpolate(method='linear')

print(df)

Performance Tips & Best Practices

🚀 Performance Optimization

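1. Use one vectorized call instead of a column loop

# ❌ SLOWER – an explicit Python loop over the numeric columns
for col in df.select_dtypes('number').columns:
    df[col] = df[col].fillna(df[col].mean())

# ✅ FASTER – one vectorized call; numeric_only=True skips non-numeric columns
df.fillna(df.mean(numeric_only=True), inplace=True)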
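
The two approaches are meant to give the same result. A minimal sketch that checks this on a small made-up frame (the column names are arbitrary):

```python
import numpy as np
import pandas as pd

# Illustrative data: two numeric columns with gaps and one text column
df_demo = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 10.0, 20.0],
    "label": ["x", "y", "z"],
})

# Single call: the Series of column means is aligned by column name
filled = df_demo.fillna(df_demo.mean(numeric_only=True))

# Equivalent explicit loop over the numeric columns
looped = df_demo.copy()
for col in looped.select_dtypes("number").columns:
    looped[col] = looped[col].fillna(looped[col].mean())

same_result = filled.equals(looped)
print(same_result)
```

On a frame this small any speed difference is negligible; the single call mainly pays off in readability and on frames with many columns.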

2. Use the appropriate fill method

# ✅ For time series (most efficient)
df.ffill()

# ✅ For specific values (efficient)
df.fillna({'col1': 0, 'col2': 'N/A'})

# ❌ For complex logic (slower, use apply as last resort)
# custom_logic is a placeholder for a function that returns one fill value per column
df.fillna(df.apply(custom_logic), inplace=True)

3. Fill in the right order

# Fill numeric columns first with their column means
df.fillna(df.mean(numeric_only=True), inplace=True)

# Then fill whatever is still missing (the non-numeric columns) with a label
df.fillna('Unknown', inplace=True)
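
The order matters mostly for correctness: if the string fill ran first, numeric gaps would receive the label too. An alternative that does not depend on ordering is to select the columns by dtype and fill each group separately. This is a sketch on made-up data, not the only way to do it.

```python
import numpy as np
import pandas as pd

# Illustrative data: one numeric and one text column, each with a gap
people = pd.DataFrame({
    "age": [25.0, np.nan, 30.0],
    "city": ["Paris", None, "Rome"],
})

# Split the column names into numeric and everything else
num_cols = people.select_dtypes(include="number").columns
other_cols = people.columns.difference(num_cols)

# Each group gets its own fill value, independent of call order
people[num_cols] = people[num_cols].fillna(people[num_cols].mean())
people[other_cols] = people[other_cols].fillna("Unknown")

print(people)
```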

Common Mistakes to Avoid

⚠️ Mistake #1: Forgetting to Assign Result

# ❌ WRONG – doesn't modify df
df.fillna(0)

# ✅ CORRECT – assign the result
df = df.fillna(0)

# OR use inplace
df.fillna(0, inplace=True)

⚠️ Mistake #2: Filling with Inappropriate Values

# ❌ WRONG – filling age with 0 makes no sense
df['Age'] = df['Age'].fillna(0)

# ✅ CORRECT – use mean or median
df['Age'] = df['Age'].fillna(df['Age'].mean())
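
To make the effect concrete, here is a small sketch with the ages used earlier in this guide. Filling with 0 drags the average down sharply, while filling with the mean leaves it unchanged.

```python
import numpy as np
import pandas as pd

# Same illustrative ages as in the first example
ages = pd.Series([25, np.nan, 30, np.nan])

# Zeros are counted as real ages: (25 + 0 + 30 + 0) / 4
mean_after_zero_fill = ages.fillna(0).mean()

# The mean of the observed values is preserved
mean_after_mean_fill = ages.fillna(ages.mean()).mean()

print(mean_after_zero_fill, mean_after_mean_fill)
```

Mean and median fills keep the center of the distribution but shrink its spread, so they are still a simplification rather than a neutral choice.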

⚠️ Mistake #3: Not Checking Fill Results

# Always verify the fill
print(df.isnull().sum())  # Check remaining NaN

# Or use inplace=False to compare
df_filled = df.fillna(0)
print(f"Original NaN count: {df.isnull().sum().sum()}")
print(f"Filled NaN count: {df_filled.isnull().sum().sum()}")

Key Takeaways

fillna() is essential for data cleaning. Here’s what you now know:

Scalar Fill: Replace all NaN with a single value
Dictionary Fill: Fill different columns with different values
Forward/Backward Fill: Propagate values for time series
Interpolation: Fill based on mathematical patterns
Limit Parameter: Control how many consecutive values to fill
Inplace: Modify original DataFrame directly
Performance: Use vectorized operations, not loops

Next steps: Practice with your own datasets and choose the fill method that makes sense for your data type and analysis goals.
