Pandas drop(): Remove Rows and Columns (Complete Guide)

The drop() method is pandas’ primary tool for removing rows or columns from a DataFrame. It’s essential for data cleaning when you need to eliminate unwanted data. Common use cases:

  • Remove unnecessary columns to reduce DataFrame size
  • Delete rows with specific index values
  • Remove duplicate rows to ensure data uniqueness (with the companion drop_duplicates() method)
  • Eliminate rows based on conditions (values, NaN, etc.), together with boolean filtering or dropna()
  • Clean up temporary or helper columns

Key characteristics:

  • Flexible: Works with row labels or column names (positions must first be converted to labels)
  • Non-destructive: Returns a new DataFrame by default (doesn’t modify the original)
  • Fast: Removes many labels in a single vectorized call
  • Strict by default: Raises KeyError for missing labels (configurable with errors='ignore')

Basic Syntax

import pandas as pd
import numpy as np

# Create sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'Department': ['Sales', 'IT', 'HR', 'Sales', 'IT'],
    'Salary': [50000, 75000, 60000, 55000, 70000],
    'Temp_Col': [1, 2, 3, 4, 5]
})

print("Original DataFrame:")
print(df)

Basic syntax:

# Drop rows
df.drop(index=[0, 2])  # or df.drop([0, 2])

# Drop columns
df.drop(columns=['Temp_Col'])  # or df.drop('Temp_Col', axis=1)

# Drop in place (modifies df itself and returns None)
df.drop('Temp_Col', axis=1, inplace=True)
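The spellings above are interchangeable. Here is a quick sketch that checks this on the sample data, rebuilt so the snippet runs on its own. It also shows that index= and columns= can be combined to drop rows and columns in a single call:

```python
import pandas as pd

# Same sample data as above (Temp_Col is a throwaway helper column)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 28],
    "Department": ["Sales", "IT", "HR", "Sales", "IT"],
    "Salary": [50000, 75000, 60000, 55000, 70000],
    "Temp_Col": [1, 2, 3, 4, 5],
})

# Row drops: plain labels (axis=0 is the default) vs. the index= keyword
rows_a = df.drop([0, 2])
rows_b = df.drop(index=[0, 2])

# Column drops: axis=1, axis="columns" and columns= all target columns
cols_a = df.drop("Temp_Col", axis=1)
cols_b = df.drop("Temp_Col", axis="columns")
cols_c = df.drop(columns=["Temp_Col"])

# index= and columns= together drop rows and columns in one call
both = df.drop(index=[0, 2], columns=["Temp_Col"])

print(rows_a.equals(rows_b), cols_a.equals(cols_b), cols_b.equals(cols_c))
print(both.shape)  # (3, 4)
```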

Drop Rows by Index

Drop Single Row

# Drop row at index 0
df_dropped = df.drop(0)
print(df_dropped)

# Original unchanged
print(df)  # Still has 5 rows
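drop() matches row labels, and those labels don't have to be integers. The following is a minimal sketch that first moves the Name column into the index with set_index(), so the row labels become strings:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

# Use Name as the row index, so the labels are "Alice", "Bob", "Charlie"
by_name = df.set_index("Name")

# drop() looks up the label "Bob", not a position
without_bob = by_name.drop("Bob")
print(without_bob)
```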

Drop Multiple Rows

# Drop rows with indices 0, 2, 4
df_dropped = df.drop([0, 2, 4])
print(df_dropped)

# Result: Only rows 1 and 3 remain

Output:

    Name  Age Department  Salary  Temp_Col
1    Bob   30         IT   75000         2
3  David   40      Sales   55000         4
💡 Tip: By default, drop() returns a new DataFrame. Use inplace=True to modify the original.

Drop Columns by Name

Drop Single Column

# Drop single column – Method 1 (label + axis)
df_dropped = df.drop('Temp_Col', axis=1)

# Drop single column – Method 2 (using columns parameter)
df_dropped = df.drop(columns=['Temp_Col'])

# Drop in place (modifies df itself)
df.drop('Temp_Col', axis=1, inplace=True)

Drop Multiple Columns

# Drop multiple columns at once
df_dropped = df.drop(['Temp_Col', 'Age'], axis=1)
print(df_dropped)

Output:

      Name Department  Salary
0    Alice      Sales   50000
1      Bob         IT   75000
2  Charlie         HR   60000
3    David      Sales   55000
4      Eve         IT   70000
💡 Best practice: Use columns=['col_name'] for clarity. It’s more readable than axis=1.
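When several helper columns follow a naming convention, you can build the list of names and pass it to columns=. This sketch assumes helper columns share a "Temp" prefix. The Temp_Flag column is hypothetical and exists only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Salary": [50000, 75000],
    "Temp_Col": [1, 2],
    "Temp_Flag": [True, False],  # hypothetical second helper column
})

# Collect every column whose name starts with "Temp" (assumed convention)
helper_cols = [col for col in df.columns if col.startswith("Temp")]

# Pass the computed list to columns=
cleaned = df.drop(columns=helper_cols)
print(cleaned.columns.tolist())  # ['Name', 'Salary']
```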

Drop by Label vs Position

By Label (Default)

# By default, drop() looks up labels along the rows (axis=0)
df.drop(0)  # Drops the row labeled 0
df.drop('Age', axis=1)  # Drops the 'Age' column (df.drop('Age') alone raises KeyError)

# Explicit: axis parameter
df.drop([0, 1], axis=0)  # Rows with labels 0, 1
df.drop(['Age', 'Name'], axis=1)  # Columns named Age, Name

By Position (via Labels or iloc)

# Drop by position: look up the label at that position first
df_dropped = df.drop(columns=df.columns[4])  # 5th column ('Temp_Col' in the sample data)

# Or select the columns to keep by position with iloc
positions_to_keep = [0, 1, 2, 4]  # Keep these columns (drops position 3, 'Salary')
df_dropped = df.iloc[:, positions_to_keep]
⚠️ Important: drop() works with labels/names, not positions. For position-based deletion, use iloc, or convert positions to labels with df.index[...] or df.columns[...] and pass those to drop().
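To delete rows by position while still using drop(), look up the labels at those positions through df.index. The sketch below uses hypothetical letter labels so that positions and labels are clearly different. One caveat: if the index contains duplicate labels, drop() removes every row carrying that label, not only the one at the chosen position.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
        "Age": [25, 30, 35, 40, 28],
        "Temp_Col": [1, 2, 3, 4, 5],
    },
    index=["a", "b", "c", "d", "e"],  # hypothetical non-default labels
)

# Rows at positions 0 and 2 have labels "a" and "c"; drop those labels
rows_dropped = df.drop(df.index[[0, 2]])

# The last column by position, converted to its label ("Temp_Col")
last_col_dropped = df.drop(columns=df.columns[-1])

print(rows_dropped.index.tolist())  # ['b', 'd', 'e']
```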

Drop Duplicate Rows

Drop All Duplicates

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'David', 'Bob'],
    'Age': [25, 30, 25, 40, 30]
})

# Drop exact duplicates
df_unique = df.drop_duplicates()
print(df_unique)

Output:

    Name  Age
0  Alice   25
1    Bob   30
3  David   40

Drop Duplicates by Specific Columns

# Keep first occurrence of each Name
df_unique = df.drop_duplicates(subset=['Name'], keep='first')

# Keep last occurrence
df_unique = df.drop_duplicates(subset=['Name'], keep='last')

# Drop all occurrences of duplicates (keep=False)
df_unique = df.drop_duplicates(subset=['Name'], keep=False)
print(df_unique)  # Only David remains

Output (keep=False):

    Name  Age
3  David   40

💡 Parameters:

  • keep='first' – Keep first occurrence (default)
  • keep='last' – Keep last occurrence
  • keep=False – Drop every row that has a duplicate (no occurrence is kept)
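To see which rows each keep option removes, look at df.duplicated(). It returns the boolean mask that drop_duplicates() filters on, where True means the row is dropped. Here is a small sketch using the Name/Age data from above:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "David", "Bob"],
    "Age": [25, 30, 25, 40, 30],
})

# True marks rows that drop_duplicates() would remove for each keep option
first = df.duplicated(subset=["Name"], keep="first")
last = df.duplicated(subset=["Name"], keep="last")
none_kept = df.duplicated(subset=["Name"], keep=False)

print(first.tolist())      # [False, False, True, False, True]
print(last.tolist())       # [True, True, False, False, False]
print(none_kept.tolist())  # [True, True, True, False, True]
```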

Conditional Row Dropping

Drop Rows with Condition

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'Salary': [50000, 75000, 60000, 55000, 70000]
})

# Drop rows where Age > 30
condition = df['Age'] > 30
df_filtered = df.drop(df[condition].index)
print(df_filtered)

Output:

    Name  Age  Salary
0  Alice   25   50000
1    Bob   30   75000
4    Eve   28   70000
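You can get the same result with a plain boolean filter that keeps the rows where the condition is False. The filter form does not depend on unique index labels. By contrast, drop(df[condition].index) would also remove any non-matching row that shares a label with a matching one. Here is a sketch comparing the two on the data above:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 28],
    "Salary": [50000, 75000, 60000, 55000, 70000],
})

condition = df["Age"] > 30

# Drop-based: collect the matching labels, then drop them
via_drop = df.drop(df[condition].index)

# Filter-based: keep the rows where the condition is False
via_mask = df[~condition]

print(via_drop.equals(via_mask))  # True
```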

Drop Rows with NaN Values

# Drop rows with any NaN
df_clean = df.dropna()

# Drop rows where specific column has NaN
df_clean = df.dropna(subset=['Age'])

# Drop rows where all values are NaN
df_clean = df.dropna(how='all')
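The sample DataFrame above has no missing values, so these calls would leave it unchanged. The sketch below uses a small hypothetical DataFrame with NaN values to show how subset=, thresh= and axis=1 behave:

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, np.nan, np.nan, 40],
    "Salary": [50000, 75000, np.nan, np.nan],
})

any_nan = df.dropna()                  # only rows with no missing values
age_known = df.dropna(subset=["Age"])  # rows where Age is present
at_least_two = df.dropna(thresh=2)     # rows with at least 2 non-missing values
complete_cols = df.dropna(axis=1)      # columns with no missing values

print(any_nan.index.tolist())          # [0]
print(age_known.index.tolist())        # [0, 3]
print(at_least_two.index.tolist())     # [0, 1, 3]
print(complete_cols.columns.tolist())  # ['Name']
```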

Drop Multiple Conditions

# Drop rows where Age > 30 AND Salary < 60000
condition = (df['Age'] > 30) & (df['Salary'] < 60000)
df_filtered = df.drop(df[condition].index)

# Drop rows where Age > 35 OR Name == 'Bob'
condition = (df['Age'] > 35) | (df['Name'] == 'Bob')
df_filtered = df.drop(df[condition].index)
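The parentheses around each comparison are required because & and | bind more tightly than > and ==. If you find string expressions easier to read, DataFrame.query() can select the same rows. This sketch checks that it matches the mask-based version:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 28],
    "Salary": [50000, 75000, 60000, 55000, 70000],
})

# Mask-based version (parentheses required around each comparison)
condition = (df["Age"] > 35) | (df["Name"] == "Bob")
via_mask = df.drop(df[condition].index)

# query() version: the same condition as a string expression
via_query = df.drop(df.query("Age > 35 or Name == 'Bob'").index)

print(via_query["Name"].tolist())  # ['Alice', 'Charlie', 'Eve']
print(via_mask.equals(via_query))  # True
```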

Inplace vs Copy

Inplace=False (Default – Returns Copy)

# Returns new DataFrame, original unchanged
df_dropped = df.drop(0) # inplace=False is default
print(df) # Still has original 5 rows
print(df_dropped) # Has 4 rows

Inplace=True (Modifies Original)

# Modifies original DataFrame in place
df.drop(0, inplace=True)
print(df)  # Now has 4 rows

# Original is gone unless you saved it first
# df_backup = df.copy()  # Smart practice (run this before the in-place drop)
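Here is a short sketch of the in-place behavior, including a common pitfall. In-place calls return None, so writing df = df.drop(0, inplace=True) would replace df with None.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

backup = df.copy()  # independent copy taken before the in-place change

result = df.drop(0, inplace=True)
print(result)       # None: in-place calls return nothing
print(len(df))      # 2
print(len(backup))  # 3

# Pitfall (not executed): this would leave df set to None
# df = df.drop(0, inplace=True)
```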

✅ Best Practices

  • Default (inplace=False): Safer, allows comparison
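  • Use inplace=True sparingly: It saves typing, not memory; pandas still builds the result as a new object before swapping it in
  • Always back up: Call df.copy() before using inplace=True if you may need the original
  • Chain operations: df.drop(…).drop(…) only works with the default inplace=False, because inplace=True returns None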

Error Handling

Handle Missing Labels

# This raises KeyError if index 99 doesn't exist
try:
    df.drop(99)
except KeyError:
    print("Index 99 not found")

# Use errors parameter to ignore missing labels
df_dropped = df.drop([0, 99], errors='ignore')
print(df_dropped)  # Drops 0, ignores 99

Drop Non-Existent Columns Safely

# Will raise KeyError if the column doesn't exist
df.drop('NonExistent', axis=1)  # KeyError!

# Use errors='ignore' to skip missing columns
df_dropped = df.drop(['Age', 'NonExistent'], axis=1, errors='ignore')
print(df_dropped)  # Drops Age, ignores NonExistent
💡 Use errors='ignore' when dropping columns you’re not 100% sure exist. Otherwise keep the default (errors='raise'), so that typos in column names surface as errors instead of being silently skipped.
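errors='ignore' gives no indication of which labels were skipped. As an alternative, this sketch keeps the default strict behavior and passes only the labels that actually exist, computed with Index.intersection(). It then reports the rest:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30], "Salary": [50000, 75000]})

to_remove = ["Age", "NonExistent"]

# Option 1: silently skip labels that are not present
a = df.drop(columns=to_remove, errors="ignore")

# Option 2: keep errors="raise" (the default), but pass only existing labels
present = df.columns.intersection(to_remove)
b = df.drop(columns=present)

# Report what was skipped
missing = [col for col in to_remove if col not in df.columns]
print(a.equals(b), missing)  # True ['NonExistent']
```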

Performance Optimization

🚀 Optimize Performance

1. Drop Multiple Items in One Call

# ✅ FAST – single operation
df.drop([0, 1, 2], axis=0)

# ❌ SLOWER – multiple operations, each creating an intermediate DataFrame
df.drop(0).drop(1).drop(2)

2. Use drop_duplicates() Instead of drop()

# ✅ FAST – specialized method
df.drop_duplicates()

# ❌ SLOWER – manual duplicate detection
duplicates = df.duplicated()
df.drop(df[duplicates].index)

3. For Large Datasets, Use dropna() Instead

# ✅ FAST – optimized method
df.dropna()

# ❌ SLOWER – conditional drop
df.drop(df[df.isnull().any(axis=1)].index)

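4. Don't Count on inplace=True to Save Memory

# Option A: in place (returns None)
df.drop([0, 1, 2], inplace=True)

# Option B: reassignment (returns the new DataFrame, so it can be chained)
df = df.drop([0, 1, 2])

# Both do roughly the same work: pandas builds the result as a new object,
# and inplace=True then swaps it into df, so it does not avoid the copy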
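To check the batching advice on your own machine, you can run a rough benchmark like the one below. The data is synthetic and the sizes are arbitrary assumptions. Absolute timings vary by machine and pandas version, so the sketch only prints them:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; 20,000 rows x 5 columns is an arbitrary choice
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(20_000, 5)), columns=list("abcde"))
labels = list(range(50))  # row labels to remove


def drop_batched(frame, labels):
    # One drop() call builds one new DataFrame
    return frame.drop(labels)


def drop_one_by_one(frame, labels):
    # One drop() call (and one intermediate DataFrame) per label
    for label in labels:
        frame = frame.drop(label)
    return frame


batched = drop_batched(df, labels)
one_by_one = drop_one_by_one(df, labels)
print("Same result:", batched.equals(one_by_one))

t_batched = timeit.timeit(lambda: drop_batched(df, labels), number=5)
t_one_by_one = timeit.timeit(lambda: drop_one_by_one(df, labels), number=5)
print(f"batched: {t_batched:.4f}s, one by one: {t_one_by_one:.4f}s")
```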

Common Mistakes to Avoid

⚠️ Mistake #1: Forgetting to Assign Result

# ❌ WRONG – doesn't modify df
df.drop(0)

# ✅ CORRECT – assign the result
df = df.drop(0)

# OR use inplace
df.drop(0, inplace=True)

⚠️ Mistake #2: Mixing axis Parameter and columns

# ❌ WRONG – a label with axis=1 plus columns= raises ValueError
df.drop('Age', axis=1, columns=['Age'])

# ✅ CORRECT – use one approach
df.drop('Age', axis=1)
# OR
df.drop(columns=['Age'])
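Here is a minimal sketch confirming that the mixed form fails. The exact error message may differ between pandas versions, so the sketch prints it rather than matching it:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

error_message = None
try:
    # Mixing a positional label (with axis) and columns= is rejected
    df.drop("Age", axis=1, columns=["Age"])
except ValueError as exc:
    error_message = str(exc)

print("ValueError:", error_message)
```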

⚠️ Mistake #3: Not Handling Missing Indices

# ❌ WRONG – KeyError if any of the labels don't exist
df.drop([0, 100, 200])

# ✅ CORRECT – use errors='ignore' (when missing labels are expected)
df.drop([0, 100, 200], errors='ignore')

⚠️ Mistake #4: Using drop() for Position-Based Deletion

# ❌ WRONG – drop() uses labels, not positions
df.drop([0, 1, 2])  # Only correct if 0, 1, 2 happen to be the first three index labels

# ✅ CORRECT – use iloc for positions
df.iloc[3:].reset_index(drop=True)  # Drop first 3 rows, then renumber the index
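The difference matters most after operations that reorder rows, such as sort_values(), which keeps each row's original label. Here is a sketch on the Name/Age data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 28],
})

# Sorting reorders rows but keeps their original labels
by_age = df.sort_values("Age", ascending=False)
print(by_age.index.tolist())  # [3, 2, 1, 4, 0]

label_based = by_age.drop([0, 1, 2])  # removes Alice, Bob, Charlie wherever they sit
position_based = by_age.iloc[3:]      # removes the first three rows as displayed

print(label_based["Name"].tolist())     # ['David', 'Eve']
print(position_based["Name"].tolist())  # ['Eve', 'Alice']
```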

Key Takeaways

You now understand drop() comprehensively:

  • Drop rows: By index labels using drop()
  • Drop columns: By name using axis=1 or columns parameter
  • Drop duplicates: Using drop_duplicates() with keep parameter
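  • Conditional drops: Pass a boolean mask's .index to drop(), or filter with the mask directly
  • Error handling: Use errors='ignore' when labels may legitimately be missing
  • Performance: Batch labels into one call; prefer drop_duplicates() and dropna() for their specific jobs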

Next step: Practice dropping data from your own datasets to master this essential pandas skill!

📚 Learn more pandas tutorials at Pandas How-To – Your complete guide to data analysis in Python
