- Remove unnecessary columns to reduce DataFrame size
- Delete rows with specific index values
- Remove duplicate rows to ensure data uniqueness
- Eliminate rows based on conditions (values, NaN, etc.)
- Clean up temporary or helper columns
Key characteristics:
- Flexible: Works with row labels and column names (positions must first be translated into labels)
- Non-destructive: Returns new DataFrame by default (doesn’t modify original)
- Fast: Optimized for large datasets
- Safe: Can raise errors for missing labels (configurable)
Basic Syntax
import pandas as pd
import numpy as np

# Create sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'Department': ['Sales', 'IT', 'HR', 'Sales', 'IT'],
    'Salary': [50000, 75000, 60000, 55000, 70000],
    'Temp_Col': [1, 2, 3, 4, 5]
})

print("Original DataFrame:")
print(df)
Basic syntax:
# Drop rows
df.drop(index=[0, 2])  # or df.drop([0, 2])

# Drop columns
df.drop(columns=['Temp_Col'])  # or df.drop('Temp_Col', axis=1)

# Drop with inplace
df.drop('Temp_Col', axis=1, inplace=True)
Drop Rows by Index
Drop Single Row
# Drop row at index 0
df_dropped = df.drop(0)
print(df_dropped)

# Original unchanged
print(df)  # Still has 5 rows
Drop Multiple Rows
# Drop rows with indices 0, 2, 4
df_dropped = df.drop([0, 2, 4])
print(df_dropped)

# Result: Only rows 1 and 3 remain
Output:
    Name  Age Department  Salary  Temp_Col
1    Bob   30         IT   75000         2
3  David   40      Sales   55000         4
💡 Tip: By default, drop() returns a new DataFrame. Use inplace=True to modify the original.
Drop Columns by Name
Drop Single Column
# Drop single column – Method 1 (explicit)
df_dropped = df.drop('Temp_Col', axis=1)

# Drop single column – Method 2 (using columns parameter)
df_dropped = df.drop(columns=['Temp_Col'])

# Drop in place (modifies df itself; the examples that follow assume Temp_Col is still present)
df.drop('Temp_Col', axis=1, inplace=True)
Drop Multiple Columns
# Drop multiple columns at once
df_dropped = df.drop(['Temp_Col', 'Age'], axis=1)
print(df_dropped)
Output:
      Name Department  Salary
0    Alice      Sales   50000
1      Bob         IT   75000
2  Charlie         HR   60000
3    David      Sales   55000
4      Eve         IT   70000
💡 Best practice: Use columns=['col_name'] for clarity. It’s more readable than axis=1.
Drop by Label vs Position
By Label (Default)
# drop() always matches labels; with the default axis=0 it looks at row labels
df.drop(0)  # Drops row with index label 0
df.drop('Age', axis=1)  # Drops 'Age' column (without axis=1 this raises KeyError)

# Explicit: axis parameter
df.drop([0, 1], axis=0)  # Rows with labels 0, 1
df.drop(['Age', 'Name'], axis=1)  # Columns named Age, Name
By Position with iloc-like approach
# Drop by position: look up the label at that position, then drop it
df_dropped = df.drop(columns=df.columns[4])  # position 4 is 'Temp_Col'

# Or select the column positions to keep (not typical with drop())
positions_to_keep = [0, 1, 2, 4]  # Keep these columns
df_dropped = df.iloc[:, positions_to_keep]
⚠️ Important: drop() works with labels/names, not positions. For position-based deletion, use slicing or iloc.
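When a position really is the only thing available, one option is to translate it into a label first and then hand that label to drop(). The sketch below assumes a small frame with a made-up string index ("a", "b", "c") so that labels and positions are clearly different things:

```python
import pandas as pd

# Hypothetical frame: the string index makes labels and positions distinct.
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35], "Temp_Col": [1, 2, 3]},
    index=["a", "b", "c"],
)

# Look up the labels sitting at the given positions, then drop by label.
rows_dropped = df.drop(index=df.index[[0, 2]])      # positions 0 and 2 -> labels "a" and "c"
cols_dropped = df.drop(columns=df.columns[[1, 2]])  # positions 1 and 2 -> "Age" and "Temp_Col"

print(rows_dropped)
print(cols_dropped)
```

Here rows_dropped keeps only the row labelled "b", and cols_dropped keeps only the Name column.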
Drop Duplicate Rows
Drop All Duplicates
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'David', 'Bob'],
    'Age': [25, 30, 25, 40, 30]
})

# Drop exact duplicates
df_unique = df.drop_duplicates()
print(df_unique)
Output:
    Name  Age
0  Alice   25
1    Bob   30
3  David   40
Drop Duplicates by Specific Columns
# Keep first occurrence of each Name
df_unique = df.drop_duplicates(subset=['Name'], keep='first')

# Keep last occurrence
df_unique = df.drop_duplicates(subset=['Name'], keep='last')

# Drop all occurrences of duplicates (keep=False)
df_unique = df.drop_duplicates(subset=['Name'], keep=False)
print(df_unique)  # Only David remains
Output (keep=False):
    Name  Age
3  David   40
💡 Parameters:
- keep='first' – Keep first occurrence (default)
- keep='last' – Keep last occurrence
- keep=False – Remove all duplicates
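To preview what each keep setting would remove before actually dropping anything, duplicated() accepts the same subset and keep arguments and returns a boolean mask. A minimal sketch using the same five-row frame (the kept_labels dictionary is only a helper for this illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "David", "Bob"],
    "Age": [25, 30, 25, 40, 30],
})

kept_labels = {}
for keep in ("first", "last", False):
    # True marks a row that drop_duplicates() would remove for this setting.
    mask = df.duplicated(subset=["Name"], keep=keep)
    kept_labels[keep] = df[~mask].index.tolist()
    print(f"keep={keep!r}: rows kept -> {kept_labels[keep]}")
```

For this data the surviving index labels are [0, 1, 3] for 'first', [2, 3, 4] for 'last', and [3] for False.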
Conditional Row Dropping
Drop Rows with Condition
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 28],
    'Salary': [50000, 75000, 60000, 55000, 70000]
})

# Drop rows where Age > 30
condition = df['Age'] > 30
df_filtered = df.drop(df[condition].index)
print(df_filtered)
Output:
    Name  Age  Salary
0  Alice   25   50000
1    Bob   30   75000
4    Eve   28   70000
Drop Rows with NaN Values
# Drop rows with any NaN
df_clean = df.dropna()

# Drop rows where specific column has NaN
df_clean = df.dropna(subset=['Age'])

# Drop rows where all values are NaN
df_clean = df.dropna(how='all')
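The sample frame used so far has no missing values, so the calls above leave it unchanged. The sketch below uses a made-up frame (df_nan, with invented gaps) to show how the variants differ. It also includes thresh, a dropna() parameter not covered above, which keeps rows that have at least that many non-missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values, for illustration only.
df_nan = pd.DataFrame({
    "Name": ["Alice", "Bob", None, "David", "Eve"],
    "Age": [25, np.nan, np.nan, 40, np.nan],
    "Salary": [50000, 75000, np.nan, np.nan, np.nan],
})

any_nan = df_nan.dropna()                  # drop rows containing any missing value
age_present = df_nan.dropna(subset=["Age"])  # drop rows where Age is missing
not_all_nan = df_nan.dropna(how="all")     # drop rows where every value is missing
at_least_two = df_nan.dropna(thresh=2)     # keep rows with 2 or more non-missing values

for name, result in [("any", any_nan), ("subset", age_present),
                     ("all", not_all_nan), ("thresh", at_least_two)]:
    print(name, result.index.tolist())
```

With this data the surviving index labels are [0], [0, 3], [0, 1, 3, 4], and [0, 1, 3] respectively.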
Drop Multiple Conditions
# Drop rows where Age > 30 AND Salary < 60000
condition = (df['Age'] > 30) & (df['Salary'] < 60000)
df_filtered = df.drop(df[condition].index)

# Drop rows where Age > 35 OR Name == 'Bob'
condition = (df['Age'] > 35) | (df['Name'] == 'Bob')
df_filtered = df.drop(df[condition].index)
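A conditional drop can also be written as a plain boolean filter by negating the condition with ~. The following sketch, which assumes the same five-row frame, compares the two spellings (the names via_drop and via_mask are only for this illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 28],
    "Salary": [50000, 75000, 60000, 55000, 70000],
})

condition = (df["Age"] > 35) | (df["Name"] == "Bob")

via_drop = df.drop(df[condition].index)  # drop the labels of the matching rows
via_mask = df[~condition]                # keep the rows that do NOT match

print(via_drop.equals(via_mask))
print(via_mask.index.tolist())
```

The two results match here because the index labels are unique. When an index contains repeated labels, drop() removes every row that shares a dropped label, so the mask form is the more predictable choice in that situation.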
Inplace vs Copy
Inplace=False (Default – Returns Copy)
# Returns new DataFrame, original unchanged
df_dropped = df.drop(0)  # inplace=False is default
print(df)  # Still has original 5 rows
print(df_dropped)  # Has 4 rows
Inplace=True (Modifies Original)
# Modifies original DataFrame in place
df.drop(0, inplace=True)
print(df)  # Now has 4 rows

# Original is gone unless you saved it first
# df_backup = df.copy()  # Smart practice
✅ Best Practices
- Default (inplace=False): Safer, allows comparison
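- Use inplace=True: Only when you deliberately want to mutate the existing object; it does not reliably save memory, because pandas may still build a new frame internally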
- Always backup: Before using inplace=True
- Chain operations: df.drop(…).drop(…) works with default
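As a small illustration of the chaining point, the sketch below drops a column and a row in two ways: as a chain of two drop() calls, and as a single call that passes both index and columns. It assumes a three-row version of the sample frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Temp_Col": [1, 2, 3],
})

# Chaining works because each drop() returns a new DataFrame by default.
chained = df.drop(columns=["Temp_Col"]).drop(index=[0])

# The same result in one call, using both keyword arguments together.
combined = df.drop(index=[0], columns=["Temp_Col"])

print(chained.equals(combined))
print(df.shape)  # the original frame is untouched
```

Neither form touches df, so it still has its original three rows and three columns afterwards.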
Error Handling
Handle Missing Labels
# This raises KeyError if index 99 doesn't exist
try:
    df.drop(99)
except KeyError:
    print("Index 99 not found")

# Use errors parameter to ignore missing labels
df_dropped = df.drop([0, 99], errors='ignore')
print(df_dropped)  # Drops 0, ignores 99
Drop Non-Existent Columns Safely
# Will raise error if column doesn't exist
df.drop('NonExistent', axis=1)  # KeyError!

# Use errors='ignore' to skip missing columns
df_dropped = df.drop(['Age', 'NonExistent'], axis=1, errors='ignore')
print(df_dropped)  # Drops Age, ignores NonExistent
💡 Use errors='ignore' when dropping columns you’re not 100% sure exist. Keep in mind that it also silences typos in column names.
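If silently skipping a missing column feels too permissive, one alternative is to check the requested names against df.columns first and report anything that is absent. This is only a sketch; the names wanted, present, and missing are made up for the example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "Salary": [50000, 75000],
})

wanted = ["Age", "NonExistent"]  # hypothetical list of columns to remove

# Split the request into columns that exist and columns that do not.
present = [col for col in wanted if col in df.columns]
missing = [col for col in wanted if col not in df.columns]

if missing:
    print(f"Skipping columns not found: {missing}")

df_dropped = df.drop(columns=present)
print(df_dropped.columns.tolist())
```

The result is the same as with errors='ignore', but a misspelled column name shows up in the message instead of passing unnoticed.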
Performance Optimization
Optimize Performance
1. Drop Multiple Items in One Call
# ✅ FAST – single operation
df.drop([0, 1, 2, 3, 4], axis=0)

# ❌ SLOWER – multiple operations
df.drop(0).drop(1).drop(2)
2. Use drop_duplicates() Instead of drop()
# ✅ FAST – specialized method
df.drop_duplicates()

# ❌ SLOWER – manual duplicate detection
duplicates = df.duplicated()
df.drop(df[duplicates].index)
3. For Large Datasets, Use dropna() Instead
# ✅ FAST – optimized method
df.dropna()

# ❌ SLOWER – conditional drop
df.drop(df[df.isnull().any(axis=1)].index)
4. Avoid Chaining When Processing Large Data
# ✅ Single in-place operation (pandas may still copy internally, so measure before relying on it)
df.drop([0, 1, 2], inplace=True)

# ❌ Reassignment – the old and new DataFrames coexist until df is rebound
df = df.drop([0, 1, 2])  # Briefly holds two copies in memory
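Claims like these are easiest to trust after measuring them on your own data. The sketch below times tip 1 (one batched call versus many single-label calls) with the standard timeit module. The frame size, the number of labels, and the helper names single_call and chained_calls are arbitrary choices for illustration, and the timings will vary by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic frame, sized arbitrarily for the comparison.
big = pd.DataFrame({"a": np.arange(100_000), "b": np.arange(100_000) % 7})
labels = list(range(50))

def single_call():
    # One drop() call that receives every label at once.
    return big.drop(labels)

def chained_calls():
    # One drop() call per label, each producing an intermediate frame.
    out = big
    for label in labels:
        out = out.drop(label)
    return out

# Both approaches should produce the same frame.
same_result = single_call().equals(chained_calls())

t_single = timeit.timeit(single_call, number=5)
t_chained = timeit.timeit(chained_calls, number=5)
print(f"same result: {same_result}")
print(f"single call: {t_single:.4f}s, chained calls: {t_chained:.4f}s")
```

The same pattern can be reused to compare drop_duplicates() or dropna() against their manual equivalents.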
Common Mistakes to Avoid
⚠️ Mistake #1: Forgetting to Assign Result
# ❌ WRONG – doesn't modify df
df.drop(0)

# ✅ CORRECT – assign the result
df = df.drop(0)

# OR use inplace
df.drop(0, inplace=True)
⚠️ Mistake #2: Mixing axis Parameter and columns
# ❌ WRONG – axis=1 and columns together
df.drop('Age', axis=1, columns=['Age'])

# ✅ CORRECT – use one approach
df.drop('Age', axis=1)
# OR
df.drop(columns=['Age'])
⚠️ Mistake #3: Not Handling Missing Indices
# ❌ WRONG – KeyError if indices don't exist
df.drop([0, 100, 200])

# ✅ CORRECT – use errors='ignore'
df.drop([0, 100, 200], errors='ignore')
⚠️ Mistake #4: Using drop() for Position-Based Deletion
# ❌ WRONG – drop() uses labels, not positions
df.drop([0, 1, 2])  # Assumes 0,1,2 are index labels

# ✅ CORRECT – use iloc for positions
df.iloc[3:].reset_index(drop=True)  # Drop first 3 rows
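The difference becomes visible as soon as the row order no longer matches the index labels, for example after sorting. A minimal sketch, assuming the sample names and ages from earlier:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 28],
})

# After sorting, the index labels travel with their rows: 3, 2, 1, 4, 0.
by_age = df.sort_values("Age", ascending=False)

by_label = by_age.drop([0, 1, 2])  # removes the rows LABELLED 0, 1, 2
by_position = by_age.iloc[3:]      # removes the FIRST three rows in the current order

print(by_label["Name"].tolist())
print(by_position["Name"].tolist())
```

The label-based call leaves David and Eve, while the position-based slice leaves Eve and Alice, so the two are not interchangeable once the index is out of order.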
Key Takeaways
You now understand drop() comprehensively:
- Drop rows: By index labels using drop()
- Drop columns: By name using axis=1 or columns parameter
- Drop duplicates: Using drop_duplicates() with keep parameter
- Conditional drops: Combine with loc/filtering
- Error handling: Use errors='ignore' for safety
- Performance: Batch operations, use specialized methods
Next step: Practice dropping data from your own datasets to master this essential pandas skill!
