What is loc?
loc is a pandas accessor for label-based indexing and selection. It’s one of the most powerful tools for working with DataFrames because it allows you to access data using labels (row and column names) instead of numeric positions.
Why use loc instead of direct indexing?
- Works with any index type (integers, strings, dates, etc.)
- Supports boolean indexing for conditional selection
- Allows range slicing by labels (inclusive on both ends)
- More readable and maintainable code
- Essential for complex filtering operations
Key characteristics:
- Label-based: Uses row/column names, not positions
- Inclusive: Both start and end are included in slices
- Flexible: Works with scalars, lists, slices, and boolean arrays
- Fast: Selection is vectorized, so it avoids Python-level row loops on large datasets
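To make the "label-based" and "any index type" points concrete, here is a minimal sketch. The DataFrames, column names, and values (scores, visits) are made up for illustration and are not part of the example used in the rest of this tutorial.

```python
import pandas as pd

# Hypothetical data: integer labels that do not match row positions
scores = pd.DataFrame({'score': [88, 92, 79]}, index=[10, 20, 30])

by_label = scores.loc[10, 'score']   # row labeled 10 (the first row)
by_position = scores.iloc[1, 0]      # row at position 1 (labeled 20)
print(by_label, by_position)

# Hypothetical data: a daily DatetimeIndex
dates = pd.date_range('2024-01-01', periods=5, freq='D')
visits = pd.DataFrame({'visits': [120, 95, 130, 110, 105]}, index=dates)

# Label slice on dates: both endpoints are included
window = visits.loc['2024-01-02':'2024-01-04']
print(window)
```

With an integer index, loc still looks up labels, so scores.loc[1] would raise a KeyError here because no row is labeled 1.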
Syntax and Basic Usage
Basic Syntax
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'Sales'],
    'Salary': [50000, 75000, 60000, 55000]
}, index=['E001', 'E002', 'E003', 'E004'])
print(df)

# Basic loc syntax:
# df.loc[row_indexer, column_indexer]
Output:
         Name  Age Department  Salary
E001    Alice   25      Sales   50000
E002      Bob   30         IT   75000
E003  Charlie   35         HR   60000
E004    David   40      Sales   55000
Selecting Single Rows
Select One Row by Label
row = df.loc['E002']
print(row)
print(type(row))  # pandas.Series
Output:
Name            Bob
Age              30
Department       IT
Salary        75000
Name: E002, dtype: object
Select Specific Column from Row
name = df.loc['E002', 'Name']
print(name)  # Output: Bob

salary = df.loc['E001', 'Salary']
print(salary)  # Output: 50000
Selecting Multiple Rows
Select Multiple Rows by List
selected = df.loc[['E001', 'E003']]
print(selected)
Output:
         Name  Age Department  Salary
E001    Alice   25      Sales   50000
E003  Charlie   35         HR   60000
Select Range of Rows (Slice)
Important: loc slices are inclusive on both ends, unlike Python slicing!
selected = df.loc['E001':'E003']
print(selected)
Output:
         Name  Age Department  Salary
E001    Alice   25      Sales   50000
E002      Bob   30         IT   75000
E003  Charlie   35         HR   60000
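The contrast with ordinary Python slicing is easiest to see side by side. The following is a small sketch that rebuilds only the index and the Name column so it runs on its own; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie', 'David']},
    index=['E001', 'E002', 'E003', 'E004'],
)
labels = list(df.index)

# Plain Python slicing excludes the end position
python_slice = labels[0:2]

# loc slicing includes the end label
loc_slice = df.loc['E001':'E003'].index.tolist()

# Open-ended label slices work too: from 'E003' to the last row
tail_slice = df.loc['E003':].index.tolist()

print(python_slice)
print(loc_slice)
print(tail_slice)
```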
Selecting Columns
Select Single Column
names = df.loc[:, 'Name']
print(names)
print(type(names))  # pandas.Series
Output:
E001      Alice
E002        Bob
E003    Charlie
E004      David
Name: Name, dtype: object
Select Multiple Columns
cols = df.loc[:, ['Name', 'Department']]
print(cols)
Output:
         Name Department
E001    Alice      Sales
E002      Bob         IT
E003  Charlie         HR
E004    David      Sales
Select Range of Columns
cols = df.loc[:, 'Age':'Salary']
print(cols)
Output:
      Age Department  Salary
E001   25      Sales   50000
E002   30         IT   75000
E003   35         HR   60000
E004   40      Sales   55000
The colon selects every row: df.loc[:, 'Name'] means “all rows, 'Name' column”.
Conditional Filtering with loc
One of loc’s most powerful features is conditional filtering using boolean masks:
Filter by Single Condition
filtered = df.loc[df['Age'] > 30]
print(filtered)
Output:
         Name  Age Department  Salary
E003  Charlie   35         HR   60000
E004    David   40      Sales   55000
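It can help to look at the boolean mask on its own before passing it to loc. The sketch below rebuilds the sample DataFrame so it runs standalone; the names mask, filtered, and the_rest are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'Sales'],
    'Salary': [50000, 75000, 60000, 55000]
}, index=['E001', 'E002', 'E003', 'E004'])

# The comparison yields a boolean Series that shares df's index
mask = df['Age'] > 30
print(mask)

# loc keeps the rows where the mask is True
filtered = df.loc[mask]

# ~ inverts the mask, keeping the rows where it is False
the_rest = df.loc[~mask]
print(the_rest)
```

Storing the mask in a variable also makes it easy to reuse the same condition for both selection and assignment.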
Filter by Multiple Conditions
filtered = df.loc[(df['Age'] > 30) & (df['Department'] == 'Sales')]
print(filtered)
Output:
       Name  Age Department  Salary
E004  David   40      Sales   55000
Filter with OR Condition
filtered = df.loc[(df['Department'] == 'Sales') | (df['Department'] == 'HR')]
print(filtered)
Use & for AND and | for OR. Don’t use the keywords "and" or "or": they raise a ValueError when applied to a Series.
Filter with isin()
filtered = df.loc[df['Department'].isin(['Sales', 'HR'])]
print(filtered)
Use isin() for checking membership in multiple values.
loc vs iloc: Key Differences
Understanding the difference between loc and iloc is crucial:
| Feature | loc | iloc |
|---|---|---|
| Type | Label-based | Position-based |
| Indexing | Uses row/column names/labels | Uses integer positions (0, 1, 2…) |
| Slice endpoint | Inclusive (includes end) | Exclusive (excludes end) |
| Boolean indexing | ✓ Accepts a boolean Series or array | ✗ Rejects a boolean Series (raises an error); plain boolean arrays work |
| Example | df.loc['E002', 'Name'] | df.iloc[1, 0] |
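The boolean-indexing row deserves a quick demonstration. The sketch below, written against the behavior of current pandas releases as far as known, shows loc taking a boolean Series directly while iloc needs the mask converted to a plain array first. The flag name iloc_accepts_series is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
}, index=['E001', 'E002', 'E003', 'E004'])

mask = df['Age'] > 30

# loc accepts the boolean Series directly
via_loc = df.loc[mask]

# iloc rejects a boolean Series because the Series carries an index
try:
    df.iloc[mask]
    iloc_accepts_series = True
except (ValueError, NotImplementedError):
    iloc_accepts_series = False

# Converting the mask to a NumPy array makes it purely positional
via_iloc = df.iloc[mask.to_numpy()]

print(iloc_accepts_series)
print(via_iloc)
```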
Practical Comparison
# Using loc (label-based)
print(df.loc['E001':'E003'])  # Includes E003

# Using iloc (position-based)
print(df.iloc[0:3])  # Includes positions 0, 1, 2 (same rows, but the end position 3 is excluded)

# A shorter range: the end label is included, the end position is not
print(df.loc['E001':'E002'])  # Includes both E001 and E002
print(df.iloc[0:2])  # Includes positions 0 and 1 (same rows again)
✅ When to use each:
- Use loc: When you know row/column labels (most common)
- Use iloc: When you need position-based access (e.g., first 5 rows)
- Use loc: For conditional filtering (Boolean indexing)
- Use iloc: When working with numeric row positions
Assignment with loc
Update Single Cell
df.loc['E001', 'Salary'] = 55000
print(df.loc['E001'])
Update Multiple Cells in Column
df.loc[df['Department'] == 'IT', 'Salary'] += 5000
print(df)
Update Multiple Columns
df.loc['E003', ['Age', 'Salary']] = [36, 65000]
print(df.loc['E003'])
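Mask-based assignment can also create a column. The following is a minimal sketch: the column name Band and the 60000 threshold are hypothetical, chosen only to illustrate the pattern.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [50000, 75000, 60000, 55000]
}, index=['E001', 'E002', 'E003', 'E004'])

# 'Band' does not exist yet; the first assignment creates it,
# leaving missing values in rows the mask does not cover
df.loc[df['Salary'] >= 60000, 'Band'] = 'High'

# The second assignment fills in the remaining rows
df.loc[df['Salary'] < 60000, 'Band'] = 'Standard'

print(df)
```

Because the row mask and the column label go into a single loc call, the update is applied to df itself rather than to a temporary copy.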
Advanced Techniques
Complex Boolean Filtering
condition = (df['Department'] == 'Sales') & (df['Salary'] > 52000)
filtered = df.loc[condition]
print(filtered)
Using isnull() and notnull()
# Find rows with missing values
missing = df.loc[df['Salary'].isnull()]

# Find rows without missing values
present = df.loc[df['Salary'].notnull()]
String Matching with str accessor
filtered = df.loc[df['Name'].str.contains('a', case=False)]
print(filtered)
Combining Multiple Conditions
condition = (
    (df['Salary'] < 70000)
    & (df['Department'].isin(['Sales', 'HR']))
)
filtered = df.loc[condition]
print(filtered)
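A row mask can be combined with a column selector in the same loc call, which the examples so far have only done for assignment. The sketch below rebuilds the original sample data so it runs on its own; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'Sales'],
    'Salary': [50000, 75000, 60000, 55000]
}, index=['E001', 'E002', 'E003', 'E004'])

mask = (df['Salary'] < 70000) & df['Department'].isin(['Sales', 'HR'])

# Rows from the mask, columns from a list: returns a DataFrame
result = df.loc[mask, ['Name', 'Salary']]
print(result)

# Rows from the mask, a single column label: returns a Series
names = df.loc[mask, 'Name']
print(names)
```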
Performance Considerations
🚀 Optimize Performance
1. Use loc for large DataFrames
# ✅ FASTER – vectorized boolean mask
filtered = df.loc[df['Age'] > 30]

# ❌ SLOWER – iterating through rows
for idx, row in df.iterrows():
    if row['Age'] > 30:
        print(row)
2. Combine multiple conditions efficiently
# ✅ FASTER – one combined mask
filtered = df.loc[(df['Age'] > 30) & (df['Salary'] > 50000)]

# ❌ SLOWER – multiple separate filters
filtered = df.loc[df['Age'] > 30]
filtered = filtered.loc[filtered['Salary'] > 50000]
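3. Use loc instead of looping with at
# Cast Salary to float first: newer pandas versions warn or raise when floats are written into an integer column
df['Salary'] = df['Salary'].astype(float)

# ✅ FASTER – one vectorized update
df.loc[df['Department'] == 'IT', 'Salary'] *= 1.1

# ❌ SLOWER – iterating with the at accessor
for idx in df.index:
    if df.at[idx, 'Department'] == 'IT':
        df.at[idx, 'Salary'] *= 1.1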
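To check these claims on your own machine, a rough timing sketch like the one below can be used. The synthetic DataFrame, its size n, and the random seed are assumptions made for illustration; actual timings depend on hardware and pandas version, so no numbers are quoted here.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical synthetic data, large enough to show a difference
rng = np.random.default_rng(0)
n = 5_000
big = pd.DataFrame({
    'Age': rng.integers(20, 65, size=n),
    'Salary': rng.integers(30_000, 120_000, size=n),
})

# Vectorized: one boolean mask evaluated over the whole column
start = time.perf_counter()
vectorized = big.loc[big['Age'] > 30]
vectorized_time = time.perf_counter() - start

# Loop: test each row in Python, then select the matching labels
start = time.perf_counter()
kept = [idx for idx, row in big.iterrows() if row['Age'] > 30]
looped = big.loc[kept]
loop_time = time.perf_counter() - start

print(f'vectorized: {vectorized_time:.4f}s, loop: {loop_time:.4f}s')
```

Both approaches select the same rows; only the time taken differs.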
Common Mistakes to Avoid
⚠️ Mistake #1: Forgetting Parentheses in Boolean Conditions
# ❌ WRONG – & binds more tightly than >, so this raises a ValueError
filtered = df.loc[df['Age'] > 30 & df['Salary'] > 50000]

# ✅ CORRECT – parentheses are required
filtered = df.loc[(df['Age'] > 30) & (df['Salary'] > 50000)]
⚠️ Mistake #2: Using ‘and’ or ‘or’ instead of ‘&’ or ‘|’
# ❌ WRONG – and/or need a single True or False, so a Series raises a ValueError
filtered = df.loc[df['Age'] > 30 and df['Salary'] > 50000]

# ✅ CORRECT – use & and |
filtered = df.loc[(df['Age'] > 30) & (df['Salary'] > 50000)]
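Both of the mistakes above fail the same way, which the following sketch demonstrates by catching the exception. It uses a trimmed copy of the sample data; the errors list is just an illustrative way to record what happened.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 75000, 60000, 55000]
}, index=['E001', 'E002', 'E003', 'E004'])

errors = []

# Missing parentheses: parsed as df['Age'] > (30 & df['Salary']) > 50000
try:
    df.loc[df['Age'] > 30 & df['Salary'] > 50000]
except ValueError as exc:
    errors.append(type(exc).__name__)

# Keyword 'and': Python asks the Series for a single truth value
try:
    df.loc[df['Age'] > 30 and df['Salary'] > 50000]
except ValueError as exc:
    errors.append(type(exc).__name__)

# Parenthesized comparisons joined with & work as intended
correct = df.loc[(df['Age'] > 30) & (df['Salary'] > 50000)]

print(errors)
print(correct)
```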
⚠️ Mistake #3: Forgetting the Colon for “All Rows”
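# A row label plus a column label selects a single cell, not the whole column
# (df.loc['Name'] alone would look for a row labeled 'Name' and raise a KeyError)
value = df.loc['E001', 'Name']

# ✅ CORRECT – for all rows
column = df.loc[:, 'Name']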
⚠️ Mistake #4: Not Handling KeyError for Missing Labels
# ❌ WRONG – raises a KeyError because 'E999' is not in the index
row = df.loc['E999']

# ✅ CORRECT – check membership first or filter with isin()
row = df.loc[df.index.isin(['E999'])]  # Returns an empty DataFrame if not found
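There are a few other ways to guard against missing labels. The sketch below shows three options under the assumption that a missing label should yield None or simply be skipped; the names label, wanted, and subset are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
}, index=['E001', 'E002', 'E003', 'E004'])

label = 'E999'

# Option 1: test membership in the index before selecting
row = df.loc[label] if label in df.index else None

# Option 2: attempt the lookup and handle the KeyError
try:
    row_or_none = df.loc[label]
except KeyError:
    row_or_none = None

# Option 3: for a list of labels, keep only the ones that exist
wanted = ['E001', 'E999']
existing = df.index.intersection(wanted)
subset = df.loc[existing]

print(row, row_or_none)
print(subset)
```

Option 3 matters because passing a list that contains any missing label to loc raises a KeyError in current pandas versions.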
Key Takeaways
You now understand how to use loc for powerful data selection:
- Label-based indexing: Access data using row/column names
- Inclusive slicing: Both start and end are included
- Boolean filtering: Conditional selection with multiple conditions
- Assignment: Safely update DataFrames using loc
- Performance: Vectorized loc operations are faster than loops
- loc vs iloc: Use loc for labels, iloc for positions
Next step: Practice with your own DataFrames and experiment with different filtering conditions to master this essential pandas skill!
