Pandas loc: Label-Based Indexing and Selection Complete Guide

What is loc?

loc is a pandas accessor for label-based indexing and selection. It’s one of the most powerful tools for working with DataFrames because it allows you to access data using labels (row and column names) instead of numeric positions.

Why use loc instead of direct indexing?

  • Works with any index type (integers, strings, dates, etc.)
  • Supports boolean indexing for conditional selection
  • Allows range slicing by labels (inclusive on both ends)
  • More readable and maintainable code
  • Essential for complex filtering operations

Key characteristics:

  • Label-based: Uses row/column names, not positions
  • Inclusive: Both start and end are included in slices
  • Flexible: Works with scalars, lists, slices, and boolean arrays
  • Fast: Selection is vectorized, so it scales to large datasets far better than row-by-row Python loops
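
To illustrate the point that loc works with non-integer index types, here is a minimal sketch using a date index. The readings Series and its values are made up for this example and are not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical daily readings, indexed by date rather than by position
dates = pd.date_range("2024-01-01", periods=5, freq="D")
readings = pd.Series([10, 12, 11, 15, 14], index=dates)

# Label slice on a DatetimeIndex: both endpoints are included
window = readings.loc["2024-01-02":"2024-01-04"]
print(window)
print(len(window))  # 3 rows: January 2, 3 and 4
```

The same slice syntax is used for the string labels in the rest of this guide.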

Syntax and Basic Usage

Basic Syntax

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'Sales'],
    'Salary': [50000, 75000, 60000, 55000]
}, index=['E001', 'E002', 'E003', 'E004'])

print(df)

# Basic loc syntax:
# df.loc[row_indexer, column_indexer]

Output:

         Name  Age Department  Salary
E001    Alice   25      Sales   50000
E002      Bob   30         IT   75000
E003  Charlie   35         HR   60000
E004    David   40      Sales   55000

Selecting Single Rows

Select One Row by Label

# Select row with index 'E002'
row = df.loc['E002']
print(row)
print(type(row))  # pandas.Series

Output:

Name            Bob
Age              30
Department       IT
Salary        75000
Name: E002, dtype: object

Select Specific Column from Row

# Select specific cell
name = df.loc['E002', 'Name']
print(name)  # Output: Bob

salary = df.loc['E001', 'Salary']
print(salary)  # Output: 50000

💡 Tip: When selecting a single row, you get a Series. When selecting a single cell, you get a scalar value.
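
The tip above can be checked directly. The sketch below uses a two-row version of the sample data and also shows a third case not covered above: passing a one-element list keeps the result as a DataFrame.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob"], "Age": [25, 30]},
    index=["E001", "E002"],
)

as_series = df.loc["E002"]           # single label -> Series
as_frame = df.loc[["E002"]]          # one-element list -> DataFrame with one row
as_scalar = df.loc["E002", "Name"]   # row label + column label -> scalar

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_scalar)                 # Bob
```

The list form is handy when later code expects a DataFrame regardless of how many rows matched.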

Selecting Multiple Rows

Select Multiple Rows by List

# Select specific rows using a list of labels
selected = df.loc[['E001', 'E003']]
print(selected)

Output:

         Name  Age Department  Salary
E001    Alice   25      Sales   50000
E003  Charlie   35         HR   60000

Select Range of Rows (Slice)

Important: loc slices are inclusive on both ends, unlike Python slicing!

# Select rows from E001 to E003 (inclusive!)
selected = df.loc['E001':'E003']
print(selected)

Output:

         Name  Age Department  Salary
E001    Alice   25      Sales   50000
E002      Bob   30         IT   75000
E003  Charlie   35         HR   60000

⚠️ Important: In loc slicing, both start and end are included. This is different from Python’s standard slicing, which excludes the end!
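
A side-by-side sketch of the difference, using a plain Python list of the same labels next to a loc slice. The one-column DataFrame here is a trimmed-down stand-in for the sample data.

```python
import pandas as pd

labels = ["E001", "E002", "E003", "E004"]
df = pd.DataFrame({"Age": [25, 30, 35, 40]}, index=labels)

# Python list slicing stops BEFORE the end position (position 2 is 'E003')
python_slice = labels[0:2]

# loc slicing stops AT the end label, so 'E003' is part of the result
loc_slice = df.loc["E001":"E003"]

print(python_slice)           # ['E001', 'E002']
print(list(loc_slice.index))  # ['E001', 'E002', 'E003']
```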

Selecting Columns

Select Single Column

# Select one column
names = df.loc[:, 'Name']
print(names)
print(type(names))  # pandas.Series

Output:

E001      Alice
E002        Bob
E003    Charlie
E004      David
Name: Name, dtype: object

Select Multiple Columns

# Select multiple columns using list
cols = df.loc[:, ['Name', 'Department']]
print(cols)

Output:

         Name Department
E001    Alice      Sales
E002      Bob         IT
E003  Charlie         HR
E004    David      Sales

Select Range of Columns

# Select columns from 'Age' to 'Salary' (inclusive!)
cols = df.loc[:, 'Age':'Salary']
print(cols)

Output:

      Age Department  Salary
E001   25      Sales   50000
E002   30         IT   75000
E003   35         HR   60000
E004   40      Sales   55000

💡 Note: The colon (:) means “all rows”. So df.loc[:, 'Name'] means “all rows, ‘Name’ column”.
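
The row indexer and column indexer can also be restricted at the same time. A minimal sketch, rebuilding the sample DataFrame so the block runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "Department": ["Sales", "IT", "HR", "Sales"],
    "Salary": [50000, 75000, 60000, 55000],
}, index=["E001", "E002", "E003", "E004"])

# Row slice (inclusive) combined with a list of column labels
subset = df.loc["E001":"E002", ["Name", "Salary"]]
print(subset)
print(subset.shape)  # (2, 2)
```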

Conditional Filtering with loc

One of loc’s most powerful features is conditional filtering using boolean masks:

Filter by Single Condition

# Select rows where Age > 30
filtered = df.loc[df['Age'] > 30]
print(filtered)

Output:

         Name  Age Department  Salary
E003  Charlie   35         HR   60000
E004    David   40      Sales   55000

Filter by Multiple Conditions

# Multiple conditions: Age > 30 AND Department == 'Sales'
filtered = df.loc[(df['Age'] > 30) & (df['Department'] == 'Sales')]
print(filtered)

Output:

       Name  Age Department  Salary
E004  David   40      Sales   55000

Filter with OR Condition

# OR condition: Department is 'Sales' OR 'HR'
filtered = df.loc[(df['Department'] == 'Sales') | (df['Department'] == 'HR')]
print(filtered)

⚠️ Important: Use & for AND and | for OR. Don’t use the Python keywords “and” and “or”!

Filter with isin()

# Check if Department is in a list
filtered = df.loc[df['Department'].isin(['Sales', 'HR'])]
print(filtered)

💡 Best practice: Use isin() for checking membership in multiple values.
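
As a quick check that isin() is a drop-in replacement for a chain of | conditions, the sketch below builds both filters on the sample data and compares the results.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Department": ["Sales", "IT", "HR", "Sales"],
}, index=["E001", "E002", "E003", "E004"])

# Two equality tests joined with |
by_or = df.loc[(df["Department"] == "Sales") | (df["Department"] == "HR")]

# One membership test
by_isin = df.loc[df["Department"].isin(["Sales", "HR"])]

print(by_or.equals(by_isin))  # True
print(list(by_isin.index))    # ['E001', 'E003', 'E004']
```

The isin() form stays one line long no matter how many values are added to the list.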

loc vs iloc: Key Differences

Understanding the difference between loc and iloc is crucial:

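Feature          | loc                          | iloc
-----------------|------------------------------|----------------------------------
Type             | Label-based                  | Position-based
Indexing         | Uses row/column names/labels | Uses integer positions (0, 1, 2…)
Slice endpoint   | Inclusive (includes end)     | Exclusive (excludes end)
Boolean indexing | ✓ Boolean Series or array    | Boolean array or list only (a boolean Series is rejected)
Example          | df.loc['E002', 'Name']       | df.iloc[1, 0]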
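
On the boolean indexing row: iloc does not take a boolean Series as a mask, but it does accept a plain boolean array. A minimal sketch, assuming the mask is converted with to_numpy() first:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie", "David"], "Age": [25, 30, 35, 40]},
    index=["E001", "E002", "E003", "E004"],
)

mask = df["Age"] > 30                # boolean Series, aligned on the row labels

by_loc = df.loc[mask]                # loc accepts the Series directly
by_iloc = df.iloc[mask.to_numpy()]   # iloc needs the bare boolean array

print(by_loc.equals(by_iloc))  # True
```

In practice loc is the simpler choice for conditional filtering, since no conversion is needed.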

Practical Comparison

# Using loc (label-based)
print(df.loc['E001':'E003'])  # Includes E003

# Using iloc (position-based)
print(df.iloc[0:3])  # Positions 0, 1, 2 (same rows; the stop position 3 is excluded)

# Same rows again, selected once by label and once by position
print(df.loc['E001':'E002'])  # Includes both E001 and E002
print(df.iloc[0:2])  # Positions 0 and 1
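
The two accessors are easiest to confuse when the index itself is made of integers. The sketch below uses a hypothetical DataFrame whose labels are 10, 20, 30, 40, so that labels and positions no longer coincide.

```python
import pandas as pd

# Integer labels that do not match the row positions
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie", "David"]},
    index=[10, 20, 30, 40],
)

by_label = df.loc[10:30]      # labels 10 through 30, end included
by_position = df.iloc[0:2]    # positions 0 and 1, end excluded

print(list(by_label.index))     # [10, 20, 30]
print(list(by_position.index))  # [10, 20]
```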

✅ When to use each:

  • Use loc: When you know row/column labels (most common)
  • Use iloc: When you need position-based access (e.g., first 5 rows)
  • Use loc: For conditional filtering (Boolean indexing)
  • Use iloc: When working with numeric row positions

Assignment with loc

Update Single Cell

# Change Alice’s salary to 55000
df.loc['E001', 'Salary'] = 55000
print(df.loc['E001'])

Update Multiple Cells in Column

# Increase all IT salaries by 5000
df.loc[df['Department'] == 'IT', 'Salary'] += 5000
print(df)

Update Multiple Columns

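# Update specific columns for specific rows
df.loc['E003', ['Age', 'Salary']] = [36, 65000]
print(df.loc['E003'])

💡 Tip: loc is the preferred way to update DataFrames. A single df.loc[rows, columns] = value call writes to the original DataFrame, whereas chained indexing such as df[mask]['Salary'] = value may modify a temporary copy and leave the original unchanged.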
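
A short sketch of the recommended pattern. The chained form appears only in a comment because its behavior depends on the pandas version; the new salary value of 58000 is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Department": ["Sales", "IT", "HR", "Sales"],
    "Salary": [50000, 75000, 60000, 55000],
}, index=["E001", "E002", "E003", "E004"])

# Chained form (not executed here):
#   df[df["Department"] == "Sales"]["Salary"] = 58000
# The first [] may return a copy, so the assignment can be lost.

# Single loc call: row mask and column label in one indexing step
df.loc[df["Department"] == "Sales", "Salary"] = 58000

print(df["Salary"].tolist())  # [58000, 75000, 60000, 58000]
```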

Advanced Techniques

Complex Boolean Filtering

# Find employees in Sales with salary > 52000
condition = (df['Department'] == 'Sales') & (df['Salary'] > 52000)
filtered = df.loc[condition]
print(filtered)

Using isnull() and notnull()

# Find rows with missing values in 'Salary' column
missing = df.loc[df['Salary'].isnull()]

# Find rows without missing values
present = df.loc[df['Salary'].notnull()]
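
The sample DataFrame has no missing values, so both filters above return trivial results on it. The sketch below uses a small hypothetical frame with one missing salary to show what each filter keeps.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Bob's salary is unknown
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Salary": [50000.0, np.nan, 60000.0]},
    index=["E001", "E002", "E003"],
)

missing = df.loc[df["Salary"].isnull()]    # rows where Salary is NaN
present = df.loc[df["Salary"].notnull()]   # rows where Salary has a value

print(list(missing.index))  # ['E002']
print(list(present.index))  # ['E001', 'E003']
```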

String Matching with str accessor

# Find names containing 'a' (case-insensitive)
filtered = df.loc[df['Name'].str.contains('a', case=False)]
print(filtered)

Combining Multiple Conditions

# Complex: (Age > 28) AND (Salary < 70000) AND (Department in ['Sales', 'HR'])
condition = (df['Age'] > 28) & \
            (df['Salary'] < 70000) & \
            (df['Department'].isin(['Sales', 'HR']))
filtered = df.loc[condition]
print(filtered)

Performance Considerations

🚀 Optimize Performance

1. Use loc for large DataFrames

# ✅ FAST – loc is optimized for large datasets
filtered = df.loc[df['Age'] > 30]

# ❌ SLOWER – iterating through rows
for idx, row in df.iterrows():
    if row['Age'] > 30:
        print(row)

2. Combine multiple conditions efficiently

# ✅ FAST – single filter operation
filtered = df.loc[(df['Age'] > 30) & (df['Salary'] > 50000)]

# ❌ SLOWER – multiple separate filters
filtered = df.loc[df['Age'] > 30]
filtered = filtered.loc[filtered['Salary'] > 50000]

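3. Use vectorized loc instead of looping with at

# Salary is an integer column; cast it to float before scaling by 1.1,
# because recent pandas versions object to writing fractional values into an int column
df['Salary'] = df['Salary'].astype(float)

# ✅ FAST – vectorized operation
df.loc[df['Department'] == 'IT', 'Salary'] *= 1.1

# ❌ SLOWER – iterating with at accessor
for idx in df.index:
    if df.at[idx, 'Department'] == 'IT':
        df.at[idx, 'Salary'] *= 1.1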
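
To see the difference on your own machine, here is a rough timing sketch. The synthetic data, the row count of 4,000, and the use of time.perf_counter are choices made for this example; actual timings vary by machine and pandas version, so no numbers are quoted here.

```python
import time

import numpy as np
import pandas as pd

n = 4_000  # synthetic row count, chosen only for illustration
base = pd.DataFrame({
    "Department": ["Sales", "IT", "HR", "IT"] * (n // 4),
    "Salary": np.full(n, 50000.0),
})

# Vectorized update with loc
vectorized = base.copy()
start = time.perf_counter()
vectorized.loc[vectorized["Department"] == "IT", "Salary"] *= 1.1
vectorized_seconds = time.perf_counter() - start

# Row-by-row update with at
looped = base.copy()
start = time.perf_counter()
for idx in looped.index:
    if looped.at[idx, "Department"] == "IT":
        looped.at[idx, "Salary"] *= 1.1
looped_seconds = time.perf_counter() - start

print(f"vectorized: {vectorized_seconds:.4f}s, loop: {looped_seconds:.4f}s")
print(np.allclose(vectorized["Salary"], looped["Salary"]))  # same values either way
```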

Common Mistakes to Avoid

⚠️ Mistake #1: Forgetting Parentheses in Boolean Conditions

# ❌ WRONG – & binds tighter than >, so this is read as
# df['Age'] > (30 & df['Salary']) > 50000 and raises an error
filtered = df.loc[df['Age'] > 30 & df['Salary'] > 50000]

# ✅ CORRECT – parentheses are required
filtered = df.loc[(df['Age'] > 30) & (df['Salary'] > 50000)]

⚠️ Mistake #2: Using ‘and’ or ‘or’ instead of ‘&’ or ‘|’

# ❌ WRONG – raises ValueError: the truth value of a Series is ambiguous
filtered = df.loc[df['Age'] > 30 and df['Salary'] > 50000]

# ✅ CORRECT – use & and |
filtered = df.loc[(df['Age'] > 30) & (df['Salary'] > 50000)]
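
The failure is easy to reproduce. The sketch below catches the error raised by the and keyword and then runs the corrected filter; the error_raised flag is a helper added only for this demonstration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "Salary": [50000, 75000, 60000, 55000],
}, index=["E001", "E002", "E003", "E004"])

error_raised = False
try:
    # 'and' asks Python for a single True/False from a whole Series
    df.loc[df["Age"] > 30 and df["Salary"] > 50000]
except ValueError as exc:
    error_raised = True
    print("ValueError:", exc)

# Element-wise AND with parentheses around each comparison
filtered = df.loc[(df["Age"] > 30) & (df["Salary"] > 50000)]
print(list(filtered.index))  # ['E003', 'E004']
```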

⚠️ Mistake #3: Forgetting the Colon for “All Rows”

# ❌ WRONG – selects just the 'Name' value, not column
value = df.loc['E001', 'Name']

# ✅ CORRECT – for all rows
column = df.loc[:, 'Name']

⚠️ Mistake #4: Not Handling KeyError for Missing Labels

# ❌ WRONG – raises KeyError if 'E999' doesn’t exist
row = df.loc['E999']

# ✅ CORRECT – filter with isin (or check 'E999' in df.index first)
row = df.loc[df.index.isin(['E999'])]  # Returns empty DataFrame if not found
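
Three ways to guard against a missing label are sketched below. Returning None for a missing row is a convention chosen for this example, not something loc does itself.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob"], "Age": [25, 30]},
    index=["E001", "E002"],
)

label = "E999"  # hypothetical label that is not in the index

# Option 1: test membership before selecting
row = df.loc[label] if label in df.index else None

# Option 2: attempt the lookup and catch the KeyError
try:
    row_or_none = df.loc[label]
except KeyError:
    row_or_none = None

# Option 3: filter with isin, which yields an empty DataFrame
matches = df.loc[df.index.isin([label])]

print(row, row_or_none, matches.empty)  # None None True
```

Option 3 always returns a DataFrame, which suits code that goes on to filter or concatenate the result.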

Key Takeaways

You now understand how to use loc for powerful data selection:

  • Label-based indexing: Access data using row/column names
  • Inclusive slicing: Both start and end are included
  • Boolean filtering: Conditional selection with multiple conditions
  • Assignment: Safely update DataFrames using loc
  • Performance: Vectorized loc operations are faster than loops
  • loc vs iloc: Use loc for labels, iloc for positions

Next step: Practice with your own DataFrames and experiment with different filtering conditions to master this essential pandas skill!
