Pandas loc: Label-Based Indexing and Selection Complete Guide

What is loc?

loc is a pandas accessor for label-based indexing and selection. It’s one of the most powerful tools for working with DataFrames because it allows you to access data using labels (row and column names) instead of numeric positions.

Why use loc instead of direct indexing?

Works with any index type (integers, strings, dates, etc.)
Supports boolean indexing for conditional selection
Allows range slicing by labels (inclusive on both ends)
More readable and maintainable code
Essential for complex filtering operations

Key characteristics:

Label-based: Uses row/column names, not positions
Inclusive: Both start and end are included in slices
Flexible: Works with scalars, lists, slices, and boolean arrays
Vectorized: Boolean masks and label lists select many rows in one operation, without Python-level loops

Syntax and Basic Usage

Basic Syntax

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'Sales'],
    'Salary': [50000, 75000, 60000, 55000]
}, index=['E001', 'E002', 'E003', 'E004'])

print(df)

# Basic loc syntax:
# df.loc[row_indexer, column_indexer]

Output:

         Name  Age Department  Salary
E001    Alice   25      Sales   50000
E002      Bob   30         IT   75000
E003  Charlie   35         HR   60000
E004    David   40      Sales   55000
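Each indexer in df.loc[row_indexer, column_indexer] can be a single label, a list of labels, a label slice, or a boolean mask. The sketch below rebuilds the same sample DataFrame and tries each kind, checking only what comes back. The variable names are illustrative.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'Sales'],
    'Salary': [50000, 75000, 60000, 55000]
}, index=['E001', 'E002', 'E003', 'E004'])

scalar_pick = df.loc['E002', 'Age']               # one label per axis -> single value
list_pick = df.loc[['E001', 'E004'], ['Name']]    # lists of labels -> DataFrame
slice_pick = df.loc['E002':'E004', 'Name':'Age']  # label slices, both ends included
mask_pick = df.loc[df['Salary'] > 55000, 'Name']  # boolean mask for rows, label for column

print(scalar_pick)         # 30
print(list_pick.shape)     # (2, 1)
print(slice_pick.shape)    # (3, 2)
print(mask_pick.tolist())  # ['Bob', 'Charlie']
```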

Selecting Single Rows

Select One Row by Label

# Select the row with index 'E002'
row = df.loc['E002']
print(row)
print(type(row))  # pandas.Series

Output:

Name            Bob
Age              30
Department       IT
Salary        75000
Name: E002, dtype: object

Select Specific Column from Row

# Select a specific cell
name = df.loc['E002', 'Name']
print(name)  # Output: Bob

salary = df.loc['E001', 'Salary']
print(salary)  # Output: 50000

💡 Tip: When selecting a single row, you get a Series. When selecting a single cell, you get a scalar value.
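Wrapping a single label in a list changes the return type. The sketch below uses a trimmed two-row version of the sample data to show that df.loc[['E002']] keeps a one-row DataFrame, which can help when later code expects DataFrame methods rather than a Series.

```python
import pandas as pd

# Trimmed version of the sample data
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Salary': [50000, 75000]
}, index=['E001', 'E002'])

row = df.loc['E002']           # single label -> Series
cell = df.loc['E002', 'Name']  # row label + column label -> scalar
frame = df.loc[['E002']]       # list containing one label -> one-row DataFrame

print(type(row).__name__)    # Series
print(cell)                  # Bob
print(type(frame).__name__)  # DataFrame
print(frame.shape)           # (1, 2)
```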

Selecting Multiple Rows

Select Multiple Rows by List

# Select specific rows using a list of labels
selected = df.loc[['E001', 'E003']]
print(selected)

Output:

         Name  Age Department  Salary
E001    Alice   25      Sales   50000
E003  Charlie   35         HR   60000

Select Range of Rows (Slice)

Important: loc slices are inclusive on both ends, unlike Python slicing!

# Select rows from E001 to E003 (inclusive!)
selected = df.loc['E001':'E003']
print(selected)

Output:

         Name  Age Department  Salary
E001    Alice   25      Sales   50000
E002      Bob   30         IT   75000
E003  Charlie   35         HR   60000

⚠️ Important: In loc slicing, both start and end are included. This is different from Python’s standard slicing!
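Inclusive endpoints make label slices read naturally on a date index: asking for January 1 through January 3 returns all three days. The sales figures below are made up for illustration.

```python
import pandas as pd

# Hypothetical daily sales figures indexed by date
sales = pd.DataFrame(
    {'units': [10, 12, 9, 15, 11]},
    index=pd.date_range('2024-01-01', periods=5, freq='D'),
)

# The end date 2024-01-03 is included in the result
first_three = sales.loc['2024-01-01':'2024-01-03']
print(first_three['units'].tolist())  # [10, 12, 9]
```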

Selecting Columns

Select Single Column

# Select one column
names = df.loc[:, 'Name']
print(names)
print(type(names))  # pandas.Series

Output:

E001      Alice
E002        Bob
E003    Charlie
E004      David
Name: Name, dtype: object

Select Multiple Columns

# Select multiple columns using a list
cols = df.loc[:, ['Name', 'Department']]
print(cols)

Output:

         Name Department
E001    Alice      Sales
E002      Bob         IT
E003  Charlie         HR
E004    David      Sales

Select Range of Columns

# Select columns from 'Age' to 'Salary' (inclusive!)
cols = df.loc[:, 'Age':'Salary']
print(cols)

Output:

      Age Department  Salary
E001   25      Sales   50000
E002   30         IT   75000
E003   35         HR   60000
E004   40      Sales   55000

💡 Note: The colon (:) means “all rows”. So df.loc[:, 'Name'] means “all rows, ‘Name’ column”.
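The colon is only one kind of row indexer: any row selector can be paired with any column selector in the same call. This sketch combines an inclusive row slice with a column list, and a boolean mask with a column range.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'Sales'],
    'Salary': [50000, 75000, 60000, 55000]
}, index=['E001', 'E002', 'E003', 'E004'])

# Row slice (inclusive) combined with a list of columns
subset = df.loc['E001':'E002', ['Name', 'Salary']]
print(subset.index.tolist())    # ['E001', 'E002']
print(subset.columns.tolist())  # ['Name', 'Salary']

# Boolean mask for rows combined with a column range
sales_cols = df.loc[df['Department'] == 'Sales', 'Age':'Salary']
print(sales_cols.shape)         # (2, 3)
```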

Conditional Filtering with loc

One of loc’s most powerful features is conditional filtering using boolean masks:

Filter by Single Condition

# Select rows where Age > 30
filtered = df.loc[df['Age'] > 30]
print(filtered)

Output:

         Name  Age Department  Salary
E003  Charlie   35         HR   60000
E004    David   40      Sales   55000

Filter by Multiple Conditions

# Multiple conditions: Age > 30 AND Department == 'Sales'
filtered = df.loc[(df['Age'] > 30) & (df['Department'] == 'Sales')]
print(filtered)

Output:

       Name  Age Department  Salary
E004  David   40      Sales   55000

Filter with OR Condition

# OR condition: Department is 'Sales' OR 'HR'
filtered = df.loc[(df['Department'] == 'Sales') | (df['Department'] == 'HR')]
print(filtered)

⚠️ Important: Use & for AND, | for OR, and ~ for NOT, and wrap each condition in parentheses. Python's and/or keywords raise a ValueError on pandas Series.
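The sketch below shows the ~ operator negating a mask, and what happens when Python's and keyword is used instead of &. The and_failed flag is only there so the outcome can be checked.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'Sales'],
}, index=['E001', 'E002', 'E003', 'E004'])

# ~ negates a mask: everyone NOT in Sales
not_sales = df.loc[~(df['Department'] == 'Sales')]
print(not_sales['Name'].tolist())  # ['Bob', 'Charlie']

# `and` asks for a single True/False from a whole Series, which pandas refuses
and_failed = False
try:
    df.loc[(df['Age'] > 30) and (df['Department'] == 'Sales')]
except ValueError as err:
    and_failed = True
    print('ValueError:', err)
```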

Filter with isin()

# Check if Department is in a list
filtered = df.loc[df['Department'].isin(['Sales', 'HR'])]
print(filtered)

💡 Best practice: Use isin() for checking membership in multiple values.
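As a quick check that isin() is a drop-in replacement for a chain of == comparisons joined with |, this sketch builds both masks and compares the resulting DataFrames.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Department': ['Sales', 'IT', 'HR', 'Sales'],
}, index=['E001', 'E002', 'E003', 'E004'])

with_or = df.loc[(df['Department'] == 'Sales') | (df['Department'] == 'HR')]
with_isin = df.loc[df['Department'].isin(['Sales', 'HR'])]

print(with_isin.index.tolist())   # ['E001', 'E003', 'E004']
print(with_or.equals(with_isin))  # True
```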

loc vs iloc: Key Differences

Understanding the difference between loc and iloc is crucial:

Feature	loc	iloc
Type	Label-based	Position-based
Indexing	Uses row/column names/labels	Uses integer positions (0, 1, 2…)
Slice endpoint	Inclusive (includes end)	Exclusive (excludes end)
Boolean indexing	✓ Boolean Series or array	Only plain boolean arrays (e.g. mask.to_numpy()), not a boolean Series
Example	`df.loc['E002', 'Name']`	`df.iloc[1, 0]`
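To make the boolean-indexing row concrete: iloc accepts a plain boolean array but rejects a boolean Series, because a Series carries index labels and iloc works only with positions. The sketch catches both ValueError and NotImplementedError because, as far as I know, pandas picks between them depending on the index type.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
}, index=['E001', 'E002', 'E003', 'E004'])

mask = df['Age'] > 30  # boolean Series indexed by E001..E004

by_loc = df.loc[mask]
by_iloc = df.iloc[mask.to_numpy()]  # plain NumPy boolean array: accepted by iloc
print(by_loc.equals(by_iloc))       # True

# Passing the boolean Series itself to iloc is rejected
try:
    df.iloc[mask]
    iloc_accepted_series = True
except (ValueError, NotImplementedError) as err:
    iloc_accepted_series = False
    print(type(err).__name__, '-', err)
```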

Practical Comparison

# Using loc (label-based): the end label E003 is included
print(df.loc['E001':'E003'])

# Using iloc (position-based): positions 0, 1, 2; position 3 is excluded
print(df.iloc[0:3])  # same rows, different endpoint rule

# Labels and positions line up here, so these two also match
print(df.loc['E001':'E002'])  # includes both E001 and E002
print(df.iloc[0:2])           # positions 0 and 1
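The two accessors diverge once labels are integers. In the sketch below (made-up scores), label 20 sits at position 1, and after switching to the default RangeIndex the inclusive loc endpoint becomes visible: df.loc[0:2] returns one more row than df.iloc[0:2].

```python
import pandas as pd

# Hypothetical scores with integer labels that differ from positions
scores = pd.DataFrame({'score': [88, 92, 75, 60]}, index=[10, 20, 30, 40])

print(scores.loc[20, 'score'])  # 92  (label 20)
print(scores.iloc[1, 0])        # 92  (position 1)
print(len(scores.loc[10:30]))   # 3   (labels 10, 20, 30; end included)
print(len(scores.iloc[0:3]))    # 3   (positions 0, 1, 2; end excluded)

# Default RangeIndex: labels equal positions, but the endpoint rules still differ
default = scores.reset_index(drop=True)
print(len(default.loc[0:2]))    # 3
print(len(default.iloc[0:2]))   # 2
```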

✅ When to use each:

Use loc: When you know row/column labels (most common), and for conditional (Boolean) filtering
Use iloc: When you need position-based access, such as the first 5 rows (df.iloc[:5])

Assignment with loc

Update Single Cell

# Change Alice's salary to 55000
df.loc['E001', 'Salary'] = 55000
print(df.loc['E001'])

Update Multiple Cells in Column

# Increase all IT salaries by 5000
df.loc[df['Department'] == 'IT', 'Salary'] += 5000
print(df)

Update Multiple Columns

# Update specific columns for specific rows
df.loc['E003', ['Age', 'Salary']] = [36, 65000]
print(df.loc['E003'])

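💡 Tip: loc is the preferred way to update DataFrames. A single df.loc[row_selector, column] = value call writes into df itself, whereas chained indexing such as df[df['Department'] == 'IT']['Salary'] = value may update a temporary copy instead and trigger a SettingWithCopyWarning.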
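The sketch below contrasts the two styles on a small copy of the data. Boolean selection with df[mask] returns a new DataFrame, so the chained assignment lands on that copy and df keeps its old value; the warning pandas emits for this varies by version, so it is silenced here to keep the output readable.

```python
import warnings
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'IT', 'HR', 'Sales'],
    'Salary': [50000, 75000, 60000, 55000],
}, index=['E001', 'E002', 'E003', 'E004'])

# Chained indexing: df[mask] builds a new DataFrame, so this writes to a copy
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas typically warns about chained assignment
    df[df['Department'] == 'IT']['Salary'] = 99999
after_chained = df.loc['E002', 'Salary']
print(after_chained)  # 75000 (unchanged)

# One loc call selects rows and column together and writes into df itself
df.loc[df['Department'] == 'IT', 'Salary'] = 99999
print(df.loc['E002', 'Salary'])  # 99999
```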

Advanced Techniques

Complex Boolean Filtering

# Find employees in Sales with salary > 52000
condition = (df['Department'] == 'Sales') & (df['Salary'] > 52000)
filtered = df.loc[condition]
print(filtered)

Using isnull() and notnull()

# Find rows with missing values in the 'Salary' column
missing = df.loc[df['Salary'].isnull()]

# Find rows without missing values
present = df.loc[df['Salary'].notnull()]
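The sample DataFrame has no missing values, so both masks above are trivial on it. This sketch uses hypothetical data with one NaN salary to show the split; isna() and notna() are aliases of isnull() and notnull().

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing salary
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Salary': [50000, np.nan, 60000],
}, index=['E001', 'E002', 'E003'])

missing = df.loc[df['Salary'].isnull()]
present = df.loc[df['Salary'].notnull()]

print(missing.index.tolist())  # ['E002']
print(present.index.tolist())  # ['E001', 'E003']
```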

String Matching with str accessor

# Find names containing 'a' (case-insensitive)
filtered = df.loc[df['Name'].str.contains('a', case=False)]
print(filtered)
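If the text column has missing values, the mask from str.contains can end up containing missing entries too, depending on the pandas version and column dtype, and loc does not accept a mask with NA values. Passing na=False treats missing names as non-matches. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data with one missing name
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie', None]},
    index=['E001', 'E002', 'E003', 'E004'],
)

# na=False maps the missing name to False, keeping the mask purely boolean
mask = df['Name'].str.contains('a', case=False, na=False)
print(df.loc[mask, 'Name'].tolist())  # ['Alice', 'Charlie']
```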

Combining Multiple Conditions

# Complex: (Age > 28) AND (Salary < 70000) AND (Department in ['Sales', 'HR'])
condition = (df['Age'] > 28) & \
            (df['Salary'] < 70000) & \
            (df['Department'].isin(['Sales', 'HR']))

filtered = df.loc[condition]
print(filtered)

Performance Considerations

🚀 Optimize Performance

1. Prefer vectorized boolean masks over row-by-row loops

# ✅ FAST – the comparison and the row selection run as vectorized operations
filtered = df.loc[df['Age'] > 30]

# ❌ SLOWER – iterrows() builds a Series for every row in Python
for idx, row in df.iterrows():
    if row['Age'] > 30:
        print(row)

2. Combine multiple conditions efficiently

# ✅ FAST – one combined mask, one selection
filtered = df.loc[(df['Age'] > 30) & (df['Salary'] > 50000)]

# ❌ SLOWER – the first filter builds an intermediate DataFrame
filtered = df.loc[df['Age'] > 30]
filtered = filtered.loc[filtered['Salary'] > 50000]

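3. Use a vectorized loc update instead of an at loop

# Salary holds integers; convert to float first so the non-integer results of
# multiplying by 1.1 don't rely on pandas upcasting int64 (recent versions warn)
df['Salary'] = df['Salary'].astype(float)

# ✅ FAST – vectorized operation
df.loc[df['Department'] == 'IT', 'Salary'] *= 1.1

# ❌ SLOWER – element by element with the at accessor
for idx in df.index:
    if df.at[idx, 'Department'] == 'IT':
        df.at[idx, 'Salary'] *= 1.1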
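To see the gap between a boolean mask and iterrows() on your own machine, a rough timing sketch like the one below can help. The 5,000-row random DataFrame is hypothetical and absolute timings will vary, so the only claim checked is that both approaches select the same rows.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger DataFrame: 5,000 random ages
rng = np.random.default_rng(0)
big = pd.DataFrame({'Age': rng.integers(20, 60, size=5_000)})

def with_mask():
    return big.loc[big['Age'] > 30]

def with_iterrows():
    keep = [idx for idx, row in big.iterrows() if row['Age'] > 30]
    return big.loc[keep]

# Both approaches select the same rows
same_rows = with_mask().equals(with_iterrows())
print('same rows:', same_rows)

print('mask:    ', timeit.timeit(with_mask, number=3))
print('iterrows:', timeit.timeit(with_iterrows, number=3))
```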

Common Mistakes to Avoid

⚠️ Mistake #1: Forgetting Parentheses in Boolean Conditions

# ❌ WRONG – & binds more tightly than >, so this raises ValueError
filtered = df.loc[df['Age'] > 30 & df['Salary'] > 50000]

# ✅ CORRECT – wrap each condition in parentheses
filtered = df.loc[(df['Age'] > 30) & (df['Salary'] > 50000)]
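Why the unparenthesized version fails: Python evaluates 30 & df['Salary'] first, leaving a chained comparison that needs a single True/False from a Series. The small DataFrame below is made up for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 35, 45], 'Salary': [60000, 40000, 70000]})

# Without parentheses Python reads:  df['Age'] > (30 & df['Salary']) > 50000
# A chained comparison calls bool() on a Series, which raises ValueError
try:
    df.loc[df['Age'] > 30 & df['Salary'] > 50000]
    unparenthesized_failed = False
except ValueError:
    unparenthesized_failed = True

ok = df.loc[(df['Age'] > 30) & (df['Salary'] > 50000)]
print(unparenthesized_failed)  # True
print(len(ok))                 # 1
```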

⚠️ Mistake #2: Using ‘and’ or ‘or’ instead of ‘&’ or ‘|’

# ❌ WRONG – 'and'/'or' need a single True/False, so pandas raises ValueError
filtered = df.loc[df['Age'] > 30 and df['Salary'] > 50000]

# ✅ CORRECT – use & and |
filtered = df.loc[(df['Age'] > 30) & (df['Salary'] > 50000)]

⚠️ Mistake #3: Forgetting the Colon for “All Rows”

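# ❌ WRONG – a single label is treated as a row label, so this raises KeyError
column = df.loc['Name']

# ❌ NOT A COLUMN – this returns one cell ('Alice'), not the whole column
value = df.loc['E001', 'Name']

# ✅ CORRECT – ':' selects all rows of the 'Name' column
column = df.loc[:, 'Name']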

⚠️ Mistake #4: Not Handling KeyError for Missing Labels

# ❌ WRONG – raises KeyError if 'E999' doesn't exist
row = df.loc['E999']

# ✅ CORRECT – check membership first...
if 'E999' in df.index:
    row = df.loc['E999']

# ...or filter with isin(), which returns an empty DataFrame if the label is missing
row = df.loc[df.index.isin(['E999'])]
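Two common ways to guard a lookup that may miss are a membership test on df.index and catching the KeyError that loc raises. Falling back to None here is just one illustrative choice; a default row or an early return works equally well.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob']}, index=['E001', 'E002'])
label = 'E999'

# Option 1: check membership before a scalar lookup
if label in df.index:
    row = df.loc[label]
else:
    row = None

# Option 2: catch the KeyError that loc raises for a missing label
try:
    name = df.loc[label, 'Name']
except KeyError:
    name = None

print(row, name)  # None None
```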

Key Takeaways

You now understand how to use loc for powerful data selection:

Label-based indexing: Access data using row/column names
Inclusive slicing: Both start and end are included
Boolean filtering: Conditional selection with multiple conditions
Assignment: Safely update DataFrames using loc
Performance: Vectorized loc operations are faster than loops
loc vs iloc: Use loc for labels, iloc for positions

Next step: Practice with your own DataFrames and experiment with different filtering conditions to master this essential pandas skill!

