If you filter DataFrames with chained comparisons or boolean masks, you’ll love df.query(). It offers clean syntax, improved performance on complex filters, and supports inline variables.
Why Use query()?
- Cleaner syntax: Write expressions like SQL where clauses
- Fewer errors: No need for bracket-heavy filters
- Performance boosts: Evaluated with
numexprby default - Variable support: Easily insert Python variables using “@”
Basic Syntax Example
import pandas as pd
df = pd.DataFrame({
'age': [25, 32, 47, 51],
'city': ['NY', 'LA', 'NY', 'SF'],
'score': [88, 92, 75, 85]
})
# Filter age > 30 and score >= 80
df.query('age > 30 & score >= 80')
Result: Rows where age is over 30 and score at least 80.
Using Variables in Queries
min_score = 80
target_cities = ['NY', 'SF']
df.query('score >= @min_score and city in @target_cities')
Speed Advantage Over Boolean Masks
Under the hood, query() uses numexpr, making it faster than:
df[(df['age'] > 30) & (df['score'] >= 80)]
Complex Logic Example
df.query('(city == "NY" or city == "SF") and score != 85')
When query() May Not Work
- Column names with spaces or special chars (non-identifier)
- Reserved Python keywords — use backticks, e.g.
`class` - MultiIndex columns — need different handling
Comparison Table: query() vs loc / boolean indexing
| Feature | query() |
df.loc / boolean filters |
|---|---|---|
| Readability | High (SQL-style) | Medium–Low (verbose) |
| Performance | Better with numexpr | Standard Python evaluation |
| Variable injection | Yes (@var) | No direct support |
| Complex logic | Yes | Yes |
| Edge cases | Columns must be valid identifiers | Works for any column name |
When to Use query()
- SQL-style filtering improves readability
- Using variables inside filters
- Working with large DataFrames and needing speed
query() offers a neat, fast, SQL-like way to filter DataFrames, with support for variables and improved speed. Use it for cleaner code and more efficient filtering in Pandas!
