If you filter DataFrames with chained comparisons or boolean masks, you’ll love df.query()
. It offers clean syntax, improved performance on complex filters, and supports inline variables.
Why Use query()
?
- Cleaner syntax: Write expressions like SQL where clauses
- Fewer errors: No need for bracket-heavy filters
- Performance boosts: Evaluated with
numexpr
by default - Variable support: Easily insert Python variables using “@”
Basic Syntax Example
import pandas as pd
df = pd.DataFrame({
'age': [25, 32, 47, 51],
'city': ['NY', 'LA', 'NY', 'SF'],
'score': [88, 92, 75, 85]
})
# Filter age > 30 and score >= 80
df.query('age > 30 & score >= 80')
Result: Rows where age
is over 30 and score
at least 80.
Using Variables in Queries
min_score = 80
target_cities = ['NY', 'SF']
df.query('score >= @min_score and city in @target_cities')
Speed Advantage Over Boolean Masks
Under the hood, query()
uses numexpr
, making it faster than:
df[(df['age'] > 30) & (df['score'] >= 80)]
Complex Logic Example
df.query('(city == "NY" or city == "SF") and score != 85')
When query()
May Not Work
- Column names with spaces or special chars (non-identifier)
- Reserved Python keywords — use backticks, e.g.
`class`
- MultiIndex columns — need different handling
Comparison Table: query() vs loc / boolean indexing
Feature | query() |
df.loc / boolean filters |
---|---|---|
Readability | High (SQL-style) | Medium–Low (verbose) |
Performance | Better with numexpr | Standard Python evaluation |
Variable injection | Yes (@var) | No direct support |
Complex logic | Yes | Yes |
Edge cases | Columns must be valid identifiers | Works for any column name |
When to Use query()
- SQL-style filtering improves readability
- Using variables inside filters
- Working with large DataFrames and needing speed
query()
offers a neat, fast, SQL-like way to filter DataFrames, with support for variables and improved speed. Use it for cleaner code and more efficient filtering in Pandas!