Squashing bugs and speeding up your pandas code is like fine-tuning a race car: both satisfying and crucial for performance. Let’s get under the hood.
Spotting the Slowpokes with Profiling
First step in tuning? Find out where the bottlenecks are. Pandas has no built-in profiler, but Python’s got your back with cProfile. It’s not specific to pandas, but it does the trick:
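import cProfile
import pandas as pd

def my_slow_function():
    df = pd.DataFrame({'A': range(10000), 'B': range(10000)})
    for _ in range(100):
        # DataFrame.append was removed in pandas 2.0; pd.concat in a loop
        # is just as slow, because every call copies the whole frame
        df = pd.concat([df, pd.DataFrame([{'A': 1, 'B': 2}])], ignore_index=True)

cProfile.run('my_slow_function()')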
This snippet prints a table of every function called, with call counts and the time spent in each, so you get a rundown of what’s eating up your time.
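The raw cProfile table can run to hundreds of lines, so it often helps to sort it and keep only the top entries. Here is a minimal sketch using the standard library’s pstats module; the names build_frame_slowly and profile_report are made up for illustration, and the row counts are arbitrary:

```python
import cProfile
import io
import pstats

import pandas as pd


def build_frame_slowly(n_rows=200):
    """Grow a DataFrame one row at a time (deliberately slow)."""
    df = pd.DataFrame({'A': range(1000), 'B': range(1000)})
    for _ in range(n_rows):
        df = pd.concat([df, pd.DataFrame([{'A': 1, 'B': 2}])], ignore_index=True)
    return df


def profile_report(func, top=10):
    """Profile func() and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    # Write the report into a string instead of straight to stdout
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats('cumulative').print_stats(top)
    return buffer.getvalue()


report = profile_report(build_frame_slowly)
print(report)
```

Sorting by cumulative time puts the functions that dominate the run, including everything they call, at the top, which is usually where you want to start looking.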
Leaner DataFrames with astype
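Picking smaller data types can cut memory usage substantially and often speeds up operations too. Downcasting int64 to int32 halves a column’s footprint as long as the values fit, and category works well for columns with few distinct values, since each distinct value is stored once and the rows hold small integer codes. Be smart about your types: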
# Convert types to reduce DataFrame size
df['A'] = df['A'].astype('int32')
df['B'] = df['B'].astype('category')
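To see whether a conversion actually pays off, you can measure the frame before and after with memory_usage(deep=True). The sketch below uses made-up data: an integer column and a string column with only four distinct values, which is the kind of column where category tends to help.

```python
import pandas as pd

# Illustrative data: 10,000 rows, with B holding just four distinct strings
df = pd.DataFrame({
    'A': range(10000),
    'B': ['red', 'green', 'blue', 'yellow'] * 2500,
})

# deep=True also counts the memory held by the string objects themselves
before = df.memory_usage(deep=True).sum()

df['A'] = df['A'].astype('int32')
df['B'] = df['B'].astype('category')

after = df.memory_usage(deep=True).sum()
print(f"before: {before} bytes, after: {after} bytes")
```

If a column has mostly unique values, category can end up larger than the original, so it is worth checking the numbers rather than converting everything.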
Avoiding the Loop Trap
Loops and pandas often don’t mix well, because a Python loop handles one value at a time in the interpreter. Vectorized operations, which work on whole columns at once in compiled code, are your friends for avoiding the dreaded loop slowdown:
# Vectorized operation example
df['C'] = df['A'] + df['B']
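For comparison, here is a rough sketch that computes the same sum with an explicit iterrows() loop and with the vectorized expression, timing both. The data and the frame size are arbitrary, and the exact timings will vary by machine, but the two approaches should produce identical values.

```python
import time

import pandas as pd

df = pd.DataFrame({'A': range(1000), 'B': range(1000, 2000)})

# Row-by-row version: each iteration builds a Series for the row
start = time.perf_counter()
looped = []
for _, row in df.iterrows():
    looped.append(int(row['A'] + row['B']))
loop_seconds = time.perf_counter() - start

# Vectorized version: one operation over both columns
start = time.perf_counter()
df['C'] = df['A'] + df['B']
vectorized_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorized: {vectorized_seconds:.4f}s")
```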
When to Use apply
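apply can be a savior but also a sinner in terms of performance. It calls your Python function once per element, so it is flexible but usually much slower than a vectorized expression. Use it wisely, mainly for custom functions that are awkward to express any other way: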
# Use apply() for complex operations
df['D'] = df['A'].apply(lambda x: x * 2 if x > 5 else x + 2)
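Simple conditionals like the one above often have a vectorized equivalent. As a sketch, NumPy’s np.where picks between two column-wide expressions based on a boolean mask; the column names D_apply and D_where are just for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(10)})

# Element-by-element: the lambda runs once per value
df['D_apply'] = df['A'].apply(lambda x: x * 2 if x > 5 else x + 2)

# Vectorized: where A > 5 take A * 2, otherwise take A + 2
df['D_where'] = np.where(df['A'] > 5, df['A'] * 2, df['A'] + 2)

print(df)
```

Both columns hold the same values, so apply is best kept for logic that has no tidy vectorized form.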