Pandas Performance Optimization: Speed Up Your Code

Want faster Pandas code? The strategies below help reduce memory usage and runtime when working with large or complex DataFrames in Python.

1. Use Vectorized Operations

Avoid slow Python loops:

# Bad
for i in df.index:
    df.loc[i, 'new_col'] = df.loc[i, 'old_col'] * 2

Use vectorization instead:

# Good
df['new_col'] = df['old_col'] * 2
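
Conditional logic is a common reason people fall back to loops, but it can usually be vectorized too. The sketch below assumes a small hypothetical DataFrame and uses NumPy's where() to apply an if/else rule to a whole column at once:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names mirror the example above
df = pd.DataFrame({"old_col": [1, 5, 10, 20]})

# Row-wise conditional logic written as a single array operation:
# double values above 5, leave the rest unchanged
df["new_col"] = np.where(df["old_col"] > 5, df["old_col"] * 2, df["old_col"])
print(df)
```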

2. Optimize Data Types

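Reduce memory usage and speed up processing by converting columns to smaller or more specialized dtypes. A narrower integer type suits values that fit its range (the capitalized 'Int32' is pandas' nullable integer type, which also tolerates missing values), and category suits string columns with few distinct values: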

df['int_col'] = df['int_col'].astype('Int32')
df['cat_col'] = df['cat_col'].astype('category')
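
To check whether a conversion actually helps, compare memory before and after. This is a minimal sketch on made-up data; it uses the NumPy-backed 'int32' rather than nullable 'Int32' because the sample column has no missing values:

```python
import pandas as pd

# Hypothetical data: an integer column and a low-cardinality string column
df = pd.DataFrame({
    "int_col": range(100_000),
    "cat_col": ["red", "green", "blue", "green"] * 25_000,
})

before = df.memory_usage(deep=True).sum()

df["int_col"] = df["int_col"].astype("int32")
df["cat_col"] = df["cat_col"].astype("category")

after = df.memory_usage(deep=True).sum()
print(f"before: {before:,} bytes, after: {after:,} bytes")
```

The savings depend on the data: category only pays off when the number of distinct values is small relative to the number of rows.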

3. Use Efficient Indexes

Set indexes before merging or joining:

df1 = df1.set_index('id')
df2 = df2.set_index('id')
merged = df1.join(df2, how='inner')
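
A self-contained version of the same pattern, using two small hypothetical frames, shows what the inner join keeps:

```python
import pandas as pd

# Hypothetical frames sharing an 'id' key
df1 = pd.DataFrame({"id": [1, 2, 3], "a": [10, 20, 30]}).set_index("id")
df2 = pd.DataFrame({"id": [2, 3, 4], "b": [0.2, 0.3, 0.4]}).set_index("id")

# join() aligns on the index; 'inner' keeps only ids present in both frames
merged = df1.join(df2, how="inner")
print(merged)
```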

4. Try pandas.eval() for Arithmetic

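For large DataFrames, pd.eval() can evaluate arithmetic expressions faster, particularly when the optional numexpr package is installed; on small frames the parsing overhead usually outweighs the gain: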

df['sum'] = pd.eval('df.a + df.b')
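
The DataFrame.eval() method is a closely related option that resolves bare column names, which keeps expressions shorter. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data with the same column names as the example above
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# Inside DataFrame.eval, 'a' and 'b' refer to columns, so no 'df.' prefix is needed
df["sum"] = df.eval("a + b")
print(df)
```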

5. Use Numba or Cython for Heavy Loops

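If you must loop, try compiling the loop with Numba. It operates on NumPy arrays rather than DataFrames, so pass columns with .to_numpy():

import numpy as np
from numba import njit

@njit
def fast_sum(a, b):
    # Explicit element-wise loop, compiled to machine code by Numba
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        out[i] = a[i] + b[i]
    return out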
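
Numba tends to pay off most when each iteration depends on the previous one, since such loops are hard to vectorize. The sketch below is an illustration under assumptions: clipped_running_sum is a hypothetical helper, and the try/except fallback exists only so the example still runs (uncompiled) when Numba is not installed.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (uncompiled) when Numba is missing
    def njit(func):
        return func

@njit
def clipped_running_sum(values, cap):
    # Running total that never exceeds 'cap'; each step needs the previous total
    out = np.empty(values.shape[0], dtype=np.float64)
    total = 0.0
    for i in range(values.shape[0]):
        total = min(total + values[i], cap)
        out[i] = total
    return out

# Hypothetical data; Numba receives a plain NumPy array, not a Series
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, -4.0, 5.0]})
df["running"] = clipped_running_sum(df["a"].to_numpy(), 5.0)
print(df)
```

The first call includes compilation time, so benchmark a second call to see the steady-state speed.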

6. Parallelize with multiprocessing

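Split the DataFrame into chunks and process them in separate worker processes:

from multiprocessing import Pool
import numpy as np
import pandas as pd

def process_chunk(chunk):
    # Runs in a worker process; must be defined at module level so it can be pickled
    return chunk.assign(sum=chunk['a'] + chunk['b'])

if __name__ == '__main__':  # guard needed on platforms that spawn worker processes
    # Split into 4 row ranges by position
    bounds = np.linspace(0, len(df), 5, dtype=int)
    chunks = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]
    with Pool(4) as pool:
        df = pd.concat(pool.map(process_chunk, chunks))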

7. Monitor Memory

Use df.info() and df.memory_usage(deep=True) to inspect usage. Limit columns on import:

df = pd.read_csv('data.csv', usecols=['id','a','b'], dtype={'id': 'int32'})
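
The sketch below combines both ideas on an in-memory CSV, so it runs without a file on disk. The CSV content and the extra 'notes' column are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = "id,a,b,notes\n1,0.5,1.5,x\n2,2.5,3.5,y\n3,4.5,5.5,z\n"

# Load only the needed columns, with compact dtypes declared up front
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "a", "b"],
    dtype={"id": "int32", "a": "float32", "b": "float32"},
)

# Per-column memory in bytes; deep=True also counts the contents of object columns
per_column = df.memory_usage(deep=True)
print(per_column)
```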

8. Benchmark with %timeit

Use Jupyter’s %timeit to compare speeds:

%timeit df['a'] + df['b']
%timeit pd.eval('df.a + df.b')
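
Outside Jupyter, the standard-library timeit module gives comparable measurements. This is a minimal sketch; the data size and repeat count are arbitrary choices, and the timings will vary by machine:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data; the size is arbitrary
df = pd.DataFrame({"a": np.arange(2_000), "b": np.arange(2_000)})

def loop_sum():
    # Row-by-row addition through Python-level iteration
    return [a + b for a, b in zip(df["a"], df["b"])]

def vectorized_sum():
    # Whole-column addition in one call
    return df["a"] + df["b"]

loop_time = timeit.timeit(loop_sum, number=20)
vector_time = timeit.timeit(vectorized_sum, number=20)
print(f"loop: {loop_time:.4f}s, vectorized: {vector_time:.4f}s")
```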

Quick Checklist

  • Avoid loops – prefer vectorized operations
  • Convert columns to efficient dtypes
  • Use indexes for joins
  • Try pandas.eval() and Numba for speed
  • Use multiprocessing for large DataFrames
  • Profile with %timeit and monitor memory

Example Workflow

import pandas as pd

df = pd.read_csv('data.csv', usecols=['a', 'b'], dtype={'a': 'float32', 'b': 'float32'})
df['sum'] = pd.eval('df.a + df.b')
