Pandas explode(): Flatten Nested Lists in DataFrames

If your DataFrame includes columns containing lists or arrays, explode() is your go-to method to normalize this data into separate rows—ideal for analysis, filtering, and merging.

What Does explode() Do?

explode() transforms a column of list-like values into multiple rows, one per element, repeating the values of the other columns and the original index label for each new row.

Example:

import pandas as pd

df = pd.DataFrame({
  'id': [1, 2, 3],
  'tags': [['python', 'pandas'], [], ['data', 'analysis', 'ml']]
})

df_exploded = df.explode('tags')

Result:

   id      tags
0   1    python
0   1    pandas
1   2       NaN
2   3      data
2   3  analysis
2   3        ml

Use this when nested list-like data (e.g. tags, labels, keywords) must be flattened for further processing.

Handling Edge Cases

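  • Empty lists produce a single row with NaN; remove those rows with dropna(subset=['tags']).
  • Reset index: Flattened data retains the original row index, so labels repeat. Renumber afterwards, or pass ignore_index=True to explode() (pandas 1.1+):
    df_exploded.reset_index(drop=True, inplace=True)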
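  • Multiple list columns: pass a list of columns (pandas 1.3+) to explode them together; the lists in each row must have matching lengths:
    df.explode(['tags', 'values'])
    Consecutive calls such as df.explode('tags').explode('values') instead pair every tag with every value in the same row.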
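
The edge cases above can be sketched in one short script. This is a minimal illustration, not the only way to handle them: the 'scores' column is a hypothetical second list column added for the example, ignore_index assumes pandas 1.1 or later, and passing a list of columns assumes pandas 1.3 or later.

```python
import pandas as pd

# Hypothetical data: 'scores' is an illustrative second list column whose
# lists have the same length as the 'tags' list in each row.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'tags': [['python', 'pandas'], [], ['data', 'analysis', 'ml']],
    'scores': [[0.9, 0.8], [], [0.7, 0.6, 0.5]],
})

# Empty lists become NaN rows; ignore_index=True renumbers rows from 0.
flat = df.explode('tags', ignore_index=True)

# Drop the placeholder rows that came from empty lists.
flat_clean = flat.dropna(subset=['tags']).reset_index(drop=True)

# A list of columns explodes them in lockstep: tag i stays with score i.
paired = df.explode(['tags', 'scores'], ignore_index=True)

# Chained calls pair every tag with every score in the same row instead.
crossed = df.explode('tags').explode('scores')

print(len(flat), len(flat_clean), len(paired), len(crossed))
```

The row counts make the difference visible: the lockstep version keeps one row per list element, while the chained version grows with the product of the list lengths in each row.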

Alternatives to explode()

  • apply(pd.Series): Breaks lists into separate columns, not rows.
  • Loop + concat: Manual expansion—slower and more verbose.
  • json_normalize: Works for dicts/nested JSON, not simple lists.
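
To make the contrast concrete, here is a rough sketch of the first two alternatives on the same example data. The variable names (wide, manual) are illustrative, and the loop skips empty lists, which is an assumption of this sketch: unlike explode(), it produces no NaN row for them.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'tags': [['python', 'pandas'], [], ['data', 'analysis', 'ml']],
})

# apply(pd.Series): one column per list position, still one row per id.
wide = df['tags'].apply(pd.Series)

# Loop + concat: build one small frame per row, then stack them.
# Rows with an empty list are skipped here (no NaN placeholder row).
pieces = [
    pd.DataFrame({'id': row.id, 'tags': row.tags})
    for row in df.itertuples(index=False)
    if row.tags
]
manual = pd.concat(pieces, ignore_index=True)

print(wide.shape)
print(manual)
```

The wide result keeps three rows and pads shorter lists with missing values, while the manual result has the same long shape that explode() followed by dropna() would give.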

Practical Use Cases

Use Case 1: You have user-generated tags per row—explode tags into multiple rows to count tag frequency.

Use Case 2: Nested ingredients lists in recipes—flatten lists to analyze ingredient usage.

Use Case 3: Survey responses with multiple choices—explode selections for pivoting or merging.
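
As an illustration of the survey case, the sketch below explodes multiple-choice answers and pivots them into one indicator column per option. The data, the column names (respondent, choices), and the use of pd.crosstab for the pivot step are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical survey data: each respondent picked one or more options.
survey = pd.DataFrame({
    'respondent': ['a', 'b', 'c'],
    'choices': [['email', 'phone'], ['email'], ['chat', 'email']],
})

# One row per (respondent, choice) pair.
long_form = survey.explode('choices')

# Pivot back to one row per respondent with a count per option.
indicators = pd.crosstab(long_form['respondent'], long_form['choices'])

print(indicators)
```

The same long format also works for the tag and ingredient cases: once each element has its own row, value_counts(), groupby(), and merge() apply directly.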

Example Workflow

# Flatten tags and drop missing
df_exploded = df.explode('tags').dropna(subset=['tags']).reset_index(drop=True)

# Count tag frequency:
tag_counts = df_exploded['tags'].value_counts()

Performance Tips

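  • explode() is evaluated eagerly and returns a new, larger DataFrame, so filter rows and select only the columns you need before calling it;
  • Combine with `dropna()`, `query()`, and `pivot_table()` for powerful transformations;
  • Avoid exploding the same large DataFrame repeatedly: explode once and chain the follow-up operations on that result.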
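
A small sketch of these tips, assuming a hypothetical 'lang' column to filter on: the rows and columns are narrowed first, the frame is exploded once, and the remaining steps are chained on that single result.

```python
import pandas as pd

# Hypothetical data: 'lang' is an illustrative column used only for filtering.
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'lang': ['en', 'de', 'en', 'en'],
    'tags': [['python', 'pandas'], ['sql'], ['python', 'ml'], []],
})

# Filter rows and keep only the needed columns before exploding,
# so fewer values have to be duplicated.
subset = df.query("lang == 'en'")[['id', 'tags']]

# Explode once, then chain the cleanup on the same result.
flat = subset.explode('tags', ignore_index=True).dropna(subset=['tags'])

# Count how many ids carry each tag.
counts = flat.pivot_table(index='tags', values='id', aggfunc='count')

print(counts)
```

Filtering first matters more as the lists get longer, because every exploded row copies the values of all remaining columns.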

explode() is an essential Pandas method for normalizing list-like data structures — transforming nested lists into row structures that are easy to analyze, filter, and summarize. Once you master it, you’ll tackle many real-world data-wrangling tasks with ease.
