How to Read and Write Parquet Files in Pandas

Parquet is a columnar storage format that is efficient for large datasets. Pandas can read and write Parquet files directly, provided a Parquet engine such as pyarrow or fastparquet is installed, which makes Parquet a good option for storing DataFrames.

Reading Parquet Files

You can read a Parquet file using pd.read_parquet, which loads the data into a DataFrame.

Example (Reading a Parquet File)

import pandas as pd

df = pd.read_parquet("my_data.parquet")  # Replace with your file path

Writing Parquet Files

You can write a DataFrame to a Parquet file using df.to_parquet, which saves the DataFrame, including its column types, in Parquet format.

Example (Writing a DataFrame to Parquet)

import pandas as pd

# Build a small DataFrame to save
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data)

# Write it to disk in Parquet format
df.to_parquet("my_data.parquet")

Benefits of Parquet

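Parquet stores data column by column, so a query that needs only a few columns can skip the rest instead of scanning every row in full. Values within a column share a type and tend to be similar, which lets Parquet compress them well and saves storage space. For large datasets in particular, it is often much faster and more compact than CSV.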

For more advanced options, such as choosing an engine or a compression codec, see the Pandas documentation on Parquet I/O.
