How to remove duplicate rows in Pandas

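To remove duplicate rows in Pandas, you can use the DataFrame.drop_duplicates() method. By default it compares all columns and keeps the first occurrence of each row. Its subset, keep, inplace, and ignore_index parameters control which columns are compared, which occurrence is kept, whether the dataframe is modified in place, and whether the index is renumbered.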

Here’s an example code snippet:

import pandas as pd

# Create a sample dataframe with duplicate rows
data = {'Name': ['John', 'Jane', 'John', 'Jack', 'Jane'],
        'Age': [25, 30, 25, 40, 30],
        'Gender': ['M', 'F', 'M', 'M', 'F']}
df = pd.DataFrame(data)

# Print the original dataframe
print("Original Dataframe:\n", df)

# Remove duplicate rows based on all columns
df = df.drop_duplicates()

# Print the cleaned dataframe
print("\nCleaned Dataframe:\n", df)

This will output:

Original Dataframe:
   Name  Age Gender
0  John   25      M
1  Jane   30      F
2  John   25      M
3  Jack   40      M
4  Jane   30      F

Cleaned Dataframe:
   Name  Age Gender
0  John   25      M
1  Jane   30      F
3  Jack   40      M

In this example, we created a sample dataframe containing duplicate rows and printed it. We then called drop_duplicates() with no arguments, which compares all columns and keeps the first occurrence of each row, and printed the result. The cleaned dataframe contains only unique rows, and the surviving rows keep their original index labels (0, 1, and 3).
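
The other parameters can be sketched on the same sample data. This is a minimal illustration, not the only way to do it: the variable names (dup_mask, last_by_name, no_repeats, renumbered) are made up for this example, and ignore_index assumes pandas 1.0 or later.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'John', 'Jack', 'Jane'],
    'Age': [25, 30, 25, 40, 30],
    'Gender': ['M', 'F', 'M', 'M', 'F'],
})

# Flag rows that repeat an earlier row (True marks the repeats)
dup_mask = df.duplicated()

# Compare only the 'Name' column and keep the last occurrence of each name
last_by_name = df.drop_duplicates(subset=['Name'], keep='last')

# Drop every row that has a duplicate, keeping none of the copies
no_repeats = df.drop_duplicates(keep=False)

# Renumber the index from 0 after dropping duplicates (assumes pandas >= 1.0)
renumbered = df.drop_duplicates(ignore_index=True)

print(dup_mask)
print(last_by_name)
print(no_repeats)
print(renumbered)
```

Checking df.duplicated() first is a cheap way to see which rows would be dropped before actually removing them.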
