How to remove duplicate rows based on multiple columns

To remove duplicate rows based on multiple columns in pandas, call the DataFrame's drop_duplicates method and pass the column names to its subset parameter. Only the listed columns are compared when deciding whether two rows are duplicates.

Here is an example on a small DataFrame:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, 2, 3],
                   'B': [10, 20, 30, 40, 20, 30],
                   'C': [100, 200, 300, 400, 200, 300]})

# Remove duplicate rows based on columns 'A' and 'B'
df = df.drop_duplicates(subset=['A', 'B'])

# Print the result
print(df)

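In this example, drop_duplicates compares rows using only the values in columns 'A' and 'B'. The rows at index 4 and 5 repeat the (A, B) pairs already seen at index 1 and 2, so they are dropped; by default the first occurrence of each pair is kept. drop_duplicates returns a new DataFrame rather than modifying the original, and the example assigns that result back to df.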
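Which occurrence survives can be changed with the keep parameter. The sketch below reuses the sample data from above and is only an illustration; the variable names (first, last, unique_only, renumbered) are made up for this example, and ignore_index assumes pandas 1.0 or later.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({'A': [1, 2, 3, 4, 2, 3],
                   'B': [10, 20, 30, 40, 20, 30],
                   'C': [100, 200, 300, 400, 200, 300]})

# keep='first' (the default): keep the first row of each duplicate group
first = df.drop_duplicates(subset=['A', 'B'], keep='first')

# keep='last': keep the last row of each duplicate group instead
last = df.drop_duplicates(subset=['A', 'B'], keep='last')

# keep=False: drop every row whose (A, B) pair appears more than once
unique_only = df.drop_duplicates(subset=['A', 'B'], keep=False)

# ignore_index=True: renumber the surviving rows 0..n-1 (pandas >= 1.0)
renumbered = df.drop_duplicates(subset=['A', 'B'], ignore_index=True)

print(last)
print(unique_only)
```

With keep='last' the surviving rows keep their original index labels, so the result is no longer numbered consecutively; pass ignore_index=True if a fresh 0-based index is preferred.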
