How to drop duplicates based on two columns

In Pandas, you can drop duplicates based on two columns using the drop_duplicates method with the subset parameter. Here is an example:

import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3, 5], 'col2': [3, 4, 5, 6, 7, 5, 7]})

# drop duplicates based on 'col1' and 'col2'
df.drop_duplicates(subset=['col1', 'col2'], inplace=True)

# print the result
print(df)

The output will be:

   col1  col2
0     1     3
1     2     4
2     3     5
3     4     6
4     5     7

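In this example, drop_duplicates removes every row whose ('col1', 'col2') pair already appeared in an earlier row, so the rows at index 5 and 6, which repeat the pairs (3, 5) and (5, 7), are dropped. The subset parameter specifies which columns to compare, and by default (keep='first') the first occurrence of each pair is kept. inplace=True modifies the original dataframe and makes the method return None, so the result should not be assigned back to df.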
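The behaviour described above can be varied with the keep and ignore_index parameters of drop_duplicates. The following is a minimal sketch that reuses the same sample data; the variable names (first, last, unique_only, renumbered) are illustrative only, and ignore_index assumes pandas 1.0 or later.

```python
import pandas as pd

# same sample dataframe as in the example above
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3, 5], 'col2': [3, 4, 5, 6, 7, 5, 7]})

# without inplace=True, a new dataframe is returned and df is left unchanged
first = df.drop_duplicates(subset=['col1', 'col2'])

# keep='last' keeps the last occurrence of each ('col1', 'col2') pair instead
last = df.drop_duplicates(subset=['col1', 'col2'], keep='last')

# keep=False drops every row whose pair occurs more than once
unique_only = df.drop_duplicates(subset=['col1', 'col2'], keep=False)

# ignore_index=True renumbers the result 0..n-1 (assumes pandas >= 1.0)
renumbered = df.drop_duplicates(subset=['col1', 'col2'], keep='last', ignore_index=True)

print(last)
print(unique_only)
print(renumbered)
```

Returning a new dataframe instead of using inplace=True leaves the original data available for comparison, which can be useful when checking which rows were removed.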
