
How to compare two columns for duplicates

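In Pandas, you can find rows that repeat the same pair of values in two columns using the DataFrame.duplicated method with the subset parameter. This checks whether a (col1, col2) combination occurs in more than one row; it does not check whether a value in one column also appears in the other. Here is an example: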

import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [3, 4, 5, 6, 7]})

# flag rows whose ('col1', 'col2') pair appears in more than one row
df['duplicates'] = df.duplicated(subset=['col1', 'col2'], keep=False)

# print the result
print(df)

The output will be:

   col1  col2  duplicates
0     1     3       False
1     2     4       False
2     3     5       False
3     4     6       False
4     5     7       False

Every value is False because no (col1, col2) pair occurs in more than one row of this sample. The duplicated method returns a boolean Series indicating, for each row, whether its values in the subset columns also appear in another row. The subset parameter specifies which columns to consider, and keep=False flags every occurrence of a repeated combination instead of leaving the first occurrence unmarked.
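
The "compare two columns" task can mean different things, so here is a minimal sketch that puts three readings side by side. The data and the new column names (pair_repeated, same_in_row, col1_in_col2) are hypothetical and chosen only for illustration; the calls used are DataFrame.duplicated, Series.eq, and Series.isin.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 3 share the pair (1, 3)
df = pd.DataFrame({'col1': [1, 2, 3, 1, 4], 'col2': [3, 4, 3, 3, 7]})

# 1) Rows whose (col1, col2) pair occurs in more than one row;
#    keep=False flags every occurrence, not only the later ones
df['pair_repeated'] = df.duplicated(subset=['col1', 'col2'], keep=False)

# 2) Rows where both columns hold the same value in that row
df['same_in_row'] = df['col1'].eq(df['col2'])

# 3) Rows whose col1 value appears anywhere in col2
df['col1_in_col2'] = df['col1'].isin(df['col2'])

print(df)
```

Here pair_repeated is True for rows 0 and 3, same_in_row is True only for row 2, and col1_in_col2 is True for rows 2 and 4. Which of the three checks is appropriate depends on whether you care about repeated rows, matching values within a row, or values shared between the columns.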
