In pandas, you can flag rows that repeat the same combination of values in two columns by using the duplicated method with the subset parameter. Here is an example:
    import pandas as pd

    # create a sample dataframe
    df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [3, 4, 5, 6, 7]})

    # flag rows whose ('col1', 'col2') pair occurs in more than one row
    df['duplicates'] = df.duplicated(subset=['col1', 'col2'], keep=False)

    # print the result
    print(df)
The output will be:
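       col1  col2  duplicates
    0     1     3       False
    1     2     4       False
    2     3     5       False
    3     4     6       False
    4     5     7       False
In this example, the duplicated method returns a boolean Series marking each row whose combination of col1 and col2 values also appears in another row. The subset parameter specifies which columns are compared, and keep=False flags every occurrence of a repeated combination rather than leaving the first one unflagged. Every (col1, col2) pair in this sample is unique, so all rows are False. Note that duplicated compares rows with each other; it does not check whether a value in col1 also appears somewhere in col2.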
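To make the effect of keep visible, here is a minimal sketch on a hypothetical dataframe (not the one above) in which one (col1, col2) pair is repeated. It also shows Series.isin, which is one way to answer the separate question of whether a value in col1 occurs anywhere in col2. The sample data and variable names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 3 share the same (col1, col2) pair.
df = pd.DataFrame({'col1': [1, 2, 3, 1], 'col2': [3, 4, 5, 3]})

# keep=False flags every row whose (col1, col2) pair occurs more than once.
all_dupes = df.duplicated(subset=['col1', 'col2'], keep=False)

# keep='first' (the default) leaves the first occurrence unflagged.
later_dupes = df.duplicated(subset=['col1', 'col2'], keep='first')

# A different question: which values in col1 also appear anywhere in col2?
in_col2 = df['col1'].isin(df['col2'])

print(all_dupes.tolist())    # [True, False, False, True]
print(later_dupes.tolist())  # [False, False, False, True]
print(in_col2.tolist())      # [False, False, True, False]
```

Which of the two you want depends on the question: duplicated is for finding repeated rows, while isin is for checking membership of one column's values in another.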