In Pandas, you can drop duplicates based on two columns using the drop_duplicates method with the subset parameter. Here is an example:
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3, 5],
                   'col2': [3, 4, 5, 6, 7, 5, 7]})

# drop duplicates based on 'col1' and 'col2'
df.drop_duplicates(subset=['col1', 'col2'], inplace=True)

# print the result
print(df)
The output will be:
   col1  col2
0     1     3
1     2     4
2     3     5
3     4     6
4     5     7
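In this example, drop_duplicates removes every row whose ('col1', 'col2') pair has already appeared in an earlier row, so the rows at index 5 and 6 (which repeat the rows at index 2 and 4) are dropped. By default the first occurrence of each pair is kept (keep='first'). The subset parameter specifies which columns to compare; without it, all columns are compared. inplace=True modifies the original dataframe and makes the method return None, so the result should not be assigned back to df.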
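If you would rather not modify the dataframe in place, or want to control which duplicate is kept, the keep parameter can help. The following is a minimal sketch that reuses the same sample data; the variable names (first, last, unique_only, renumbered) are only illustrative, and ignore_index assumes pandas 1.0 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 3, 5],
    'col2': [3, 4, 5, 6, 7, 5, 7],
})

# Without inplace=True a new dataframe is returned and df is left unchanged.
# keep='first' is the default: the earliest row of each pair survives.
first = df.drop_duplicates(subset=['col1', 'col2'])

# keep='last' retains the final occurrence of each pair instead.
last = df.drop_duplicates(subset=['col1', 'col2'], keep='last')

# keep=False drops every row whose pair appears more than once.
unique_only = df.drop_duplicates(subset=['col1', 'col2'], keep=False)

# ignore_index=True renumbers the surviving rows from 0 (pandas >= 1.0).
renumbered = df.drop_duplicates(subset=['col1', 'col2'], keep='last',
                                ignore_index=True)

print(last)
print(unique_only)
```

Assigning the result to a new name is often easier to follow than inplace=True, since the original data stays available for comparison.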