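To remove duplicate rows in pandas, use the DataFrame.drop_duplicates() method. Its parameters control how duplicates are identified and dropped: subset limits the comparison to specific columns, keep chooses which occurrence to retain ('first' by default, 'last', or False to drop every copy), and ignore_index renumbers the index of the result.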
Here’s an example code snippet:
    import pandas as pd

    # Create a sample dataframe with duplicate rows
    data = {'Name': ['John', 'Jane', 'John', 'Jack', 'Jane'],
            'Age': [25, 30, 25, 40, 30],
            'Gender': ['M', 'F', 'M', 'M', 'F']}
    df = pd.DataFrame(data)

    # Print the original dataframe
    print("Original Dataframe:\n", df)

    # Remove duplicate rows based on all columns
    df = df.drop_duplicates()

    # Print the cleaned dataframe
    print("\nCleaned Dataframe:\n", df)
This will output:
    Original Dataframe:
       Name  Age Gender
    0  John   25      M
    1  Jane   30      F
    2  John   25      M
    3  Jack   40      M
    4  Jane   30      F

    Cleaned Dataframe:
       Name  Age Gender
    0  John   25      M
    1  Jane   30      F
    3  Jack   40      M
In this example, we created a sample dataframe in which rows 2 and 4 repeat rows 0 and 1, and printed it. We then called drop_duplicates() with no arguments, which compares all columns and keeps the first occurrence of each row. The cleaned dataframe contains only unique rows. Note that the surviving rows keep their original index labels (0, 1, 3); the index is not renumbered automatically.
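The parameters of drop_duplicates() change which rows count as duplicates and which copy survives. The sketch below is illustrative only: the data is a hypothetical variation of the sample above (the second John has a different age), and the variable names are made up for this example. It assumes pandas 1.0 or later, where ignore_index is available.

```python
import pandas as pd

# Hypothetical data: row 2 shares a Name with row 0 but has a different Age,
# while row 4 is an exact copy of row 1.
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'John', 'Jack', 'Jane'],
    'Age': [25, 30, 26, 40, 30],
    'Gender': ['M', 'F', 'M', 'M', 'F'],
})

# Compare only the 'Name' column; the first row for each name is kept
# (index labels 0, 1, 3).
by_name = df.drop_duplicates(subset=['Name'])

# Same comparison, but keep the last row for each name
# (index labels 2, 3, 4).
last_by_name = df.drop_duplicates(subset=['Name'], keep='last')

# keep=False drops every row that has an exact copy elsewhere,
# so both Jane rows are removed (index labels 0, 2, 3 remain).
no_repeats = df.drop_duplicates(keep=False)

# ignore_index=True renumbers the result 0..n-1 instead of
# keeping the original labels.
reindexed = df.drop_duplicates(ignore_index=True)

print(by_name)
print(last_by_name)
print(no_repeats)
print(reindexed)
```

Each call returns a new dataframe and leaves df unchanged, so the variants can be compared side by side before deciding which one to assign back.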