
How to find duplicate rows in Pandas dataframe

In this post, you will learn how to find duplicate rows in a Pandas dataframe.

How to find duplicate rows

First, I’ll check whether my dataframe contains any duplicate rows. To do this, I’ll add a column that shows whether each row is a duplicate, using the duplicated function. By default, duplicated marks a row as True only if an identical row appears earlier in the dataframe, so the first occurrence of a repeated row is marked False.

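import pandas as pd

# Sample data: the row with id 'id2' appears twice
my_df = pd.DataFrame({'id': ['id1', 'id2', 'id3', 'id2'],
                      'Column1': ['2', '7', '6', '7'],
                      'Column2': ['2', '5', '8', '5'],
                      'Column3': ['4', '1', '9', '1']})

# Work on a copy so the original dataframe stays unchanged
dup_df = my_df.copy()

# True where a row repeats an earlier row, False otherwise
dup_df['Dup_Column'] = my_df.duplicated()

print(f'Looking for duplicates in my dataframe: \n{dup_df}')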
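
If all you need is a yes/no answer, the boolean Series returned by duplicated can be reduced directly instead of being stored in a new column. The snippet below is a minimal sketch on the same sample data; the names dup_mask, has_duplicates and n_duplicates are only illustrative and do not come from the original example.

```python
import pandas as pd

my_df = pd.DataFrame({'id': ['id1', 'id2', 'id3', 'id2'],
                      'Column1': ['2', '7', '6', '7'],
                      'Column2': ['2', '5', '8', '5'],
                      'Column3': ['4', '1', '9', '1']})

# Boolean Series: True for each row that repeats an earlier row
dup_mask = my_df.duplicated()

# any() tells whether at least one duplicate exists,
# sum() counts the True values (illustrative variable names)
has_duplicates = bool(dup_mask.any())
n_duplicates = int(dup_mask.sum())

print(f'Any duplicates: {has_duplicates}')
print(f'Number of duplicate rows: {n_duplicates}')
```

With this sample data, only the second id2 row repeats an earlier row, so the check reports one duplicate.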


How to list duplicate rows

You can also display only the duplicate rows.

To do this, I’ll use the boolean Series returned by the duplicated function as a filter on the dataframe. The result is a new dataframe that contains only the redundant rows.

import pandas as pd

# Sample data: the row with id 'id2' appears twice
my_df = pd.DataFrame({'id': ['id1', 'id2', 'id3', 'id2'],
                      'Column1': ['2', '7', '6', '7'],
                      'Column2': ['2', '5', '8', '5'],
                      'Column3': ['4', '1', '9', '1']})

# Keep only the rows that duplicated() marks as True
dup_list = my_df[my_df.duplicated()]

print(f'Duplicate rows: \n{dup_list}')


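Pandas displays only one row: the second occurrence of id2 (index 3). This is the same row that was marked True in the first example. The first occurrence (index 1) is not listed, because duplicated uses keep='first' by default and treats the first copy as the original.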
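
If you would rather see every copy of a repeated row, including the first one, the keep and subset parameters of duplicated can help. The following is a minimal sketch on the same sample data, not part of the original example; the names all_copies and same_id are only illustrative.

```python
import pandas as pd

my_df = pd.DataFrame({'id': ['id1', 'id2', 'id3', 'id2'],
                      'Column1': ['2', '7', '6', '7'],
                      'Column2': ['2', '5', '8', '5'],
                      'Column3': ['4', '1', '9', '1']})

# keep=False marks every member of a duplicate group,
# not only the repeats that follow the first occurrence
all_copies = my_df[my_df.duplicated(keep=False)]

# subset restricts the comparison to the listed columns (here only 'id')
same_id = my_df[my_df.duplicated(subset=['id'], keep=False)]

print(f'All copies of duplicated rows: \n{all_copies}')
print(f'Rows sharing an id: \n{same_id}')
```

Here both filters return the rows at index 1 and 3, because the two id2 rows match in every column. On data where only some columns repeat, the subset version would list more rows than the full-row comparison.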

See also:
Duplicated function documentation.
How to remove rows with certain values
How to count number of duplicates
How to drop duplicates
How to remove duplicate rows based on multiple columns
