
How to drop duplicates

As a data scientist, I can tell you that the first step when working with data is cleaning your dataset. When rows need to be unique, you need to know how to remove duplicates in Pandas.

Pandas offers a dedicated drop_duplicates method for removing duplicate rows from a DataFrame.

Pandas drop_duplicates


import pandas as pd

df = pd.DataFrame({"A": [1, 2, 2], "B": [4, 5, 5]})
# keep=False drops every row that has a duplicate; inplace=False returns a new DataFrame
df = df.drop_duplicates(keep=False, inplace=False)

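The two parameters used here are:

  • keep decides which of the duplicated rows are retained; keep=False, as used above, drops all of them
  • inplace decides where the result goes: with inplace=False (the default) the method returns a new DataFrame that you have to assign, while inplace=True modifies the existing DataFrame directly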
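As a minimal sketch of how these two parameters behave, here is the same small frame run through a few variants. The variable names (all_dupes_removed, first_kept, df_copy) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 2], "B": [4, 5, 5]})

# keep=False removes every row that has a duplicate, so only (1, 4) is left.
all_dupes_removed = df.drop_duplicates(keep=False)

# The default (keep="first") retains one copy of each duplicated row.
first_kept = df.drop_duplicates()

# inplace=True changes the frame it is called on, so no reassignment is needed.
df_copy = df.copy()
df_copy.drop_duplicates(inplace=True)

print(len(all_dupes_removed), len(first_kept), len(df_copy), len(df))
```

Note that the original df still has all three rows at the end, because only df_copy was modified in place.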

By default, the first occurrence of each duplicated row is kept and subsequent duplicates are dropped. To keep the last occurrence of each duplicated row, you can specify keep='last':

df = df.drop_duplicates(keep='last')
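The difference between the default and keep='last' is easiest to see in the index of the result. The following sketch assumes the same example frame as before and compares which row labels survive:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 2], "B": [4, 5, 5]})

# Rows 1 and 2 are identical; the default keeps the earlier one.
first = df.drop_duplicates()

# keep="last" keeps the later one instead.
last = df.drop_duplicates(keep="last")

print(first.index.tolist())  # [0, 1]
print(last.index.tolist())   # [0, 2]
```

The values in both results are the same here; only the retained row labels differ, which matters when the index carries meaning, such as a timestamp or an ID.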

Documentation of the drop_duplicates function: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html 
