In this post, I show how to join two dataframes on 2 columns in Pandas Python library.
You can join two pandas DataFrames on multiple columns by specifying multiple columns in the on parameter of the merge method.
How to join two dataframes on 2 columns
Here’s an example of how to perform an inner join on two DataFrames based on two columns, key1 and key2:
import pandas as pd my_df1 = pd.DataFrame({'key1': [1, 2, 3, 4, 5], 'key2': [6, 7, 8, 9, 10], 'col1': [10, 20, 30, 40, 50]}) my_df2 = pd.DataFrame({'key1': [2, 4, 6, 8, 10], 'key2': [7, 9, 11, 13, 15], 'col2': [100, 200, 300, 400, 500]}) my_final_df = pd.merge(my_df1, my_df2, on=['key1', 'key2'], how='inner') print(f'This is my final dataframe: \n{my_final_df}')
In this example, the on parameter is set to [‘key1’, ‘key2’] to specify that the join should be based on both the key1 and key2 columns. The how parameter is set to ‘inner’ to specify that an inner join should be performed.
The result of the join will be a new DataFrame that contains only the rows where there is a match in both input DataFrames on both the key1 and key2 columns. In this example, the result will have three columns: key1, key2, col1, col2.