Here is a short tutorial on how to merge two dataframes using the Pandas Python library.
The most common way to merge dataframes is the Pandas merge function.
Inner merge
    import pandas as pd

    df1 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string4"],
                        "numbers": [103, 105, 201, 122]})
    df2 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string5"],
                        "numbers": [105, 144, 195, 101]})

    print(pd.merge(df1, df2, on="strings", how="inner"))
First, I create two dataframes. Next, I merge them on the strings column using the Pandas merge function.
       strings  numbers_x  numbers_y
    0  string1        103        105
    1  string2        105        144
    2  string3        201        195
I chose the inner method, so Pandas keeps only the strings that are present in both dataframes.
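The _x and _y endings in the output above are the default suffixes that Pandas adds when both dataframes share a column name other than the merge key. As a small sketch, the suffixes parameter of merge lets you pick your own; the names "_left" and "_right" here are only illustrative choices, not something the tutorial prescribes.

```python
import pandas as pd

df1 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string4"],
                    "numbers": [103, 105, 201, 122]})
df2 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string5"],
                    "numbers": [105, 144, 195, 101]})

# suffixes replaces the default ("_x", "_y") for overlapping column names.
# "_left" and "_right" are illustrative names chosen for this example.
merged = pd.merge(df1, df2, on="strings", how="inner",
                  suffixes=("_left", "_right"))
print(merged.columns.tolist())
```

The first suffix is applied to the column from the first dataframe and the second suffix to the column from the second one.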
There are other merge methods as well.
Left merge
    import pandas as pd

    df1 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string4"],
                        "numbers": [103, 105, 201, 122]})
    df2 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string5"],
                        "numbers": [105, 144, 195, 101]})

    print(pd.merge(df1, df2, on="strings", how="left"))
       strings  numbers_x  numbers_y
    0  string1        103      105.0
    1  string2        105      144.0
    2  string3        201      195.0
    3  string4        122        NaN
The left method keeps every row of the first (left) dataframe and adds the matching values from the right dataframe. Where there is no match, as for string4, the result is NaN.
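Because NaN is a float, the numbers_y column above is shown as 105.0 rather than 105. If you would rather keep whole numbers, one possible approach is to convert the column to the nullable integer dtype "Int64" (with a capital I). This is a sketch and assumes a Pandas version that supports nullable integers (0.24 or later); the variable name left is my own.

```python
import pandas as pd

df1 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string4"],
                    "numbers": [103, 105, 201, 122]})
df2 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string5"],
                    "numbers": [105, 144, 195, 101]})

left = pd.merge(df1, df2, on="strings", how="left")

# "Int64" is the nullable integer dtype: it stores integers and
# represents the missing value for string4 as <NA> instead of NaN.
left["numbers_y"] = left["numbers_y"].astype("Int64")
print(left)
```

Another option is fillna with a placeholder value, but that hides which rows had no match.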
If you need every row of the right dataframe instead, use the right method.
Right merge
    import pandas as pd

    df1 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string4"],
                        "numbers": [103, 105, 201, 122]})
    df2 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string5"],
                        "numbers": [105, 144, 195, 101]})

    print(pd.merge(df1, df2, on="strings", how="right"))
       strings  numbers_x  numbers_y
    0  string1      103.0        105
    1  string2      105.0        144
    2  string3      201.0        195
    3  string5        NaN        101
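A right merge can be thought of as a left merge with the two dataframes swapped. The sketch below checks this on the example data; the variable names right and swapped are mine, and the suffixes are reversed only so that the column names line up for the comparison.

```python
import pandas as pd

df1 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string4"],
                    "numbers": [103, 105, 201, 122]})
df2 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string5"],
                    "numbers": [105, 144, 195, 101]})

right = pd.merge(df1, df2, on="strings", how="right")

# Swap the dataframes and use a left merge instead. The suffixes are
# reversed so df1's column is still called numbers_x and df2's numbers_y.
swapped = pd.merge(df2, df1, on="strings", how="left", suffixes=("_y", "_x"))

# Put the columns in the same order as the right merge before comparing.
swapped = swapped[right.columns]

print(right.equals(swapped))
```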
If you need every record from both dataframes, use the outer method.
Outer merge
    import pandas as pd

    df1 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string4"],
                        "numbers": [103, 105, 201, 122]})
    df2 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string5"],
                        "numbers": [105, 144, 195, 101]})

    print(pd.merge(df1, df2, on="strings", how="outer"))
       strings  numbers_x  numbers_y
    0  string1      103.0      105.0
    1  string2      105.0      144.0
    2  string3      201.0      195.0
    3  string4      122.0        NaN
    4  string5        NaN      101.0
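With an outer merge it can be useful to see where each row came from. As a sketch, the indicator parameter of merge adds a _merge column whose value is "both", "left_only", or "right_only"; the variable name merged is my own.

```python
import pandas as pd

df1 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string4"],
                    "numbers": [103, 105, 201, 122]})
df2 = pd.DataFrame({"strings": ["string1", "string2", "string3", "string5"],
                    "numbers": [105, 144, 195, 101]})

# indicator=True adds a "_merge" column that records whether each key
# was found in both dataframes, only in the left, or only in the right.
merged = pd.merge(df1, df2, on="strings", how="outer", indicator=True)
print(merged[["strings", "_merge"]])
```

Here string4 should be marked as left_only and string5 as right_only, which matches the NaN values in the output above.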
Now you know how to merge two dataframes in Pandas.
For more advanced options, see the Pandas documentation for the merge function.