简体   繁体   中英

Merging multiple dataframes on different columns

Using Pandas 1.2.1

MRE:

df_a = pd.DataFrame({"A":[1,2,3,4], "B":[33, 44, 55, 66]})
df_b = pd.DataFrame({"B":[33, 44,99], "C":["v", "z", "z"]})
df_c = pd.DataFrame({"A":[3,4,77,55], "D":["aa", "bb", "cc", "dd"]})

Using three dfs created above I want to join all of them together however

  1. df_a, df_b share column "B" therefore they join on column "B"
  2. df_a, df_c share column "A" therefore they join on column "A"

I want to left_join df_b and df_c onto df_a. currently this is my method:

merged_df = pd.merge(df_a, df_b, on=["B"], how="left")
merged_df = pd.merge(merged_df, df_c, on=["A"], how="left")

I know works fine however I cannot stop to think there is a easier and faster way, there are multiple questions on joining multiple dfs on same column using reduce function however could not find solution for my question.

You can remove on parameter, so it merging by intersection of columns names between DataFrames:

merged_df = pd.merge(df_a, df_b, how="left")
merged_df = pd.merge(merged_df, df_c, how="left")

More dynamic is use reduce , also is removed on parameter:

from functools import reduce
dfList = [df1, df2, df3]
df = reduce(lambda df1,df2: pd.merge(df1,df2,how="left"), dfList)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM