Merging multiple dataframes on different columns

Using Pandas 1.2.1.

MRE:

import pandas as pd

df_a = pd.DataFrame({"A": [1, 2, 3, 4], "B": [33, 44, 55, 66]})
df_b = pd.DataFrame({"B": [33, 44, 99], "C": ["v", "z", "z"]})
df_c = pd.DataFrame({"A": [3, 4, 77, 55], "D": ["aa", "bb", "cc", "dd"]})

Using the three DataFrames created above, I want to join all of them together, with these conditions:

  1. df_a and df_b share column "B", so they join on column "B".
  2. df_a and df_c share column "A", so they join on column "A".

I want to left-join df_b and df_c onto df_a. Currently this is my method:

merged_df = pd.merge(df_a, df_b, on=["B"], how="left")
merged_df = pd.merge(merged_df, df_c, on=["A"], how="left")

I know this works fine, but I can't help thinking there is an easier and faster way. There are multiple questions about joining several DataFrames on the same column using the reduce function, but I could not find a solution for my case.

You can omit the on parameter. pd.merge then joins on the intersection of the column names of the two DataFrames:

merged_df = pd.merge(df_a, df_b, how="left")
merged_df = pd.merge(merged_df, df_c, how="left")
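
As a quick check, the sketch below rebuilds the frames from the question and compares the explicit-key version with the version that omits on. The variable names explicit, implicit, shared_ab and shared_step2 are only illustrative; the point is to show which columns pandas picks at each step.

```python
import pandas as pd

df_a = pd.DataFrame({"A": [1, 2, 3, 4], "B": [33, 44, 55, 66]})
df_b = pd.DataFrame({"B": [33, 44, 99], "C": ["v", "z", "z"]})
df_c = pd.DataFrame({"A": [3, 4, 77, 55], "D": ["aa", "bb", "cc", "dd"]})

# Explicit keys, as in the question.
explicit = pd.merge(df_a, df_b, on=["B"], how="left")
explicit = pd.merge(explicit, df_c, on=["A"], how="left")

# No `on`: pandas joins on the column names common to both frames.
step1 = pd.merge(df_a, df_b, how="left")
implicit = pd.merge(step1, df_c, how="left")

# The columns that each implicit merge ends up using as keys.
shared_ab = df_a.columns.intersection(df_b.columns).tolist()
shared_step2 = step1.columns.intersection(df_c.columns).tolist()

print(shared_ab, shared_step2)
print(explicit.equals(implicit))
```

With this data each pair of frames shares exactly one column, so both approaches use the same keys. If two frames shared additional column names, the version without on would join on all of them.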

A more dynamic option is to use reduce, again without the on parameter:

from functools import reduce
dfs = [df_a, df_b, df_c]
# Left-merge each frame onto the accumulated result, in list order
merged_df = reduce(lambda left, right: pd.merge(left, right, how="left"), dfs)
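
If the frames might share more column names than the intended keys, one possible variation is to keep the keys explicit while still using reduce. This is only a sketch: the steps list of (frame, key) pairs and the merge_step helper are hypothetical names introduced here, not part of pandas.

```python
from functools import reduce

import pandas as pd

df_a = pd.DataFrame({"A": [1, 2, 3, 4], "B": [33, 44, 55, 66]})
df_b = pd.DataFrame({"B": [33, 44, 99], "C": ["v", "z", "z"]})
df_c = pd.DataFrame({"A": [3, 4, 77, 55], "D": ["aa", "bb", "cc", "dd"]})

# Hypothetical layout: each entry pairs a right-hand frame with its join key.
steps = [(df_b, "B"), (df_c, "A")]


def merge_step(left, step):
    """Left-join one (frame, key) pair onto the accumulated result."""
    right, key = step
    return pd.merge(left, right, on=key, how="left")


# df_a is the initial value; each step is merged onto it in order.
merged_df = reduce(merge_step, steps, df_a)
print(merged_df)
```

Passing df_a as the initializer to reduce keeps it as the left side of every join, so the result always has the same rows as df_a as long as the keys in the right-hand frames are unique.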
