
Cross join of three dataframes

I would like to join three dataframes of the following structure:

import pandas as pd

january_df=pd.DataFrame({
    'January':[4,4,3,2,1,1],
    'Product_no':['B1','B2','S1','S2','B3','T1'],
    'Label':['Ball','Bikini','Shoe','Shirt','Bag','Towel'],
    'ID':[1000, 1001, 1002, 1003, 1004, 1005],
})

february_df=pd.DataFrame({
    'February':[4,3,3,2,1,1],
    'Product_no':['S1','B2','B1','T1','S2','B3'],
    'Label':['Shoe','Bikini','Ball','Towel','Shirt','Bag'],
    'ID':[1002, 1001, 1000, 1005, 1003, 1004],
})

march_df=pd.DataFrame({
    'March':[5,1,1,1,1,1],
    'Product_no':['T1','E1','S1','B3','L1','B1'],
    'Label':['Towel','Earring','Shoe','Bag','Lotion','Ball'],
    'ID':[1005, 1006, 1002, 1004, 1007, 1000],
})

The desired output for March should be:

   January  February  March  Product_no Label      ID
---------------------------------------------------------
01   1          2        5    T1        Towel      1005
02   0          0        1    E1        Earring    1006
03   3          4        1    S1        Shoe       1002
04   1          1        1    B3        Bag        1004
05   0          0        1    L1        Lotion     1007
06   4          3        1    B1        Ball       1000

As a first step, I tried to merge March and February:

all_df = pd.merge(march_df, february_df, on="ID")

but it does not yield the result for the two months. I tried to understand the hints in "Performant cartesian product (CROSS JOIN) with pandas" and "pandas three-way joining multiple dataframes on columns", but did not get any wiser.

In R it can be achieved as a "piped multiple join":

threeMonths <- February%>%
      right_join(March)%>%
      left_join(January)

which I cannot seem to translate into Python.

How do I get the desired output?

You can merge in two steps. For example, for March:

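# Inner-join January and February on ID (their Product_no/Label columns get _x/_y suffixes)
tmp = pd.merge(january_df, february_df, on='ID')
# Right-join onto March so every March row is kept; missing January/February counts become 0
final_df = pd.merge(tmp, march_df, on='ID', how='right')[['January', 'February', 'March', 'Product_no', 'Label', 'ID']].fillna(0)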

print(final_df)

Prints:

   January  February  March Product_no    Label    ID
0      1.0       2.0      5         T1    Towel  1005
1      0.0       0.0      1         E1  Earring  1006
2      3.0       4.0      1         S1     Shoe  1002
3      1.0       1.0      1         B3      Bag  1004
4      0.0       0.0      1         L1   Lotion  1007
5      4.0       3.0      1         B1     Ball  1000
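As a possible variation (a sketch, not part of the original answer), the R pipe from the question can be mirrored more directly by starting from March and left-joining the other months on all three shared columns. This assumes that `Product_no`, `Label` and `ID` always agree across the monthly frames. The names `keys` and `result` are introduced here for illustration only. Joining on all shared columns avoids the `_x`/`_y` suffixes, and the explicit cast keeps the counts as integers instead of floats.

```python
import pandas as pd

january_df = pd.DataFrame({
    'January': [4, 4, 3, 2, 1, 1],
    'Product_no': ['B1', 'B2', 'S1', 'S2', 'B3', 'T1'],
    'Label': ['Ball', 'Bikini', 'Shoe', 'Shirt', 'Bag', 'Towel'],
    'ID': [1000, 1001, 1002, 1003, 1004, 1005],
})
february_df = pd.DataFrame({
    'February': [4, 3, 3, 2, 1, 1],
    'Product_no': ['S1', 'B2', 'B1', 'T1', 'S2', 'B3'],
    'Label': ['Shoe', 'Bikini', 'Ball', 'Towel', 'Shirt', 'Bag'],
    'ID': [1002, 1001, 1000, 1005, 1003, 1004],
})
march_df = pd.DataFrame({
    'March': [5, 1, 1, 1, 1, 1],
    'Product_no': ['T1', 'E1', 'S1', 'B3', 'L1', 'B1'],
    'Label': ['Towel', 'Earring', 'Shoe', 'Bag', 'Lotion', 'Ball'],
    'ID': [1005, 1006, 1002, 1004, 1007, 1000],
})

# Assumption: these three columns identify a product consistently in every frame.
keys = ['Product_no', 'Label', 'ID']

result = (
    march_df
    .merge(february_df, on=keys, how='left')   # keep every March row
    .merge(january_df, on=keys, how='left')    # still only March rows
    .fillna({'January': 0, 'February': 0})     # products absent in a month count as 0
    .astype({'January': int, 'February': int}) # undo the float upcast caused by NaN
    [['January', 'February', 'March'] + keys]  # column order from the question
)

print(result)
```

Because both joins are left joins from `march_df`, a product that appears in March and in only one of the earlier months keeps its count for that month, whereas the inner join of January and February in the two-step version would drop it to 0 for both.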

