Merge two different-sized dataframes into one in Python

DataFrame A:

   order_id  item_id description quantity value
0  1         11      ball           50     100
1  2         12      bat            25     50
2  2         13      glove          75     150 
3  3         11      ball           25     50
4  3         13      glove          25     50
5  4         12      bat            75     150
6  5         12      bat            25     50
7  5         11      ball           50     100

DataFrame B:

  order_id  customer_id ordered   delivered
0 1         123         14/03/20  18/03/20
1 2         124         14/03/20  18/03/20
2 3         125         15/03/20  19/03/20
3 4         123         16/03/20  20/03/20
4 5         125         17/03/20  21/03/20
5 6         124         17/03/20  21/03/20
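To make the question reproducible, the two frames above can be typed in directly. The variable names df_A and df_B match the ones used in the answer; the dates are left as the dd/mm/yy strings shown rather than parsed into datetimes.

```python
import pandas as pd

# DataFrame A: one row per item, so an order can span several rows
df_A = pd.DataFrame({
    'order_id':    [1, 2, 2, 3, 3, 4, 5, 5],
    'item_id':     [11, 12, 13, 11, 13, 12, 12, 11],
    'description': ['ball', 'bat', 'glove', 'ball', 'glove', 'bat', 'bat', 'ball'],
    'quantity':    [50, 25, 75, 25, 25, 75, 25, 50],
    'value':       [100, 50, 150, 50, 50, 150, 50, 100],
})

# DataFrame B: one row per order (order 6 has no rows in A)
df_B = pd.DataFrame({
    'order_id':    [1, 2, 3, 4, 5, 6],
    'customer_id': [123, 124, 125, 123, 125, 124],
    'ordered':     ['14/03/20', '14/03/20', '15/03/20', '16/03/20', '17/03/20', '17/03/20'],
    'delivered':   ['18/03/20', '18/03/20', '19/03/20', '20/03/20', '21/03/20', '21/03/20'],
})

print(df_A)
print(df_B)
```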

What I would like to do is combine the two so that I have one table with all the information in one place, which is easy to analyse. I am confused by having multiple lines per order, so I am not sure how best to deal with that. Ideally I would have one line per order ID showing the items ordered and the total value. Perhaps something like:
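If numbered item columns are not essential, one way to get one line per order is to gather each order's items into lists and sum its value before merging. This is a sketch assuming pandas 0.25 or later (for named aggregation); the column names items and quantities are illustrative choices, not from the question. Merging onto df_B with how='left' keeps order 6, which has no item rows, with missing values in the item columns.

```python
import pandas as pd

df_A = pd.DataFrame({
    'order_id': [1, 2, 2, 3, 3, 4, 5, 5],
    'item_id':  [11, 12, 13, 11, 13, 12, 12, 11],
    'quantity': [50, 25, 75, 25, 25, 75, 25, 50],
    'value':    [100, 50, 150, 50, 50, 150, 50, 100],
})
df_B = pd.DataFrame({
    'order_id':    [1, 2, 3, 4, 5, 6],
    'customer_id': [123, 124, 125, 123, 125, 124],
    'ordered':     ['14/03/20', '14/03/20', '15/03/20', '16/03/20', '17/03/20', '17/03/20'],
    'delivered':   ['18/03/20', '18/03/20', '19/03/20', '20/03/20', '21/03/20', '21/03/20'],
})

# Collapse the item rows: one row per order, items and quantities kept as lists
per_order = (df_A.groupby('order_id')
                 .agg(items=('item_id', list),
                      quantities=('quantity', list),
                      value=('value', 'sum'))
                 .reset_index())

# Attach customer and dates; how='left' keeps every order listed in df_B
summary = df_B.merge(per_order, on='order_id', how='left')
print(summary)
```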

Desired output:

  order_id customer_id item_1 quan_1 item_2 quan_2 item_3 quan_3 value ordered delivered 
0 1        123         11     50     nan    nan    nan    nan    100   14/03/20  18/03/20
1 2        124         12     25     13     75     nan    nan    200   14/03/20  18/03/20

etc.
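The layout above can be produced by numbering the items within each order with groupby().cumcount() and pivoting on that number. This is one possible sketch; the helper names a, wide, short, totals and out are illustrative. With the sample data no order has more than two items, so only item_1/quan_1 and item_2/quan_2 appear; an item_3/quan_3 pair would be created automatically if some order had a third item.

```python
import pandas as pd

df_A = pd.DataFrame({
    'order_id': [1, 2, 2, 3, 3, 4, 5, 5],
    'item_id':  [11, 12, 13, 11, 13, 12, 12, 11],
    'quantity': [50, 25, 75, 25, 25, 75, 25, 50],
    'value':    [100, 50, 150, 50, 50, 150, 50, 100],
})
df_B = pd.DataFrame({
    'order_id':    [1, 2, 3, 4, 5, 6],
    'customer_id': [123, 124, 125, 123, 125, 124],
    'ordered':     ['14/03/20', '14/03/20', '15/03/20', '16/03/20', '17/03/20', '17/03/20'],
    'delivered':   ['18/03/20', '18/03/20', '19/03/20', '20/03/20', '21/03/20', '21/03/20'],
})

# Number the items within each order: 1, 2, ...
a = df_A.assign(n=df_A.groupby('order_id').cumcount() + 1)

# One row per order; columns are (original column, item number) pairs
wide = a.pivot(index='order_id', columns='n', values=['item_id', 'quantity'])
# Sort by item number first, so the pairs read item_1, quan_1, item_2, quan_2, ...
wide = wide.sort_index(axis=1, level=1)
short = {'item_id': 'item', 'quantity': 'quan'}
wide.columns = [f'{short[col]}_{n}' for col, n in wide.columns]

# Total value per order, then attach customer and dates from df_B
totals = df_A.groupby('order_id')['value'].sum()
out = df_B.merge(wide.join(totals).reset_index(), on='order_id', how='left')
out = out[['order_id', 'customer_id', *wide.columns, 'value', 'ordered', 'delivered']]
print(out)
```

Because the merge is a left join on df_B, order 6 shows up with NaN in the item and value columns; out.dropna(subset=['value']) removes it if only orders with items are wanted.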

Try:

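import pandas as pd

# One row per item, with customer and dates attached
# (merge is an inner join by default, so order 6, which has no items, is dropped)
dfm = df_A.merge(df_B, on='order_id')
keys = ['order_id', 'customer_id', 'ordered', 'delivered']

# Number the items within each order (1, 2, ...) and spread them into columns
item_no = dfm.groupby('order_id').cumcount() + 1
out = (dfm.groupby(keys + [item_no])[['item_id', 'quantity']]
          .sum()
          .unstack()
          .fillna(0))
out.columns = [f'{i}_{j}' for i, j in out.columns]

# Total value per order, summed over all of its item rows
out.insert(0, 'value', dfm.groupby(keys)['value'].sum())
print(out.reset_index())

Output:

   order_id  customer_id   ordered delivered  value  item_id_1  item_id_2  quantity_1  quantity_2
0         1          123  14/03/20  18/03/20    100       11.0        0.0        50.0         0.0
1         2          124  14/03/20  18/03/20    200       12.0       13.0        25.0        75.0
2         3          125  15/03/20  19/03/20    100       11.0       13.0        25.0        25.0
3         4          123  16/03/20  20/03/20    150       12.0        0.0        75.0         0.0
4         5          125  17/03/20  21/03/20    150       12.0       11.0        25.0        50.0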
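df_A.merge(df_B, on='order_id') is an inner join (pandas' default, how='inner'), so order 6, which appears only in df_B, silently disappears from the result. Since the two frames have different sizes, a quick check with an outer merge and indicator=True shows which orders are unmatched. A minimal sketch, with the frames trimmed to the columns the check needs:

```python
import pandas as pd

df_A = pd.DataFrame({
    'order_id': [1, 2, 2, 3, 3, 4, 5, 5],
    'item_id':  [11, 12, 13, 11, 13, 12, 12, 11],
})
df_B = pd.DataFrame({
    'order_id':    [1, 2, 3, 4, 5, 6],
    'customer_id': [123, 124, 125, 123, 125, 124],
})

# An outer merge keeps unmatched rows from both sides; indicator=True adds a
# _merge column saying where each row came from ('left_only', 'right_only', 'both')
check = df_A.merge(df_B, on='order_id', how='outer', indicator=True)

# Orders listed in df_B that have no item rows in df_A
no_items = check.loc[check['_merge'] == 'right_only', 'order_id'].tolist()
print(no_items)  # [6]
```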
