Merge two different-sized DataFrames into one in Python
dataframe a:
   order_id  item_id description  quantity  value
0         1       11        ball        50    100
1         2       12         bat        25     50
2         2       13       glove        75    150
3         3       11        ball        25     50
4         3       13       glove        25     50
5         4       12         bat        75    150
6         5       12         bat        25     50
7         5       11        ball        50    100
dataframe b:
   order_id  customer_id   ordered delivered
0         1          123  14/03/20  18/03/20
1         2          124  14/03/20  18/03/20
2         3          125  15/03/20  19/03/20
3         4          123  16/03/20  20/03/20
4         5          125  17/03/20  21/03/20
5         6          124  17/03/20  21/03/20
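To make the example reproducible, the two tables above can be built as follows. This is only a sketch: the variable names df_A and df_B are an assumption borrowed from the answer's code, and the dates are kept as plain strings exactly as printed.

```python
import pandas as pd

# Order lines: one row per item within an order (order_id repeats)
df_A = pd.DataFrame({
    'order_id':    [1, 2, 2, 3, 3, 4, 5, 5],
    'item_id':     [11, 12, 13, 11, 13, 12, 12, 11],
    'description': ['ball', 'bat', 'glove', 'ball', 'glove', 'bat', 'bat', 'ball'],
    'quantity':    [50, 25, 75, 25, 25, 75, 25, 50],
    'value':       [100, 50, 150, 50, 50, 150, 50, 100],
})

# Order headers: one row per order (order_id is unique)
df_B = pd.DataFrame({
    'order_id':    [1, 2, 3, 4, 5, 6],
    'customer_id': [123, 124, 125, 123, 125, 124],
    'ordered':     ['14/03/20', '14/03/20', '15/03/20', '16/03/20', '17/03/20', '17/03/20'],
    'delivered':   ['18/03/20', '18/03/20', '19/03/20', '20/03/20', '21/03/20', '21/03/20'],
})

# Count how many lines each order has in df_A
lines_per_order = df_A['order_id'].value_counts().sort_index()
print(lines_per_order)
```

The count shows that this is a one-to-many relationship: orders 2, 3 and 5 have two lines each, while order 6 has no lines in df_A at all.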
What I would like to do is combine the two so that I have one table with all the information in one place that is easy to analyse. I am confused by having multiple lines per order, so I am not sure how best to deal with that. Ideally I would have one line per order ID showing the items ordered and the total value. Perhaps something like:
Output:
   order_id  customer_id  item_1  quan_1  item_2  quan_2  item_3  quan_3  value   ordered delivered
0         1          123      11      50     nan     nan     nan     nan    100  14/03/20  18/03/20
1         2          124      12      25      13      75     nan     nan    200  14/03/20  18/03/20
etc.
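Try:
# Join each order line to its order header (orders without lines are dropped).
dfm = df_A.merge(df_B, on='order_id')
# Number the lines within each order: 1, 2, ...
dfm['n'] = dfm.groupby('order_id').cumcount() + 1
keys = ['order_id', 'customer_id', 'ordered', 'delivered']
# Total the value per order before pivoting, so that two lines
# with the same value are both counted.
total = dfm.groupby(keys)['value'].sum()
# Pivot item_id and quantity into one column per line number.
wide = dfm.set_index(keys + ['n'])[['item_id', 'quantity']].unstack()
wide.columns = [f'{i}_{j}' for i, j in wide.columns]
wide.insert(0, 'value', total)
wide.reset_index()
Output:
   order_id  customer_id   ordered delivered  value  item_id_1  item_id_2  quantity_1  quantity_2
0         1          123  14/03/20  18/03/20    100       11.0        NaN        50.0         NaN
1         2          124  14/03/20  18/03/20    200       12.0       13.0        25.0        75.0
2         3          125  15/03/20  19/03/20    100       11.0       13.0        25.0        25.0
3         4          123  16/03/20  20/03/20    150       12.0        NaN        75.0         NaN
4         5          125  17/03/20  21/03/20    150       12.0       11.0        25.0        50.0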
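If numbered columns per item are not strictly required, an alternative is to collapse the order lines into lists before merging. The sketch below is not part of the original answer; it uses hypothetical cut-down versions of the two frames, and the column names item_ids and quantities are made up for illustration. It also uses how='left' so that orders with no lines (order 6 in the question) are kept instead of being dropped.

```python
import pandas as pd

# Hypothetical cut-down versions of the two frames from the question
df_A = pd.DataFrame({
    'order_id': [1, 2, 2],
    'item_id':  [11, 12, 13],
    'quantity': [50, 25, 75],
    'value':    [100, 50, 150],
})
df_B = pd.DataFrame({
    'order_id':    [1, 2, 6],
    'customer_id': [123, 124, 124],
})

# Collapse the order lines first: one row per order_id,
# with the items gathered into lists and the value summed
per_order = df_A.groupby('order_id').agg(
    item_ids=('item_id', list),
    quantities=('quantity', list),
    value=('value', 'sum'),
).reset_index()

# how='left' keeps every order in df_B, even those without lines in df_A;
# their aggregated columns are filled with NaN
summary = df_B.merge(per_order, on='order_id', how='left')
print(summary)
```

Because the lines are aggregated before the merge, the result has exactly one row per order regardless of how many items an order contains, at the cost of list-valued cells that are less convenient for further numeric work.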