How do I generate different columns based on multiple values from different columns using pandas?
I have the following dataset:
import numpy as np
import pandas as pd

data = {
'Partner': ['More', 'More', 'More', 'Reliance','Reliance','Reliance','Reliance','Reliance', 'More', 'More','Azfresh','Azfresh','Azfresh','Azfresh','Azfresh'],
'Brand': ['Biseliri','Biseliri','Biseliri','Biseliri','Biseliri','Biseliri','Kinili','Kinili','Kinili','Kinili','Biseliri','Biseliri','Biseliri','Kinili','Kinili'],
'Category': ['Milk','Milk','Milk','Milk','Milk','Milk','Water','Water','Water','Water','Water','Water','Water','Milk','Milk'],
'Product':['Milk_a','Milk_a','Milk_a','Milk_a','Milk_b','Milk_b','Water_a','Water_a','Water_b','Water_b','Water_a','Water_b','Water_a','Milk_b','Milk_b'],
'Yearweek':[202001,202003,202004,202001,202001,202002,202001,202002,202001,202002,202001,202001,202003,202001,202002],
'MRP':[50,45,50,50,45,45,100,90,150,150,110,150,100,50,50]}
df = pd.DataFrame(data)
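I am trying to group the data by Partner, Brand, Category, and Product, compute the decrease or increase in each product's MRP, and see for how long the price stayed reduced.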
Brand Category MRP Partner Product Yearweek
0 Biseliri Milk 50 More Milk_a 202001
1 Biseliri Milk 45 More Milk_a 202003
2 Biseliri Milk 50 More Milk_a 202004
3 Biseliri Milk 50 Reliance Milk_a 202001
4 Biseliri Milk 45 Reliance Milk_b 202001
5 Biseliri Milk 45 Reliance Milk_b 202002
6 Kinili Water 100 Reliance Water_a 202001
7 Kinili Water 90 Reliance Water_a 202002
8 Kinili Water 150 More Water_b 202001
9 Kinili Water 150 More Water_b 202002
10 Biseliri Water 110 Azfresh Water_a 202001
11 Biseliri Water 150 Azfresh Water_b 202001
12 Biseliri Water 100 Azfresh Water_a 202003
13 Kinili Milk 50 Azfresh Milk_b 202001
14 Kinili Milk 50 Azfresh Milk_b 202002
So I tried grouping with the code below:
groupeddata = df.groupby(['Brand','Category','Partner','Product','Yearweek']).agg({'MRP':'min'}).reset_index()
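I aggregate with the minimum MRP in case the same group has more than one MRP. After that, I used the following code to compute the difference between the prices within a product group, to see whether the price went up or down. But I am not sure how to do this with respect to Yearweek.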
groupeddata['diff'] = groupeddata['MRP'].shift(+1)-groupeddata['MRP']
groupeddata['diff'].fillna('0',inplace = True)
groupeddata['diff'] = groupeddata['diff'].apply(lambda x:int(x))
groupeddata['mrpoff'] = groupeddata['diff'].astype(str)+np.where(groupeddata.eval("diff>0"),"rs less"," rs increased")
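But this produces the wrong DataFrame.
What I am trying to achieve is this: if the price difference is maintained for 2 weeks, noofdays should be 14. In rows 1 and 2, the MRP goes back up after staying at 45 for only 1 week, so noofdays is 7. If the MRP had stayed at 45 for both 202003 and 202004 and increased later, noofdays should be 2 weeks * 7 days = 14 days.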
Brand Category MRP Partner Product Yearweek diff noofdays
0 Biseliri Milk 50 More Milk_a 202001 0 0
1 Biseliri Milk 45 More Milk_a 202003 5 7
2 Biseliri Milk 50 More Milk_a 202004 -5 0
3 Biseliri Milk 50 Reliance Milk_a 202001 0 0
4 Biseliri Milk 45 Reliance Milk_b 202001 0 0
5 Biseliri Milk 45 Reliance Milk_b 202002 0 0
6 Kinili Water 100 Reliance Water_a 202001 0 0
7 Kinili Water 90 Reliance Water_a 202002 10 7
8 Kinili Water 150 More Water_b 202001 0 0
9 Kinili Water 150 More Water_b 202002 0 0
10 Biseliri Water 110 Azfresh Water_a 202001 0 0
11 Biseliri Water 150 Azfresh Water_b 202001 0 0
12 Biseliri Water 100 Azfresh Water_a 202003 10 7
13 Kinili Milk 50 Azfresh Milk_b 202001 0 0
14 Kinili Milk 50 Azfresh Milk_b 202002 0 0
Please help, thanks!
I don't quite understand what you're after, but maybe this is a start?
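(df
 # sort first so the difference follows week order within each group
 .sort_values(['Brand','Category','Partner','Product','Yearweek'])
 # current MRP minus the previous MRP of the same group; the first row of each group becomes 0
 .assign(diff=lambda x: x.groupby(['Brand','Category','Partner','Product'])['MRP'].diff().fillna(0))
)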
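Going one step further than the snippet above, here is a minimal sketch of how the diff and noofdays columns from the expected table might be produced. It rests on assumptions that the question does not confirm: diff is taken as the previous MRP minus the current MRP within each Partner/Brand/Category/Product group (so a positive value means the price dropped), and noofdays is 7 days for every consecutive row that stays at the reduced price, written on the row where the drop starts. The names keys, prev, new_run, and run are illustrative helpers only.

```python
import pandas as pd

data = {
    'Partner': ['More', 'More', 'More', 'Reliance', 'Reliance', 'Reliance', 'Reliance', 'Reliance',
                'More', 'More', 'Azfresh', 'Azfresh', 'Azfresh', 'Azfresh', 'Azfresh'],
    'Brand': ['Biseliri', 'Biseliri', 'Biseliri', 'Biseliri', 'Biseliri', 'Biseliri', 'Kinili', 'Kinili',
              'Kinili', 'Kinili', 'Biseliri', 'Biseliri', 'Biseliri', 'Kinili', 'Kinili'],
    'Category': ['Milk', 'Milk', 'Milk', 'Milk', 'Milk', 'Milk', 'Water', 'Water',
                 'Water', 'Water', 'Water', 'Water', 'Water', 'Milk', 'Milk'],
    'Product': ['Milk_a', 'Milk_a', 'Milk_a', 'Milk_a', 'Milk_b', 'Milk_b', 'Water_a', 'Water_a',
                'Water_b', 'Water_b', 'Water_a', 'Water_b', 'Water_a', 'Milk_b', 'Milk_b'],
    'Yearweek': [202001, 202003, 202004, 202001, 202001, 202002, 202001, 202002,
                 202001, 202002, 202001, 202001, 202003, 202001, 202002],
    'MRP': [50, 45, 50, 50, 45, 45, 100, 90, 150, 150, 110, 150, 100, 50, 50],
}
df = pd.DataFrame(data)

keys = ['Partner', 'Brand', 'Category', 'Product']

# Order each group by week so "previous row" means "previous observed week".
out = df.sort_values(keys + ['Yearweek']).copy()

# Previous MRP within the same group (NaN on the first row of each group).
prev = out.groupby(keys)['MRP'].shift(1)

# Assumption: diff = previous MRP - current MRP, so a positive value is a price drop.
out['diff'] = (prev - out['MRP']).fillna(0).astype(int)

# A new run of constant price starts at the first row of a group or when the price changes.
new_run = prev.isna() | (out['diff'] != 0)
out['run'] = new_run.cumsum()

# Number of consecutive rows that share the same price inside each run.
run_len = out.groupby('run')['MRP'].transform('size')

# Assumption: 7 days per observed row at the reduced price, reported where the drop starts.
out['noofdays'] = (run_len * 7).where(out['diff'] > 0, 0)

# Drop the helper column and restore the original row order.
out = out.drop(columns='run').sort_index()
print(out)
```

With the sample data this reproduces the diff and noofdays columns of the expected table. Note that the sketch counts rows rather than calendar weeks, so the gap between 202001 and 202003 is ignored; if missing weeks should count, Yearweek would first have to be converted to real dates.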