Complicated function with groupby and between? Python
Here is a sample dataset.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'VipNo':np.repeat( range(3), 2 ),
'Quantity': np.random.randint(200,size=6),
'OrderDate': np.random.choice( pd.date_range('1/1/2020', periods=365, freq='D'), 6, replace=False)})
print(df)
I have a couple of steps to do. I want to create a new column named qtywithin1mon/totalqty. First, I want to group by VipNo (each number represents an individual), because a person may have made multiple purchases. Then I want to check whether the OrderDate falls within a certain range (say 2020/03/01 to 2020/03/31). If so, I want to divide the quantity ordered on that day by the total quantity this customer purchased. My dataset is big, so a customer may have ordered twice within the time range; in that case I want the sum of the two orders divided by the total quantity. How can I achieve this? I really have no idea where to start.
Thank you so much!
You can create a new column that keeps Quantity only for orders within the given date range (and is 0 otherwise), then groupby:
start, end = pd.to_datetime(['2020/03/01','2020/03/31'])
(df.assign(QuantitySub=df['OrderDate'].between(start,end)*df.Quantity)
.groupby('VipNo')[['Quantity','QuantitySub']]
.sum()
.assign(output=lambda x: x['QuantitySub']/x['Quantity'])
.drop('QuantitySub', axis=1)
)
With a data frame:
   VipNo  Quantity  OrderDate
0      0       105 2020-01-07
1      0        56 2020-03-04
2      1       167 2020-09-05
3      1        18 2020-05-08
4      2       151 2020-11-01
5      2        14 2020-03-17
The output is:
       Quantity    output
VipNo
0           161  0.347826
1           185  0.000000
2           165  0.084848
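The result above has one row per VipNo. Since the question asks for a new column on the original data, here is a minimal sketch of a variant that broadcasts the same ratio back to every order row with `groupby(...).transform('sum')`. It is an alternative to the chained `assign`/`groupby` above, not a correction of it. The frame is the example data typed in by hand, and the variable names `in_range_qty`, `grouped_in_range` and `grouped_total` are made up for illustration.

```python
import pandas as pd

# The example frame shown above, entered by hand so the result is reproducible.
df = pd.DataFrame({
    'VipNo': [0, 0, 1, 1, 2, 2],
    'Quantity': [105, 56, 167, 18, 151, 14],
    'OrderDate': pd.to_datetime(['2020-01-07', '2020-03-04', '2020-09-05',
                                 '2020-05-08', '2020-11-01', '2020-03-17']),
})

start, end = pd.to_datetime(['2020/03/01', '2020/03/31'])

# Keep Quantity where the order date is inside [start, end] (both ends included),
# and use 0 everywhere else.
in_range_qty = df['Quantity'].where(df['OrderDate'].between(start, end), 0)

# transform('sum') returns one value per original row, aligned with df's index,
# so the per-customer sums can be divided row by row.
grouped_in_range = in_range_qty.groupby(df['VipNo']).transform('sum')
grouped_total = df.groupby('VipNo')['Quantity'].transform('sum')

df['qtywithin1mon/totalqty'] = grouped_in_range / grouped_total
print(df)
```

Every row of the same customer gets the same ratio, so VipNo 0 shows 56/161 on both of its rows. If a customer's total quantity were 0, the division would give NaN for that customer, which may need separate handling.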