Complicated function with groupby and between? Python

Here is a sample dataset.

import pandas as pd
import numpy as np
df = pd.DataFrame({ 
    'VipNo':np.repeat( range(3), 2 ),
    'Quantity': np.random.randint(200,size=6),
    'OrderDate': np.random.choice( pd.date_range('1/1/2020', periods=365, freq='D'), 6, replace=False)})
print(df)

I have a couple of steps to do. I want to create a new column named qtywithin1mon/totalqty. First, I want to group by VipNo (each number represents an individual), because a person may have made multiple purchases. Then I want to check whether the OrderDate falls within a certain range (say 2020/03/01 to 2020/03/31). If so, I want to divide the quantity ordered on that day by the total quantity this customer purchased. My dataset is big, so a customer may have ordered twice within the date range; in that case I want the sum of the two orders divided by the total quantity. How can I achieve this? I really have no idea where to start.

Thank you so much!

You can create a new column that masks Quantity to the given date range, then group by VipNo:

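# Bounds of the date range (between() is inclusive on both ends by default)
start, end = pd.to_datetime(['2020/03/01','2020/03/31'])

# QuantitySub keeps Quantity for orders inside the range and is 0 otherwise
(df.assign(QuantitySub=df['OrderDate'].between(start,end)*df.Quantity)
   .groupby('VipNo')[['Quantity','QuantitySub']]
   .sum()  # per-customer totals of both columns
   .assign(output=lambda x: x['QuantitySub']/x['Quantity'])  # in-range share of the total
   .drop('QuantitySub', axis=1)
)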

With this data frame:

   VipNo  Quantity  OrderDate
0      0       105 2020-01-07
1      0        56 2020-03-04
2      1       167 2020-09-05
3      1        18 2020-05-08
4      2       151 2020-11-01
5      2        14 2020-03-17

The output is:

       Quantity    output
VipNo            
0           161  0.347826
1           185  0.000000
2           165  0.084848
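The result above has one row per customer. The question asks for a new column, so here is a rough sketch of one way to put the same ratio on every original row, using `groupby(...).transform('sum')`. The helper name `add_ratio_column` is made up for illustration, and the data frame is simply the sample rows shown above typed in by hand.

```python
import pandas as pd

# Sample rows copied from the data frame shown above.
df = pd.DataFrame({
    'VipNo': [0, 0, 1, 1, 2, 2],
    'Quantity': [105, 56, 167, 18, 151, 14],
    'OrderDate': pd.to_datetime([
        '2020-01-07', '2020-03-04', '2020-09-05',
        '2020-05-08', '2020-11-01', '2020-03-17',
    ]),
})

start, end = pd.to_datetime(['2020/03/01', '2020/03/31'])


def add_ratio_column(frame, start, end, name='qtywithin1mon/totalqty'):
    """Hypothetical helper: return a copy with the per-customer ratio on every row."""
    # Keep Quantity for orders inside [start, end] (inclusive), zero otherwise.
    in_range_qty = frame['Quantity'].where(
        frame['OrderDate'].between(start, end), 0
    )
    # transform('sum') returns group sums aligned with the original rows.
    in_range_total = in_range_qty.groupby(frame['VipNo']).transform('sum')
    total = frame.groupby('VipNo')['Quantity'].transform('sum')
    # The column name contains '/', so pass it through a dict.
    return frame.assign(**{name: in_range_total / total})


result = add_ratio_column(df, start, end)
print(result)
```

Each row then carries its customer's share, so both rows for VipNo 0 hold the same value as the `output` column above. Note that the end bound is midnight on 2020/03/31, so if OrderDate carried times of day, later orders on that date would fall outside the range.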
