
Pandas Groupby Dates, then Cumprod of Group?

I have a list of values with datetimes:

     Datetime         Val 
[[2017-01-01 15:00:00, 2],
 [2017-02-05 19:00:00, 3],
 [2018-04-22 15:00:00, 6],
 [2018-08-02 13:00:00, 3],
 [2018-10-03 12:00:00, 3]]

I want to group the values into N equally spaced bins by datetime and then get a list of the cumprod of vals for each group. If a bin is empty, its cumprod is 1.

My current approach is to calculate the first and last timestamps, then use linspace to compute the equally spaced datetime bin edges. This is where I'm stuck:

import numpy as np
import pandas as pd

n = 5  # 5 equally sized bins
start = pd.Timestamp(df.iloc[0]['Datetime'])
end = pd.Timestamp(df.iloc[-1]['Datetime'])
bins = np.linspace(start.value, end.value, n + 1)  # n + 1 edges, since linspace includes the right endpoint
groups = pd.to_datetime(bins).values

Returns:

 ['2017-01-01T15:00:00.000000000' '2017-05-09T14:24:00.000000000'
 '2017-09-14T13:48:00.000000000' '2018-01-20T13:12:00.000000000'
 '2018-05-28T12:36:00.000000000' '2018-10-03T12:00:00.000000000']

With 5 equally spaced bins and the example values above, the output could be, for example:

 output = [2*3, 1, 1, 6, 3*3] # 1 if there is no "Val" for a bin

Is there an efficient/clean way to solve this? I have looked into pd.Grouper, but I can't get the freq value to produce equally spaced datetime groups. Another approach I tried was converting the datetimes to epochs and then using np.digitize to categorize them by bin, but that didn't work out either. I'd appreciate any help; NumPy solutions are also welcome.

You can use pd.cut to specify your bins easily. Then you need groupby + prod.

df.groupby(pd.cut(df.Datetime, bins=5, right=False)).Val.prod()

Output:

Datetime
[2017-01-01 15:00:00, 2017-05-09 14:24:00)           6
[2017-05-09 14:24:00, 2017-09-14 13:48:00)           1
[2017-09-14 13:48:00, 2018-01-20 13:12:00)           1
[2018-01-20 13:12:00, 2018-05-28 12:36:00)           6
[2018-05-28 12:36:00, 2018-10-04 03:21:25.200000)    9
Name: Val, dtype: int64
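
For reference, here is a self-contained sketch of the same approach that rebuilds the example data and ends with a plain list. The DataFrame construction is an assumption based on the table in the question, and observed=False is added here only to make it explicit that empty bins should be kept; neither is part of the original answer.

```python
import pandas as pd

# Example data rebuilt from the question (assumed column names: Datetime, Val)
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2017-01-01 15:00:00",
        "2017-02-05 19:00:00",
        "2018-04-22 15:00:00",
        "2018-08-02 13:00:00",
        "2018-10-03 12:00:00",
    ]),
    "Val": [2, 3, 6, 3, 3],
})

# Cut the datetime range into 5 equal-width, left-closed bins
bins = pd.cut(df["Datetime"], bins=5, right=False)

# observed=False keeps bins that contain no rows; their product is 1
products = df.groupby(bins, observed=False)["Val"].prod()

output = products.tolist()
print(output)
```

Calling tolist() on the result gives the flat list the question asks for, with one entry per bin in chronological order.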

We automatically get your desired behavior of missing groups being filled with 1 because, with prod, empty Series and ndarrays multiply to 1.

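import numpy as np
import pandas as pd

# The product of an empty Series is 1
pd.Series(dtype=float).prod()
# 1.0

# The same holds for an empty ndarray
np.prod(np.ndarray(shape=0))
# 1.0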
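
Since the question also asks about NumPy and np.digitize, the following is a rough NumPy-only sketch of the same binning, not something taken from the answer above. It assumes half-open bins [left, right) with the last timestamp placed in the final bin, and it uses np.multiply.at to accumulate the products; the variable names are illustrative.

```python
import numpy as np

# Example data from the question, as second-resolution datetimes
stamps = np.array([
    "2017-01-01T15:00:00",
    "2017-02-05T19:00:00",
    "2018-04-22T15:00:00",
    "2018-08-02T13:00:00",
    "2018-10-03T12:00:00",
], dtype="datetime64[s]")
vals = np.array([2, 3, 6, 3, 3])

n = 5  # number of equally spaced bins

# Convert to integer seconds since the epoch and compute n + 1 bin edges
t = stamps.astype("int64")
edges = np.linspace(t.min(), t.max(), n + 1)

# Digitizing against the interior edges only yields indices 0..n-1,
# so the maximum timestamp falls into the last bin
idx = np.digitize(t, edges[1:-1])

# Start every bin at 1 so that empty bins keep a product of 1
output = np.ones(n, dtype=vals.dtype)
np.multiply.at(output, idx, vals)

print(output.tolist())
```

Initializing the result with ones plays the same role as the empty product in the pandas version: bins that receive no values are left at 1.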


 