Pandas get average time interval within groups
I have a DataFrame containing an EffectiveDate column. I want to group the DataFrame by a Key value and then calculate the average time interval between the date values in each group for the EffectiveDate column.
For example, for the DataFrame:
EffectiveDate
1 2015-08-17 07:00:00
1 2015-08-18 07:00:00
1 2015-08-19 07:00:00
2 2015-08-20 07:00:00
2 2015-08-21 07:00:00
2 2015-09-16 07:00:00
2 2015-10-15 07:00:00
2 2015-11-16 08:00:00
I want to group by the index and calculate the average interval between the date values in the EffectiveDate column.
15199 2015-08-17 07:00:00
15214 2015-08-18 07:00:00
15219 2015-08-19 07:00:00
15233 2015-08-20 07:00:00
15254 2015-08-21 07:00:00
15687 2015-09-16 07:00:00
199 2015-10-15 07:00:00
1123 2015-11-16 08:00:00
Name: EffectiveDate, dtype: datetime64[ns]
On a single Series this seems to work fine:
EffectiveDate.diff().astype('timedelta64[s]').mean()
However, when I use the same function as a groupby aggregate on a pandas DataFrame:
df.groupby('Key').agg({
'EffectiveDate': lambda x: x.diff().astype('timedelta64[s]').mean()
})
The results are:
EffectiveDate
1 1970-01-01 00:00:00.016747425
2 1970-01-01 00:00:00.017765280
3 1970-01-01 00:00:00.034776096
4 1970-01-01 00:00:00.002052450
5 1970-01-01 00:00:00.018238800
6 1970-01-01 00:00:00.024005438
7 1970-01-01 00:00:00.012330000
I would expect an integer field in each column. I am using Pandas 0.19.2.
GroupBy.agg seems to attempt to cast the result back to the original dtype of the EffectiveDate column in 0.19.2. I think this makes sense in general, as we would expect an aggregation down a column to have the same dtype as that column.
To fix this issue in 0.19.2, you could use GroupBy.apply instead, which doesn't perform the same cast afterwards.
df.groupby(df.index).apply(
lambda x: x.diff().astype('timedelta64[s]').mean()
)
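As a self-contained sketch of the same idea, the snippet below rebuilds the example DataFrame from the question and computes the mean interval per group with apply. Note that `.dt.total_seconds()` is swapped in here for `astype('timedelta64[s]')`, on the assumption that plain float seconds are wanted; the helper name `mean_interval_seconds` is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Example data from the question: the index plays the role of the group key.
df = pd.DataFrame(
    {
        "EffectiveDate": pd.to_datetime(
            [
                "2015-08-17 07:00:00",
                "2015-08-18 07:00:00",
                "2015-08-19 07:00:00",
                "2015-08-20 07:00:00",
                "2015-08-21 07:00:00",
                "2015-09-16 07:00:00",
                "2015-10-15 07:00:00",
                "2015-11-16 08:00:00",
            ]
        )
    },
    index=[1, 1, 1, 2, 2, 2, 2, 2],
)


def mean_interval_seconds(dates):
    """Hypothetical helper: mean gap between consecutive dates, in seconds."""
    # diff() yields timedeltas (the first one is NaT); total_seconds()
    # converts them to floats, and mean() skips the leading NaN.
    return dates.diff().dt.total_seconds().mean()


# Group on the index and apply the helper to the EffectiveDate column only.
result = df.groupby(level=0)["EffectiveDate"].apply(mean_interval_seconds)
print(result)
```

The values should match the 0.18.1 output shown further down (86400.0 for group 1 and 1901700.0 for group 2), since the helper returns ordinary floats rather than datetimes.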
Seemingly this didn't use to be the case, as I can reproduce your behavior in 0.18.1 only after casting to the original dtype of the EffectiveDate column.
In 0.18.1:
>>> df
EffectiveDate
1 2015-08-17 07:00:00
1 2015-08-18 07:00:00
1 2015-08-19 07:00:00
2 2015-08-20 07:00:00
2 2015-08-21 07:00:00
2 2015-09-16 07:00:00
2 2015-10-15 07:00:00
2 2015-11-16 08:00:00
>>> df.groupby(df.index).agg({
'EffectiveDate': lambda x: x.diff().astype('timedelta64[s]').mean()
})
EffectiveDate
1 86400.0
2 1901700.0
>>> df.groupby(df.index).agg({
'EffectiveDate': lambda x: x.diff().astype('timedelta64[s]').mean()
}).astype(df.EffectiveDate.dtype)
EffectiveDate
1 1970-01-01 00:00:00.000086400
2 1970-01-01 00:00:00.001901700
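The odd 1970 timestamps can be read as the mean number of seconds reinterpreted as nanoseconds since the Unix epoch. The sketch below illustrates that reading; it assumes the cast treats the number as an integer nanosecond count, which is consistent with the output above but is not confirmed against the pandas internals.

```python
import pandas as pd

# Mean interval for group 1, in seconds, as returned by the lambda.
mean_seconds = 86400.0

# An integer passed to pd.Timestamp is interpreted as nanoseconds since the
# epoch, so 86400 "seconds" ends up as 86400 nanoseconds after 1970-01-01.
as_timestamp = pd.Timestamp(int(mean_seconds))
print(as_timestamp)

# Going the other way: the nanosecond count of a timestamp from the
# question's output recovers the mean interval in seconds.
recovered_seconds = pd.Timestamp("1970-01-01 00:00:00.016747425").value
print(recovered_seconds)
```

Under that reading, the values in the question's output are still the correct means in seconds; they are only displayed with the wrong dtype.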