Aggregation on pandas datetime series only returns as datetime series
I have a dataframe like this:
import pandas as pd
test = pd.DataFrame({'date': ['2013-10-14 21:46:40', '2013-07-17 02:55:06', '2013-01-28 20:25:17'], 'category': [1, 1, 2]})
test['date'] = pd.to_datetime(test['date'])
   category                date
0         1 2013-10-14 21:46:40
1         1 2013-07-17 02:55:06
2         2 2013-01-28 20:25:17
and I would like to compute some summary statistics for each category, specifically the earliest and latest date as well as the number of items in each category. The obvious way (to me) to do this is:
test.groupby('category')['date'].agg([len, min, max])
but when I do this, the len column gets automatically cast to np.datetime64, which I assume happens because that's the dtype of the original date column:
                                   len                 min                 max
category
1        1970-01-01 00:00:00.000000002 2013-07-17 02:55:06 2013-10-14 21:46:40
2        1970-01-01 00:00:00.000000001 2013-01-28 20:25:17 2013-01-28 20:25:17
I could go back and convert this len column from nanoseconds since the epoch back to integers, but that is pretty ugly and I feel like there must be a better way. Any ideas?
Use 'size'; this is currently an API bug (in that len should just be translated directly to size), see here
In [5]: test.groupby('category')['date'].agg(['size', min, max])
Out[5]:
          size                 min                 max
category
1            2 2013-07-17 02:55:06 2013-10-14 21:46:40
2            1 2013-01-28 20:25:17 2013-01-28 20:25:17
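For reference, here is a minimal self-contained sketch of the same fix, assuming a reasonably recent pandas version. It swaps the builtins min and max for the string aliases 'min' and 'max' (my substitution, not part of the answer above), and also shows named aggregation, where the output column names count, earliest and latest are purely illustrative.

```python
import pandas as pd

test = pd.DataFrame({
    'date': ['2013-10-14 21:46:40', '2013-07-17 02:55:06', '2013-01-28 20:25:17'],
    'category': [1, 1, 2],
})
test['date'] = pd.to_datetime(test['date'])

grouped = test.groupby('category')['date']

# 'size' counts the rows per group and comes back as an integer column,
# while 'min' and 'max' keep the datetime dtype of the original column.
summary = grouped.agg(['size', 'min', 'max'])

# Named aggregation gives the same numbers with custom column names
# (count, earliest, latest are hypothetical names chosen for this example).
named = grouped.agg(count='size', earliest='min', latest='max')

print(summary.dtypes)
print(named)
```

Checking summary.dtypes is a quick way to confirm that the count column is an integer and not a datetime.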