Resampling with 'how=count' causing problems
I have a simple pandas dataframe that has measurements at various times:
volume
t
2013-10-13 02:45:00 17
2013-10-13 05:40:00 38
2013-10-13 09:30:00 29
2013-10-13 11:40:00 25
2013-10-13 12:50:00 11
2013-10-13 15:00:00 17
2013-10-13 17:10:00 15
2013-10-13 18:20:00 12
2013-10-13 20:30:00 20
2013-10-14 03:45:00 9
2013-10-14 06:40:00 30
2013-10-14 09:40:00 43
2013-10-14 11:05:00 10
I'm doing some basic resampling and plotting, such as the daily total volume, which works fine:
df.resample('D',how='sum').head()
volume
t
2013-10-13 184
2013-10-14 209
2013-10-15 197
2013-10-16 309
2013-10-17 317
But for some reason, when I try to get the total number of entries per day, it returns a multiindex series instead of a dataframe:
df.resample('D',how='count').head()
2013-10-13 volume 9
2013-10-14 volume 9
2013-10-15 volume 7
2013-10-16 volume 9
2013-10-17 volume 10
I can fix the data so it's easily plotted with a simple unstack call, i.e. df.resample('D',how='count').unstack(), but why does calling resample with how='count' behave differently than with how='sum'?
It does appear that combining resample and count leads to some odd behavior in how the resulting dataframe is structured (well, at least up to pandas 0.13.1). See here for a slightly different but related context: Count and Resampling with a multi-index
You can use the same strategy here:
>>> df
volume
date
2013-10-13 02:45:00 17
2013-10-13 05:40:00 38
2013-10-13 09:30:00 29
2013-10-13 11:40:00 25
2013-10-13 12:50:00 11
2013-10-13 15:00:00 17
2013-10-13 17:10:00 15
2013-10-13 18:20:00 12
2013-10-13 20:30:00 20
2013-10-14 03:45:00 9
2013-10-14 06:40:00 30
2013-10-14 09:40:00 43
2013-10-14 11:05:00 10
So here is your issue:
>>> df.resample('D',how='count')
2013-10-13 volume 9
2013-10-14 volume 4
You can fix the issue by specifying that count applies to the volume column with a dict in the resample call:
>>> df.resample('D',how={'volume':'count'})
volume
date
2013-10-13 9
2013-10-14 4
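For readers on a newer pandas, here is a minimal sketch of the same daily count. It assumes a release in which the how= keyword is no longer accepted and aggregation is done by calling a method on the resampler. The frame is rebuilt from the 13 rows shown above; the variable names are only illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the rows shown in the question.
index = pd.to_datetime([
    "2013-10-13 02:45", "2013-10-13 05:40", "2013-10-13 09:30",
    "2013-10-13 11:40", "2013-10-13 12:50", "2013-10-13 15:00",
    "2013-10-13 17:10", "2013-10-13 18:20", "2013-10-13 20:30",
    "2013-10-14 03:45", "2013-10-14 06:40", "2013-10-14 09:40",
    "2013-10-14 11:05",
])
df = pd.DataFrame(
    {"volume": [17, 38, 29, 25, 11, 17, 15, 12, 20, 9, 30, 43, 10]},
    index=index,
)
df.index.name = "date"

# Method-style aggregation (assumes a pandas version without how=).
daily_sum = df.resample("D").sum()
daily_count = df.resample("D").count()

# Dict form, mirroring the how={'volume': 'count'} workaround above.
daily_count_dict = df.resample("D").agg({"volume": "count"})

print(daily_count)
```

With this form, both the sum and the count should come back as dataframes indexed by day with a single volume column, so no unstack call is needed before plotting.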