
Pandas groupby datetime index, possible bug

I have a Pandas DataFrame with a column of tz-aware Timestamps, and I tried groupby(level=0).first() on it. I get an incorrect result: the times come back shifted by four hours. Am I missing something, or is it a pandas bug?

import pandas as pd

x = pd.DataFrame(index=[1, 1, 2, 2, 2], data=pd.date_range("7:00", "9:00", freq="30min", tz="US/Eastern"))

In [58]: x
Out[58]: 
                          0
1 2016-09-08 07:00:00-04:00
1 2016-09-08 07:30:00-04:00
2 2016-09-08 08:00:00-04:00
2 2016-09-08 08:30:00-04:00
2 2016-09-08 09:00:00-04:00

In [59]: x.groupby(level=0).first()
Out[59]: 
                          0
1 2016-09-08 11:00:00-04:00
2 2016-09-08 12:00:00-04:00

I don't believe that it is a bug. The pytz docs indicate that for the US/Eastern timezone there is no way to specify whether a time falls before or after the end-of-daylight-saving-time transition.

In such cases, sticking with UTC seems to be the best option.

Excerpt from the docs:

  Be aware that timezones (eg, pytz.timezone('US/Eastern')) are not necessarily equal across timezone versions. So if data is localized to a specific timezone in the HDFStore using one version of a timezone library and that data is updated with another version, the data will be converted to UTC since these timezones are not considered equal. Either use the same version of timezone library or use tz_convert with the updated timezone definition. 

The conversion can be done as follows:

A: use the tz_localize method to localize naive datetimes to UTC

data = pd.date_range("7:00", "9:00", freq="30min").tz_localize('UTC')

B: use the tz_convert method to convert the tz-aware data to another time zone

df = pd.DataFrame(index=[1,1,2,2,2], data=data.tz_convert('US/Eastern'))
df.groupby(level=0).first()

which results in:

                          0
1 2016-09-09 07:00:00-04:00
2 2016-09-09 08:00:00-04:00

#0    datetime64[ns, US/Eastern]
#dtype: object
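To make the difference between the two methods concrete, here is a minimal sketch. It assumes a fixed date (2016-09-09) rather than the current day, so the values are reproducible. The variable names are illustrative only. It shows that tz_localize attaches a zone without changing the wall-clock time, while tz_convert keeps the instant and changes the wall-clock time.

```python
import pandas as pd

# Naive timestamps on a fixed date (assumption: the original used today's date).
naive = pd.date_range("2016-09-09 07:00", "2016-09-09 09:00", freq="30min")

# tz_localize attaches a zone to naive timestamps; the wall-clock time is unchanged.
utc = naive.tz_localize("UTC")

# tz_convert keeps the same instants and re-expresses them in another zone.
eastern = utc.tz_convert("US/Eastern")

print(utc[0])      # 2016-09-09 07:00:00+00:00
print(eastern[0])  # 2016-09-09 03:00:00-04:00

# To keep 07:00 as US/Eastern wall-clock time, localize directly to that zone.
eastern_wall = naive.tz_localize("US/Eastern")
print(eastern_wall[0])  # 2016-09-09 07:00:00-04:00
```

Under these assumptions, 07:00 UTC corresponds to 03:00 in US/Eastern, so it is worth checking any grouped result against values computed this way.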

This is actually a pandas bug, reported here:

https://github.com/pydata/pandas/issues/10668
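One way to check whether an installed pandas version shows this behaviour is to compare the groupby result with the first row per index label selected without groupby. This is a rough sketch, not an official test: the fixed date and the names expected, actual and affected are assumptions made for illustration.

```python
import pandas as pd

# Fixed date so the check is reproducible (the question used the current day).
stamps = pd.date_range("2016-09-08 07:00", "2016-09-08 09:00",
                       freq="30min", tz="US/Eastern")
x = pd.DataFrame({0: stamps}, index=[1, 1, 2, 2, 2])

# Reference result: the first row for each index label, without groupby.
expected = x[~x.index.duplicated(keep="first")]

# Result under test.
actual = x.groupby(level=0).first()

# If the timestamps differ, groupby has altered the tz-aware values.
affected = actual[0].tolist() != expected[0].tolist()
print("groupby changed the tz-aware values:", affected)
```

If this prints True, keeping the data in UTC for the groupby and converting afterwards is one possible workaround.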
