Pandas groupby datetime index, possible bug
I have a pandas DataFrame with a tz-aware Timestamp column. When I call groupby(level=0).first() on it, I get an incorrect result. Am I missing something, or is this a pandas bug?
x = pd.DataFrame(index = [1,1,2,2,2], data = pd.date_range("7:00", "9:00", freq="30min", tz = 'US/Eastern'))
In [58]: x
Out[58]:
0
1 2016-09-08 07:00:00-04:00
1 2016-09-08 07:30:00-04:00
2 2016-09-08 08:00:00-04:00
2 2016-09-08 08:30:00-04:00
2 2016-09-08 09:00:00-04:00
In [59]: x.groupby(level=0).first()
Out[59]:
0
1 2016-09-08 11:00:00-04:00
2 2016-09-08 12:00:00-04:00
I don't believe that it is a bug. If you go through the pytz docs, it is clearly indicated that for the timezone US/Eastern there is no way to specify before / after the end-of-daylight-saving-time transition.
In such cases, sticking with UTC seems to be the best option.
Excerpt from the docs:
Be aware that timezones (eg, pytz.timezone('US/Eastern')) are not necessarily equal across timezone versions. So if data is localized to a specific timezone in the HDFStore using one version of a timezone library and that data is updated with another version, the data will be converted to UTC since these timezones are not considered equal. Either use the same version of timezone library or use tz_convert with the updated timezone definition.
The conversion can be done as follows:
A: use the tz_localize method to localize a naive datetime to UTC:
data = pd.date_range("7:00", "9:00", freq="30min").tz_localize('UTC')
B: use the tz_convert method to convert tz-aware pandas objects to another time zone:
df = pd.DataFrame(index=[1,1,2,2,2], data=data.tz_convert('US/Eastern'))
df.groupby(level=0).first()
which results in:
0
1 2016-09-09 07:00:00-04:00
2 2016-09-09 08:00:00-04:00
#0 datetime64[ns, US/Eastern]
#dtype: object
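The suggestion to stick with UTC can be sketched as a round trip: convert the tz-aware values to UTC, aggregate, then convert the result back for display. This is only a minimal illustration, not something taken from the answer itself; the explicit date 2016-09-08 and the variable names (stamps, ts, first_utc, first_local) are assumptions chosen to mirror the question's data.

```python
import pandas as pd

# Rebuild the question's data with an explicit date (assumed: 2016-09-08)
stamps = pd.date_range("2016-09-08 07:00", "2016-09-08 09:00",
                       freq="30min", tz="US/Eastern")
ts = pd.Series(stamps, index=[1, 1, 2, 2, 2])

# Aggregate in UTC so no local-time interpretation happens inside groupby
first_utc = ts.dt.tz_convert("UTC").groupby(level=0).first()

# Convert back to US/Eastern only for presentation
first_local = first_utc.dt.tz_convert("US/Eastern")
print(first_local)
```

Doing the aggregation in UTC keeps the grouped values unambiguous, and tz_convert changes only how each instant is displayed, not the instant itself.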
This is actually a pandas bug, reported here:
https://github.com/pydata/pandas/issues/10668
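To see whether a given pandas installation shows the behaviour from the question, one option is to compare groupby(level=0).first() against the first row of each group selected without groupby. The sketch below is an assumption about how such a check could look, not part of the bug report; the explicit date and the names expected and affected are hypothetical.

```python
import pandas as pd

# Same shape as the question's frame, with an explicit date (assumed)
stamps = pd.date_range("2016-09-08 07:00", periods=5,
                       freq="30min", tz="US/Eastern")
x = pd.DataFrame(index=[1, 1, 2, 2, 2], data=stamps)

# Result under test
result = x.groupby(level=0).first()

# Reference: keep the first row per index label without using groupby
expected = x[~x.index.duplicated(keep="first")]

# True if groupby returned different timestamps than the reference rows
affected = list(result[0]) != list(expected[0])
print("affected by the tz-aware groupby issue:", affected)
```

Timestamp comparison is done on the underlying instant, so a result shifted by the UTC offset, as in the question's output, would be flagged as different.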