Handling a CSV with a mixed timezone-aware and timezone-naive datetime column
I have a pandas DataFrame, imported from a CSV, that looks like this:

|date time|id|value|
|------|-------|---------|
|2019-10-08T01:00:00+01:00|1|35|
|2019-10-08T02:00:00+01:00|1|32|
|2019-10-08T03:00:00+01:00|1|33|
|2019-12-08T01:00:00Z|1|25|
|2019-12-08T01:00:00Z|1|15|
|2019-12-08T01:00:00Z|1|25|

When I try to do an aggregation like this:

```python
data.groupby([data['Date'].dt.date]).agg(['mean', 'count'])
```

I get this error:

```
ValueError: Cannot mix tz-aware with tz-naive values
```
An additional wrinkle: it is important to keep these local date values rather than the UTC values, because I will be doing peak-hour analysis based on local (British) time. What's the best way to fix this?
For the given example, with the `date time` column stored as strings (object dtype):
```
df['date time']
0    2019-10-08T01:00:00+01:00
1    2019-10-08T02:00:00+01:00
2    2019-10-08T03:00:00+01:00
3         2019-12-08T01:00:00Z
4         2019-12-08T01:00:00Z
5         2019-12-08T01:00:00Z
Name: date time, dtype: object
```
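
To reproduce this starting point, here is a minimal sketch that reads the sample rows from an in-memory CSV string with `io.StringIO`. It assumes the real file's header is `date time,id,value` (the `CSV_TEXT` name is only for illustration). Without `parse_dates`, `pd.read_csv` leaves the timestamps as plain strings, which is the state shown above.

```python
import io

import pandas as pd

# Sample rows from the question, written as CSV text.
# Assumption: the real file uses the header "date time,id,value".
CSV_TEXT = """date time,id,value
2019-10-08T01:00:00+01:00,1,35
2019-10-08T02:00:00+01:00,1,32
2019-10-08T03:00:00+01:00,1,33
2019-12-08T01:00:00Z,1,25
2019-12-08T01:00:00Z,1,15
2019-12-08T01:00:00Z,1,25
"""

# No parse_dates argument, so the timestamp column stays as strings.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df['date time'])
```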
Convert it to a datetime dtype using `pd.to_datetime` with the keyword `utc=True`, then convert to the appropriate time zone:
```python
df['date time'] = pd.to_datetime(df['date time'], utc=True).dt.tz_convert('Europe/London')
```
to get:
```
df['date time']
0   2019-10-08 01:00:00+01:00
1   2019-10-08 02:00:00+01:00
2   2019-10-08 03:00:00+01:00
3   2019-12-08 01:00:00+00:00
4   2019-12-08 01:00:00+00:00
5   2019-12-08 01:00:00+00:00
Name: date time, dtype: datetime64[ns, Europe/London]
```
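
The conversion also addresses the peak-hour concern: `utc=True` brings every offset onto a single UTC-aware dtype, and `tz_convert('Europe/London')` then expresses each instant in UK local time, choosing +01:00 (BST) or +00:00 (GMT) per timestamp. The sketch below, which uses the question's strings in a standalone Series (the names `raw`, `as_utc`, `local` are illustrative), shows that `.dt.hour` on the converted values gives the British wall-clock hour rather than the UTC hour.

```python
import pandas as pd

# The mixed-offset strings from the question.
raw = pd.Series([
    '2019-10-08T01:00:00+01:00',
    '2019-10-08T02:00:00+01:00',
    '2019-10-08T03:00:00+01:00',
    '2019-12-08T01:00:00Z',
    '2019-12-08T01:00:00Z',
    '2019-12-08T01:00:00Z',
], name='date time')

# Normalise all offsets to one tz-aware (UTC) dtype.
as_utc = pd.to_datetime(raw, utc=True)

# Re-express the same instants in UK local time (BST in October, GMT in December).
local = as_utc.dt.tz_convert('Europe/London')

print(as_utc.dt.hour.tolist())  # UTC hours
print(local.dt.hour.tolist())   # British wall-clock hours
```

The October rows are one hour apart in UTC and local time, while the December rows coincide, so any hour-of-day analysis should use `local`, not `as_utc`.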
Now the `groupby` works as intended:
```python
df.groupby([df['date time'].dt.date]).agg(['mean', 'count'])
```

```
           id       value      
         mean count      mean count
date time                          
2019-10-08  1     3  33.333333     3
2019-12-08  1     3  21.666667     3
```
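
Putting the pieces together, here is an end-to-end sketch from CSV text to a daily summary keyed on the local calendar date, plus a per-hour view for the peak-hour analysis mentioned in the question. Two choices here are assumptions rather than part of the answer above: selecting `[['id', 'value']]` restricts the aggregation to the numeric columns, so the result does not depend on how a given pandas version treats the datetime column itself, and the hourly mean of `value` is just one hypothetical way to look at peak hours.

```python
import io

import pandas as pd

# Assumption: same header and rows as the sample in the question.
CSV_TEXT = """date time,id,value
2019-10-08T01:00:00+01:00,1,35
2019-10-08T02:00:00+01:00,1,32
2019-10-08T03:00:00+01:00,1,33
2019-12-08T01:00:00Z,1,25
2019-12-08T01:00:00Z,1,15
2019-12-08T01:00:00Z,1,25
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Parse with a common UTC dtype, then express in UK local time.
df['date time'] = pd.to_datetime(df['date time'], utc=True).dt.tz_convert('Europe/London')

# Daily summary keyed on the local calendar date.
daily = df.groupby(df['date time'].dt.date)[['id', 'value']].agg(['mean', 'count'])

# Hypothetical peak-hour view: mean value per local wall-clock hour.
hourly = df.groupby(df['date time'].dt.hour)['value'].mean()

print(daily)
print(hourly)
```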