Pandas dataframe with column of timestamps and timezones

I have a pandas dataframe with a column of timestamps and a column holding the timezone each timestamp is in. What's the best way to convert all of these timestamps to UTC?

Sample data in csv:

0,2000-01-28 16:47:00,America/Chicago
1,2000-01-29 16:48:00,America/Chicago
2,2000-01-30 16:49:00,America/Los_Angeles
3,2000-01-31 16:50:00,America/Chicago
4,2000-01-01 16:50:00,America/New_York

This can be done efficiently by converting one timezone at a time; since there are several, groupby separates the rows by timezone for us. The timestamps are local times (in other words, in the given timezone), so tz_localize makes them tz-aware. When the groups are then combined, they are automatically converted to UTC.

Note that this is on master/0.17.0, which is releasing soon. The solution for versions before 0.17.0 is below.

In [19]: df = pd.read_csv(StringIO(data), header=None, names=['value', 'date', 'tz'])

In [20]: df.dtypes
Out[20]: 
value     int64
date     object
tz       object
dtype: object

In [21]: df
Out[21]: 
   value                 date                   tz
0      0  2000-01-28 16:47:00      America/Chicago
1      1  2000-01-29 16:48:00      America/Chicago
2      2  2000-01-30 16:49:00  America/Los_Angeles
3      3  2000-01-31 16:50:00      America/Chicago
4      4  2000-01-01 16:50:00     America/New_York

In [22]: df['utc'] = df.groupby('tz').date.apply(
                lambda x: pd.to_datetime(x).dt.tz_localize(x.name))

In [23]: df
Out[23]: 
   value                 date                   tz                 utc
0      0  2000-01-28 16:47:00      America/Chicago 2000-01-28 22:47:00
1      1  2000-01-29 16:48:00      America/Chicago 2000-01-29 22:48:00
2      2  2000-01-30 16:49:00  America/Los_Angeles 2000-01-31 00:49:00
3      3  2000-01-31 16:50:00      America/Chicago 2000-01-31 22:50:00
4      4  2000-01-01 16:50:00     America/New_York 2000-01-01 21:50:00

In [24]: df.dtypes
Out[24]: 
value             int64
date             object
tz               object
utc      datetime64[ns]
dtype: object

In versions before 0.17.0, you also need:

df['utc'] = df['utc'].dt.tz_localize(None)

to convert to UTC.
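
On more recent pandas releases, combining Series that were localized to different timezones may not collapse to UTC the way the session above shows, so converting each group explicitly can be safer. The following is a minimal sketch under that assumption, using the sample data from the question; the names `naive` and `pieces` are illustrative and not part of the original answer.

```python
from io import StringIO

import pandas as pd

data = """0,2000-01-28 16:47:00,America/Chicago
1,2000-01-29 16:48:00,America/Chicago
2,2000-01-30 16:49:00,America/Los_Angeles
3,2000-01-31 16:50:00,America/Chicago
4,2000-01-01 16:50:00,America/New_York"""

df = pd.read_csv(StringIO(data), header=None, names=["value", "date", "tz"])

# Parse the strings once into naive (timezone-unaware) timestamps.
naive = pd.to_datetime(df["date"])

# For each timezone group: attach the local timezone, then convert to UTC.
pieces = [
    naive.loc[idx].dt.tz_localize(tz_name).dt.tz_convert("UTC")
    for tz_name, idx in df.groupby("tz").groups.items()
]

# Every piece is now UTC, so concatenation keeps a single tz-aware dtype.
# Column assignment aligns on the index, restoring the original row order.
df["utc"] = pd.concat(pieces)

print(df)
```

The resulting column is tz-aware (UTC). If a naive column like the one in the session output is wanted, `df["utc"].dt.tz_localize(None)` drops the timezone while keeping the UTC wall-clock values.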

In general: combine the two csv time columns (timestamp and timezone) during the import, or before it. This can be done with a small lambda function.

To convert (parse) that combined information, several options exist. Most are described here or in the pandas docs. Personally I like the utils.parse one.
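
As one possible reading of the row-wise idea above, each timestamp string can be paired with its own timezone name and parsed in one step. This sketch is an assumption about what was meant, not the answerer's code: it uses `pd.Timestamp` as the parser and a list comprehension in place of the lambda, and the name `combined` is illustrative.

```python
from io import StringIO

import pandas as pd

data = """0,2000-01-28 16:47:00,America/Chicago
1,2000-01-29 16:48:00,America/Chicago
2,2000-01-30 16:49:00,America/Los_Angeles
3,2000-01-31 16:50:00,America/Chicago
4,2000-01-01 16:50:00,America/New_York"""

df = pd.read_csv(StringIO(data), header=None, names=["value", "date", "tz"])

# Build one tz-aware Timestamp per row from its (date string, timezone name) pair.
combined = [pd.Timestamp(d, tz=z) for d, z in zip(df["date"], df["tz"])]

# utc=True converts the mixed-timezone timestamps to a single UTC index.
df["utc"] = pd.to_datetime(combined, utc=True)

print(df)
```

This is simpler to read than the groupby version but processes rows one at a time in Python, so it is likely slower on large frames.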
