How do I use read_csv in pandas to read timezone-aware datetimes as a timezone-naive local DatetimeIndex?

Question

When I use pandas read_csv to read a column of timezone-aware datetimes (and specify this column as the index), pandas converts it to a timezone-naive UTC DatetimeIndex.

Data in Test.csv:

DateTime,Temperature
2016-07-01T11:05:07+02:00,21.125
2016-07-01T11:05:09+02:00,21.138
2016-07-01T11:05:10+02:00,21.156
2016-07-01T11:05:11+02:00,21.179
2016-07-01T11:05:12+02:00,21.198
2016-07-01T11:05:13+02:00,21.206
2016-07-01T11:05:14+02:00,21.225
2016-07-01T11:05:15+02:00,21.233

Code to read the CSV:

In [1]: import pandas as pd

In [2]: df = pd.read_csv('Test.csv', index_col=0, parse_dates=True)

This results in an index that represents the timezone-naive UTC time:

In [3]: df.index

Out[3]: DatetimeIndex(['2016-07-01 09:05:07', '2016-07-01 09:05:09',
           '2016-07-01 09:05:10', '2016-07-01 09:05:11',
           '2016-07-01 09:05:12', '2016-07-01 09:05:13',
           '2016-07-01 09:05:14', '2016-07-01 09:05:15'],
          dtype='datetime64[ns]', name='DateTime', freq=None)
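
To make the difference between the two naive representations concrete, here is a minimal sketch using only the standard library (it assumes Python 3.7 or later for `datetime.fromisoformat`; the variable names are illustrative). It takes the first timestamp from Test.csv and derives both the naive UTC value shown above and the naive local value being asked for.

```python
from datetime import datetime, timezone

# First timestamp from Test.csv, including its +02:00 offset.
aware = datetime.fromisoformat("2016-07-01T11:05:07+02:00")

# Naive UTC time: convert to UTC first, then drop the timezone info.
naive_utc = aware.astimezone(timezone.utc).replace(tzinfo=None)

# Naive local time: drop the timezone info without converting.
naive_local = aware.replace(tzinfo=None)

print(naive_utc)    # 2016-07-01 09:05:07
print(naive_local)  # 2016-07-01 11:05:07
```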

I tried to use a date_parser function:

In [4]: date_parser = lambda x: pd.to_datetime(x).tz_localize(None)

In [5]: df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=date_parser)

This gave the same result.

How can I make read_csv create a DatetimeIndex that is timezone-naive and represents the local time instead of the UTC time?

I'm using pandas 0.18.1.

Answer 1

According to the docs, the default date_parser uses dateutil.parser.parser. According to the docs for that function, the default is to ignore timezones. So if you supply dateutil.parser.parse as the date_parser kwarg, timezones are not converted.

import dateutil.parser

df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=dateutil.parser.parse)

print(df)

outputs

                           Temperature
DateTime                              
2016-07-01 11:05:07+02:00       21.125
2016-07-01 11:05:09+02:00       21.138
2016-07-01 11:05:10+02:00       21.156
2016-07-01 11:05:11+02:00       21.179
2016-07-01 11:05:12+02:00       21.198
2016-07-01 11:05:13+02:00       21.206
2016-07-01 11:05:14+02:00       21.225
2016-07-01 11:05:15+02:00       21.233
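
The output above still carries the +02:00 offset. As a small sketch of why, the snippet below calls `dateutil.parser.parse` on a single value from Test.csv, outside of read_csv, and inspects the result. The variable name `parsed` is only for illustration.

```python
from datetime import timedelta

import dateutil.parser

# Parse one timestamp from Test.csv with dateutil's default settings.
parsed = dateutil.parser.parse("2016-07-01T11:05:07+02:00")

# The wall-clock time is kept as written, and the offset stays attached,
# so the result is a timezone-aware datetime.
print(parsed.hour)                               # 11
print(parsed.tzinfo is not None)                 # True
print(parsed.utcoffset() == timedelta(hours=2))  # True
```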

Answer 2

Alex's answer leads to a timezone-aware DatetimeIndex. To get a timezone-naive local DatetimeIndex, as the OP asked, tell dateutil.parser.parse to ignore the timezone information by setting ignoretz=True:

import dateutil.parser

date_parser = lambda x: dateutil.parser.parse(x, ignoretz=True)
df = pd.read_csv('Test.csv', index_col=0, parse_dates=True, date_parser=date_parser)

print(df)

outputs

                     Temperature
DateTime                        
2016-07-01 11:05:07       21.125
2016-07-01 11:05:09       21.138
2016-07-01 11:05:10       21.156
2016-07-01 11:05:11       21.179
2016-07-01 11:05:12       21.198
2016-07-01 11:05:13       21.206
2016-07-01 11:05:14       21.225
2016-07-01 11:05:15       21.233
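
The effect of ignoretz=True can also be checked on its own, without read_csv. The following is a minimal sketch on two values from Test.csv; `raw_values`, `naive_local` and `index` are names chosen for this example.

```python
from datetime import datetime

import dateutil.parser
import pandas as pd

raw_values = ["2016-07-01T11:05:07+02:00", "2016-07-01T11:05:09+02:00"]

# ignoretz=True discards the offset and keeps the wall-clock time as written.
naive_local = [dateutil.parser.parse(s, ignoretz=True) for s in raw_values]
print(naive_local[0])  # 2016-07-01 11:05:07

# Naive datetimes produce a timezone-naive DatetimeIndex.
index = pd.DatetimeIndex(naive_local)
print(index.tz)  # None
```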

Answer 3

I adopted the dateutil technique earlier today but have since switched to a faster alternative:

date_parser = lambda ts: pd.to_datetime([s[:-5] for s in ts])

Edit: s[:-5] is correct (screenshot has error)

In the screenshot below, I import ~55MB of tab-separated files. The dateutil method works, but takes orders of magnitude longer.

This was using pandas 0.18.1 and dateutil 2.5.3.

EDIT: This lambda function will work even if the Z-0000 suffix is missing...

date_parser = lambda ts: pd.to_datetime([s[:-5] if 'Z' in s else s for s in ts])
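
The fixed slice length depends on how the offset is written: "-0000" is 5 characters, while the "+02:00" in the question's data is 6. As a hedged variation on the same idea (not the code from this answer), the sketch below strips a trailing numeric offset with a regular expression so that both spellings, and strings with no offset at all, are handled. The helper name `strip_offset` and the pattern are assumptions made for this example, and it assumes every string is a full date-and-time value.

```python
import re

import pandas as pd

# Matches a trailing numeric UTC offset such as "+02:00" or "-0000".
_OFFSET = re.compile(r"[+-]\d{2}:?\d{2}$")


def strip_offset(values):
    """Parse timestamp strings as naive local times, ignoring any trailing offset."""
    return pd.to_datetime([_OFFSET.sub("", s) for s in values])


index = strip_offset([
    "2016-07-01T11:05:07+02:00",  # offset with a colon
    "2016-07-01T11:05:09-0000",   # offset without a colon
    "2016-07-01T11:05:10",        # no offset at all
])
print(index.tz)  # None
```

Because the offset is removed before parsing, pandas never sees timezone information and the wall-clock times are kept as written.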

Answer 4

You can even try:

date_parser = lambda x : pd.to_datetime(x.str[:-6])
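
The `.str` accessor is available on a pandas Series or Index, so whether the lambda above works depends on what read_csv passes to it. A sketch that sidesteps that question is to read the index as plain strings and apply the same slice afterwards. The inline `csv_text` below stands in for Test.csv and is only there to keep the example self-contained.

```python
import io

import pandas as pd

csv_text = """DateTime,Temperature
2016-07-01T11:05:07+02:00,21.125
2016-07-01T11:05:09+02:00,21.138
"""

# Read the index as plain strings (no parse_dates), then convert afterwards.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Drop the 6-character "+02:00" suffix and parse the rest as naive local times.
df.index = pd.to_datetime(df.index.str[:-6])

print(df.index.tz)  # None
print(df)
```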

4 answers

Answer 1
4 2016-07-22 17:14:08

Answer 2
4 accepted 2016-07-25 10:13:40

Answer 3
1 2016-08-26 00:44:17

Answer 4
-1 2020-04-11 00:37:27