Error when reading datetimes from a CSV with pandas

Question

I am using pandas to read and process a CSV file. My CSV file has a date/time column that looks like:

11:59:50:322 02 10 2015 -0400 EDT
11:11:55:051 16 10 2015 -0400 EDT
00:38:37:106 02 11 2015 -0500 EST
04:15:51:600 14 11 2015 -0500 EST
04:15:51:600 14 11 2015 -0500 EST
13:43:28:540 28 11 2015 -0500 EST
09:24:12:723 14 12 2015 -0500 EST
13:28:12:346 28 12 2015 -0500 EST

How can I read this using Python/pandas? So far, what I have is this:

pd.to_datetime(pd.Series(df['senseStartTime']),format='%H:%M:%S:%f %d %m %Y %z %Z')

But this is not working, although I was previously able to use the same code for another format (with a different format specifier). Any suggestions?

Answer 1

The issue you're having is likely because versions of Python before 3.2 (I think?) had a lot of trouble with time zones, so your format string might be failing on the %z and %Z parts. For example, in Python 2.7:

In [187]: import datetime

In [188]: datetime.datetime.strptime('11:59:50:322 02 10 2015 -0400 EDT', '%H:%M:%S:%f %d %m %Y %z %Z')

ValueError: 'z' is a bad directive in format '%H:%M:%S:%f %d %m %Y %z %Z'

You're using pd.to_datetime instead of datetime.datetime.strptime, but the underlying issue is the same; you can refer to this thread for help. What I would suggest is, instead of using pd.to_datetime, to do something like:

In [191]: import dateutil.parser

In [192]: dateutil.parser.parse('11:59:50.322 02 10 2015 -0400')
Out[192]: datetime.datetime(2015, 2, 10, 11, 59, 50, 322000, tzinfo=tzoffset(None, -14400))

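It should be pretty simple to chop off the time zone name at the end (which is redundant since you have the offset), and change the ":" to "." between the seconds and the fractional seconds. Note that dateutil reads "02 10 2015" month-first by default, which is why the output above shows February 10; since your data is day-first, pass dayfirst=True to parse().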
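The clean-up described above could be sketched roughly as follows. This is only an illustration: `parse_stamp` is a hypothetical helper name, and it assumes every value has exactly the layout shown in the question (time, day, month, year, numeric offset, zone abbreviation).

```python
import re

from dateutil import parser


def parse_stamp(raw):
    """Parse one 'HH:MM:SS:mmm DD MM YYYY +hhmm ZZZ' string (hypothetical helper)."""
    # Drop the trailing zone abbreviation (e.g. "EDT"); the numeric offset is kept.
    without_name = raw.strip().rsplit(" ", 1)[0]
    # Replace the colon before the fractional seconds with a dot.
    cleaned = re.sub(r"^(\d{2}:\d{2}:\d{2}):", r"\1.", without_name)
    # dayfirst=True because the data is written day, month, year.
    return parser.parse(cleaned, dayfirst=True)


stamp = parse_stamp("11:59:50:322 02 10 2015 -0400 EDT")
print(stamp.isoformat())
```

Applied to a column, this would be something like `df['senseStartTime'].map(parse_stamp)`, which is simple but runs row by row.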

Answer 2

Since datetime.timezone became available in Python 3.2, you can use %z with .strptime() (see docs). Starting with:

from datetime import datetime
import pandas as pd
dateparse = lambda x: datetime.strptime(x, '%H:%M:%S:%f %d %m %Y %z %Z')
df = pd.read_csv(path, parse_dates=['time_col'], date_parser=dateparse)

to get:

                           time_col
0  2015-10-02 11:59:50.322000-04:00
1  2015-10-16 11:11:55.051000-04:00
2  2015-11-02 00:38:37.106000-05:00
3  2015-11-14 04:15:51.600000-05:00
4  2015-11-14 04:15:51.600000-05:00
5  2015-11-28 13:43:28.540000-05:00
6  2015-12-14 09:24:12.723000-05:00
7  2015-12-28 13:28:12.346000-05:00
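A vectorized variant is also possible. The sketch below assumes a reasonably recent pandas in which pd.to_datetime accepts %z in the format string. As far as I know, strptime's %Z only matches UTC, GMT and the local machine's zone names, so the abbreviation is stripped first and only the numeric offset is parsed. The variable names and the two sample rows (taken from the question) are purely illustrative.

```python
import pandas as pd

raw = pd.Series([
    "11:59:50:322 02 10 2015 -0400 EDT",
    "00:38:37:106 02 11 2015 -0500 EST",
])

# Remove the trailing zone abbreviation so only the numeric offset remains.
without_name = raw.str.replace(r"\s+[A-Za-z]+$", "", regex=True)

# utc=True converts the mixed -0400/-0500 offsets into a single UTC column.
parsed = pd.to_datetime(without_name, format="%H:%M:%S:%f %d %m %Y %z", utc=True)
print(parsed)
```

Converting to UTC sidesteps the mixed-offset problem; if local wall-clock times are needed afterwards, something like `parsed.dt.tz_convert("America/New_York")` should restore them, assuming that is the zone the data came from.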

Answer 1
1 2016-01-12 05:10:41

Answer 2
0 2016-01-12 04:57:57
