
Error reading date/time from CSV using pandas

I am using pandas to read and process a CSV file. My CSV file has a date/time column that looks like this:

11:59:50:322 02 10 2015 -0400 EDT
11:11:55:051 16 10 2015 -0400 EDT
00:38:37:106 02 11 2015 -0500 EST
04:15:51:600 14 11 2015 -0500 EST
04:15:51:600 14 11 2015 -0500 EST
13:43:28:540 28 11 2015 -0500 EST
09:24:12:723 14 12 2015 -0500 EST
13:28:12:346 28 12 2015 -0500 EST

How can I read this using Python/pandas? So far, what I have is this:

pd.to_datetime(pd.Series(df['senseStartTime']),format='%H:%M:%S:%f %d %m %Y %z %Z')

But this is not working, though I was previously able to use the same code for another format (with a different format string). Any suggestions?

The issue you're having is likely that Python versions before 3.2 don't support the %z directive in strptime(), so your format string fails on the %z and %Z parts. For example, in Python 2.7:

In [187]: import datetime

In [188]: datetime.datetime.strptime('11:59:50:322 02 10 2015 -0400 EDT', '%H:%M:%S:%f %d %m %Y %z %Z')

ValueError: 'z' is a bad directive in format '%H:%M:%S:%f %d %m %Y %z %Z'
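For contrast, on Python 3.2 and later `strptime()` accepts `%z` and returns a timezone-aware datetime. A minimal stdlib sketch, assuming the trailing `EDT`/`EST` has been dropped first (stdlib `%Z` only accepts `UTC`, `GMT` and the machine's local zone names, so `EDT` fails on most machines); `stamp` and `parsed` are illustrative names:

```python
from datetime import datetime

# Python 3.2+: %z parses a numeric UTC offset such as -0400, and strptime()
# then returns a timezone-aware datetime. The trailing "EDT" is left out.
stamp = '11:59:50:322 02 10 2015 -0400'
parsed = datetime.strptime(stamp, '%H:%M:%S:%f %d %m %Y %z')

print(parsed)              # 2015-10-02 11:59:50.322000-04:00
print(parsed.utcoffset())  # -1 day, 20:00:00
```

Note that `%f` right-pads `322` to `322000` microseconds.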

You're using pd.to_datetime instead of datetime.datetime.strptime, but the underlying issue is the same; you can refer to this thread for help. Instead of using pd.to_datetime, I would suggest something like:

In [191]: import dateutil.parser

In [192]: dateutil.parser.parse('11:59:50.322 02 10 2015 -0400', dayfirst=True)
Out[192]: datetime.datetime(2015, 10, 2, 11, 59, 50, 322000, tzinfo=tzoffset(None, -14400))

It should be pretty simple to chop off the time zone abbreviation at the end (which is redundant, since you have the offset) and change the ":" between the seconds and the fractional seconds to ".".
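A minimal sketch of that cleanup, assuming every value follows the sample format in the question. `clean_stamp` is a hypothetical helper name, and `dayfirst=True` is passed because the sample dates are day-first (`02 10 2015` is 2 October, matching the question's `%d %m`); without it dateutil reads them month-first:

```python
import re

from dateutil import parser


def clean_stamp(raw):
    """Hypothetical helper: turn '11:59:50:322 02 10 2015 -0400 EDT'
    into a string dateutil can parse."""
    # Chop off the trailing abbreviation (EDT/EST); the numeric offset
    # already carries the same information.
    without_abbrev = raw.rsplit(' ', 1)[0]
    # Replace the colon before the fractional seconds with a dot:
    # '11:59:50:322' -> '11:59:50.322'.
    return re.sub(r'^(\d{2}:\d{2}:\d{2}):(\d+)', r'\1.\2', without_abbrev)


raw = '11:59:50:322 02 10 2015 -0400 EDT'
cleaned = clean_stamp(raw)
print(cleaned)  # 11:59:50.322 02 10 2015 -0400

# dayfirst=True: '02 10 2015' means 2 October, not 10 February.
parsed = parser.parse(cleaned, dayfirst=True)
print(parsed)   # 2015-10-02 11:59:50.322000-04:00
```

With pandas, the same helper could be applied per value, e.g. `df['senseStartTime'].map(lambda s: parser.parse(clean_stamp(s), dayfirst=True))`.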

Since datetime.timezone became available in Python 3.2, you can use %z with .strptime() (see the docs). Starting with:

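from datetime import datetime
# Drop the redundant "EDT"/"EST" (stdlib %Z only matches UTC, GMT and local zone names); %z parses the offset
dateparse = lambda x: datetime.strptime(x.rsplit(' ', 1)[0], '%H:%M:%S:%f %d %m %Y %z')
df = pd.read_csv(path, converters={'time_col': dateparse})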

to get:

                           time_col
0  2015-10-02 11:59:50.322000-04:00
1  2015-10-16 11:11:55.051000-04:00
2  2015-11-02 00:38:37.106000-05:00
3  2015-11-14 04:15:51.600000-05:00
4  2015-11-14 04:15:51.600000-05:00
5  2015-11-28 13:43:28.540000-05:00
6  2015-12-14 09:24:12.723000-05:00
7  2015-12-28 13:28:12.346000-05:00
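Because the offsets differ (-04:00 before the DST change, -05:00 after), these values cannot share one tz-aware `datetime64` dtype, so the column holds individual `datetime` objects. If a single dtype is needed, one option (a sketch, not the answer's method) is to strip the abbreviation with the `.str` accessor and let `pd.to_datetime` convert everything to UTC. The inline CSV text is a hypothetical stand-in for the real file, and `time_col` is the placeholder column name used above:

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for the real CSV file.
csv_text = """time_col
11:59:50:322 02 10 2015 -0400 EDT
00:38:37:106 02 11 2015 -0500 EST
"""
df = pd.read_csv(io.StringIO(csv_text))

# Drop the trailing abbreviation, then parse the numeric offset with %z.
stamps = df['time_col'].str.rsplit(' ', n=1).str[0]
# utc=True converts the mixed -04:00/-05:00 offsets into one UTC column.
df['time_utc'] = pd.to_datetime(stamps, format='%H:%M:%S:%f %d %m %Y %z', utc=True)

print(df['time_utc'].dtype)  # a tz-aware datetime64 dtype in UTC
print(df[['time_utc']])
```

Converting to UTC keeps the instants exact; the original local offsets can be recovered later with `.dt.tz_convert(...)` if a named zone is known.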
