Error reading date time from CSV using pandas
I am using pandas to read and process a CSV file. My CSV file has a date/time column that looks like:
11:59:50:322 02 10 2015 -0400 EDT
11:11:55:051 16 10 2015 -0400 EDT
00:38:37:106 02 11 2015 -0500 EST
04:15:51:600 14 11 2015 -0500 EST
04:15:51:600 14 11 2015 -0500 EST
13:43:28:540 28 11 2015 -0500 EST
09:24:12:723 14 12 2015 -0500 EST
13:28:12:346 28 12 2015 -0500 EST
How can I read this using Python/pandas? So far, what I have is this:
pd.to_datetime(pd.Series(df['senseStartTime']),format='%H:%M:%S:%f %d %m %Y %z %Z')
But this is not working, although I was previously able to use the same code for another format (with a different format specifier). Any suggestions?
The issue you're having is likely because versions of Python before 3.2 (I think?) had a lot of trouble with time zones, so your format string might be failing on the %z and %Z parts. For example, in Python 2.7:
In [187]: import datetime
In [188]: datetime.datetime.strptime('11:59:50:322 02 10 2015 -0400 EDT', '%H:%M:%S:%f %d %m %Y %z %Z')
ValueError: 'z' is a bad directive in format '%H:%M:%S:%f %d %m %Y %z %Z'
You're using pd.to_datetime instead of datetime.datetime.strptime, but the underlying issue is the same; you can refer to this thread for help. What I would suggest is, instead of using pd.to_datetime, do something like:
In [191]: import dateutil.parser
In [192]: dateutil.parser.parse('11:59:50.322 02 10 2015 -0400')
Out[192]: datetime.datetime(2015, 2, 10, 11, 59, 50, 322000, tzinfo=tzoffset(None, -14400))
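It should be pretty simple to chop off the time zone name at the end (which is redundant since you have the offset) and change the ":" to "." between the seconds and the fractional seconds. Note that dateutil reads an ambiguous date such as 02 10 2015 month-first by default, which is why the result above is February 10; since rows such as 16 10 2015 show the column is day-month-year, pass dayfirst=True to get October 2.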
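The cleanup described above could be sketched as follows. This is a minimal illustration rather than a definitive implementation: the helper name parse_sense_time is hypothetical, and it assumes every value has exactly the layout shown in the question (time, day, month, year, numeric offset, abbreviation, separated by single spaces).

```python
from dateutil import parser


def parse_sense_time(raw):
    """Parse strings like '11:59:50:322 02 10 2015 -0400 EDT'."""
    # Drop the trailing abbreviation (EDT/EST); the numeric offset already
    # carries the same information.
    stamp, _abbrev = raw.rsplit(' ', 1)
    # Split off the time-of-day field and turn its last ':' into '.',
    # giving '11:59:50.322'.
    clock, rest = stamp.split(' ', 1)
    head, millis = clock.rsplit(':', 1)
    cleaned = '{}.{} {}'.format(head, millis, rest)
    # dayfirst=True makes '02 10 2015' mean 2 October, not 10 February.
    return parser.parse(cleaned, dayfirst=True)


parsed = parse_sense_time('11:59:50:322 02 10 2015 -0400 EDT')
print(parsed.isoformat())
```

To apply it to a whole column, something like df['senseStartTime'].apply(parse_sense_time) should work, though it parses row by row and will be slow on large files.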
Since datetime.timezone has become available in Python 3.2, you can use %z with .strptime() (see docs). Starting with:
from datetime import datetime  # pd.datetime was removed in pandas 2.0
import pandas as pd
dateparse = lambda x: datetime.strptime(x, '%H:%M:%S:%f %d %m %Y %z %Z')
df = pd.read_csv(path, parse_dates=['time_col'], date_parser=dateparse)
to get:
time_col
0 2015-10-02 11:59:50.322000-04:00
1 2015-10-16 11:11:55.051000-04:00
2 2015-11-02 00:38:37.106000-05:00
3 2015-11-14 04:15:51.600000-05:00
4 2015-11-14 04:15:51.600000-05:00
5 2015-11-28 13:43:28.540000-05:00
6 2015-12-14 09:24:12.723000-05:00
7 2015-12-28 13:28:12.346000-05:00
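One caveat with the approach above: according to the Python documentation, strptime() only accepts UTC, GMT, and the local machine's own zone names for %Z, so 'EDT' or 'EST' may raise a ValueError on a machine set to a different time zone. A more portable variant, sketched here under the assumption that the abbreviation is always the last whitespace-separated token, strips it first and parses with %z alone. The two-row DataFrame is hypothetical sample data standing in for the result of pd.read_csv.

```python
import pandas as pd

# Hypothetical sample; in practice the column would come from pd.read_csv.
df = pd.DataFrame({'senseStartTime': [
    '11:59:50:322 02 10 2015 -0400 EDT',
    '00:38:37:106 02 11 2015 -0500 EST',
]})

# Remove the trailing abbreviation, keeping the numeric offset.
stripped = df['senseStartTime'].str.replace(r'\s+[A-Z]{2,5}$', '', regex=True)

# Parse with %z only; utc=True converts every row to UTC so the column has
# a single tz-aware dtype even though the offsets differ between rows.
df['senseStartTime'] = pd.to_datetime(
    stripped, format='%H:%M:%S:%f %d %m %Y %z', utc=True)

print(df['senseStartTime'])
```

The values come back in UTC; if local wall-clock times are wanted, df['senseStartTime'].dt.tz_convert('America/New_York') would convert them, assuming the data really is US Eastern time as the EDT/EST labels suggest.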