How to interpret values in a .txt data file as a time series
I have a data file that has values in it like this:
@ DD MM YYYY HH MN SS Hs Hrms Hmax Tz Ts Tc THmax EPS T02 Tp Hrms EPS
29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86
29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87
I use the following to get the data in:
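import numpy as np

infile = open("testfile.txt", 'r')
# skip_header replaces the skiprows keyword, which newer NumPy releases no longer accept
data = np.genfromtxt(infile, skip_header=2)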
which gives me a numpy.ndarray.
I want to be able to interpret columns 0-5 as a timestamp (DD:MM:YYYY:HH:MN:SS), but this is where I get stumped: there seem to be a million ways to do it and I don't know which is best.
I've been looking at dateutil and pandas. I know there is something blindingly obvious I should do, but I am at a loss. Should I convert to a csv format first? Somehow concatenate the values from each row (cols 0-5) using a for loop?
After this I'll plot values from other columns against the timestamps/deltas.
I'm totally new to python, so any pointers appreciated :)
Here's a pandas solution for you:
test.csv:
29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86
29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87
pandas provides a read_csv utility for reading the csv. You should give it the following parameters to parse your file:
delimiter: the default is a comma, so set it to a space
parse_dates: the columns that make up the date, listed in the order the parser expects them
date_parser: the default is dateutil.parser.parse, but it seems that doesn't work for your case, so you should implement your own parser
header: None, since the file has no row of column names
Finally, here is the sample code:
In [131]: import datetime as dt
In [132]: import pandas as pd
In [133]: pd.read_csv('test.csv',
                      parse_dates=[[2,1,0,3,4,5]],
                      date_parser=lambda *arr: dt.datetime(*[int(x) for x in arr]),
                      delimiter=' ',
                      header=None)
Out[133]:
          2_1_0_3_4_5     6     7     8     9     10    11     12    13    14  \
0 2000-11-29 13:17:56  2.44  1.71  3.12  9.12  11.94  5.03  12.74  0.83  8.95
1 2000-11-29 13:31:16  2.43  1.74  4.16  9.17  11.30  4.96  11.70  0.84  8.84

      15   16    17
0  15.03  1.8  0.86
1  11.86  1.8  0.87
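As far as I know, newer pandas releases deprecate the date_parser argument and the nested-list form of parse_dates, so the call above may warn or fail there. A minimal sketch of one alternative is to read the date parts as ordinary columns and assemble them with pd.to_datetime. The names day, month, year, hour, minute, second are required by pd.to_datetime, while the v6..v17 labels and the in-memory sample are placeholders I made up for illustration.

```python
import io

import pandas as pd

# Two sample rows copied from test.csv above.
sample = (
    "29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86\n"
    "29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87\n"
)

# pd.to_datetime looks the date parts up by these exact column names.
time_cols = ["day", "month", "year", "hour", "minute", "second"]
# Placeholder labels for the twelve measurement columns (an assumption).
value_cols = [f"v{i}" for i in range(6, 18)]

df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None,
                 names=time_cols + value_cols)

# Assemble one timestamp per row from the six component columns.
timestamps = pd.to_datetime(df[time_cols])

# Use the timestamps as the index and drop the now-redundant columns.
df = df.drop(columns=time_cols)
df.index = pd.DatetimeIndex(timestamps, name="timestamp")

print(df.index[0])
print(df["v6"].tolist())
```

With a DatetimeIndex in place, plotting a measurement against time is a matter of calling df["v6"].plot(), assuming matplotlib is installed.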
This is how I would do it:
from datetime import datetime
# assuming you have a row of the data in a list like this
# (also works on ndarrays in numpy, but you need to keep track of the row,
# so let's assume you've extracted a row like the one below...)
rowData = [29, 11, 2000, 13, 17, 56, 2.44, 1.71, 3.12, 9.12, 11.94, 5.03, 12.74, 0.83, 8.95, 15.03, 1.8, 0.86]
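# unpack the first six values, casting to int because datetime() rejects
# the floats you get from a numpy row
day, month, year, hour, minute, second = (int(v) for v in rowData[:6])
# create a datetime based on the unpacked values
theDate = datetime(year, month, day, hour, minute, second)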
No need to convert the data to a string and parse that. Might be good to check out the datetime documentation.
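To extend the single-row idea to a whole file, here is a rough sketch that uses only the standard library. The parse_rows helper and the inline sample text are hypothetical stand-ins for reading testfile.txt, and I am assuming that header lines start with "@" as in the question.

```python
from datetime import datetime

# Sample text in the question's layout: one header line, then data rows.
sample = """\
@ DD MM YYYY HH MN SS Hs Hrms Hmax Tz Ts Tc THmax EPS T02 Tp Hrms EPS
29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86
29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87
"""


def parse_rows(text):
    """Return (timestamps, values) for every data row in text."""
    timestamps, values = [], []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and header lines, which start with "@".
        if not fields or fields[0] == "@":
            continue
        day, month, year, hour, minute, second = (int(f) for f in fields[:6])
        timestamps.append(datetime(year, month, day, hour, minute, second))
        values.append([float(f) for f in fields[6:]])
    return timestamps, values


timestamps, values = parse_rows(sample)

# Seconds elapsed since the first record, usable as a plot x-axis.
elapsed = [(t - timestamps[0]).total_seconds() for t in timestamps]

print(timestamps)
print(elapsed)
```

Both timestamps and elapsed can be passed directly as x values to a plotting call, with a column taken from values as y.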
I barely know anything about numpy, but you can use the datetime module to convert the dates into a date object:
import datetime
line = "29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86"
times = line.split()[:6]
Now from here you have two options:
print(':'.join(times))
# 29:11:2000:13:17:56
Or, as I said before, use the datetime module:
mydate = datetime.datetime.strptime(':'.join(times), '%d:%m:%Y:%H:%M:%S')
print(mydate.strftime('%d:%m:%Y:%H:%M:%S'))
# 29:11:2000:13:17:56
Of course, you're probably thinking that the second option is useless, but if you want more information from the dates (e.g. the year), then it's probably better to convert them to a datetime object.
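As a small illustration of what the datetime object gives you over the joined string, the sketch below parses the two timestamps from the question's sample rows and reads a few attributes. The variable names are my own.

```python
from datetime import datetime

fmt = "%d:%m:%Y:%H:%M:%S"
first = datetime.strptime("29:11:2000:13:17:56", fmt)
later = datetime.strptime("29:11:2000:13:31:16", fmt)

# Individual fields are available as attributes.
print(first.year, first.month, first.day)

# weekday() counts from Monday = 0.
print(first.weekday())

# Subtracting two datetimes yields a timedelta.
gap = later - first
print(gap.total_seconds())
```

None of this is possible with the plain string without parsing it again.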
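import datetime
import re
import numpy as np

def convert_to_datetime(x):
    # depending on the NumPy version and encoding setting, the converter may receive bytes
    if isinstance(x, bytes):
        x = x.decode()
    return datetime.datetime.strptime(x, '%d:%m:%Y:%H:%M:%S')

with open("testfile.txt", 'r') as infile:
    # join the six date fields of each line with ':' so that they form a single column
    lines = (re.sub(r'^(\d+) (\d+) (\d+) (\d+) (\d+) (\d+)', r'\1:\2:\3:\4:\5:\6', line, 1) for line in infile)
    # skip_header replaces the skiprows keyword, which newer NumPy releases no longer accept
    data = np.genfromtxt(lines, skip_header=2, converters={0: convert_to_datetime})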
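If the regex step feels heavy, another option is to let genfromtxt load everything as floats and build the timestamps afterwards. This is only a sketch: row_to_datetime is a hypothetical helper, and the in-memory sample with a single header line stands in for the real file.

```python
import io
from datetime import datetime

import numpy as np

# In-memory stand-in for the data file: one header line, two data rows.
sample = io.StringIO(
    "@ DD MM YYYY HH MN SS Hs Hrms Hmax Tz Ts Tc THmax EPS T02 Tp Hrms EPS\n"
    "29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86\n"
    "29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87\n"
)

# Every field is numeric, so the default float dtype is enough.
data = np.genfromtxt(sample, skip_header=1)


def row_to_datetime(row):
    """Build a datetime from the first six (float) entries of a row."""
    day, month, year, hour, minute, second = (int(v) for v in row[:6])
    return datetime(year, month, day, hour, minute, second)


# datetime64 array for the time axis, plain float array for the measurements.
timestamps = np.array([row_to_datetime(r) for r in data], dtype="datetime64[s]")
values = data[:, 6:]

print(timestamps)
print(values.shape)
```

This keeps the measurements in an ordinary 2-D float array, which is usually easier to slice than the structured array produced when a converter returns objects.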