How to interpret values in a .txt data file as a time series

Question

I have a data file that contains values like this:

@ DD MM YYYY HH MN SS Hs Hrms Hmax Tz Ts Tc THmax EPS T02 Tp Hrms EPS

29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86
29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87

I use the following to read the data in:

import numpy as np

infile = open("testfile.txt", 'r')
data = np.genfromtxt(infile, skip_header=2)  # skip_header: newer NumPy versions removed the old skiprows name

which gives me a numpy.ndarray.

I want to interpret columns 0-5 (the first six) as a timestamp (DD:MM:YYYY:HH:MN:SS), but this is where I get stumped: there seem to be a million ways to do it and I don't know which is best.

I've been looking at dateutil and pandas. I know there is something blindingly obvious I should do, but I'm at a loss. Should I convert to CSV format first? Or somehow concatenate the values from each row (columns 0-5) using a for loop?

After this I'll plot values from other columns against the timestamps/deltas.

I'm totally new to Python, so any pointers are appreciated :)

Answer 1

Here's a pandas solution for you:

test.csv:

29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86
29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87

pandas provides a read_csv utility for reading CSV files; to parse your file, pass it the following parameters:

delimiter: the default is a comma, so you need to set it to a space
parse_dates: the date columns (order-sensitive); the nested list [[2,1,0,3,4,5]] combines them into one column in year, month, day, hour, minute, second order
date_parser: the default is dateutil.parser.parse, but it doesn't seem to work for your case, so you should supply your own parser
header: if your CSV doesn't have column names, set it to None

Finally, here is the sample code:

In [131]: import datetime as dt

In [132]: import pandas as pd

In [133]: pd.read_csv('test.csv', 
                       parse_dates=[[2,1,0,3,4,5]], 
                       date_parser=lambda *arr:dt.datetime(*[int(x) for x in arr]),
                       delimiter=' ', 
                       header=None)
Out[133]:
          2_1_0_3_4_5     6     7     8     9     10    11     12    13    14  \
0 2000-11-29 13:17:56  2.44  1.71  3.12  9.12  11.94  5.03  12.74  0.83  8.95
1 2000-11-29 13:31:16  2.43  1.74  4.16  9.17  11.30  4.96  11.70  0.84  8.84

      15   16    17
0  15.03  1.8  0.86
1  11.86  1.8  0.87
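Recent pandas releases deprecate `date_parser` (in favour of `date_format`) and the nested-list form of `parse_dates` used above. A minimal alternative sketch, assuming the same whitespace-separated layout with no header row, reads the file as plain numbers and lets `pd.to_datetime` assemble the timestamp from a DataFrame whose columns are named `year`, `month`, `day`, `hour`, `minute` and `second` (the `raw` string and `parts` name are illustrative only):

```python
import io

import pandas as pd

# The two sample rows from the question; pass a file path instead of io.StringIO
raw = (
    "29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86\n"
    "29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87\n"
)

# sep=r"\s+" splits on any run of whitespace; header=None: no column-name row
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

# pd.to_datetime assembles timestamps from columns named year, month, day, ...
parts = df[[0, 1, 2, 3, 4, 5]].rename(
    columns={0: "day", 1: "month", 2: "year", 3: "hour", 4: "minute", 5: "second"}
)
df.index = pd.to_datetime(parts)
df = df.drop(columns=[0, 1, 2, 3, 4, 5])  # the date parts now live in the index
print(df)
```

With the timestamps as the index, `df[6]` (for example) is already a time series ready to plot.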

Answer 2

This is how I would do it:

from datetime import datetime

# assuming you have a row of the data in a list like this
# (also works on a numpy ndarray row such as data[0] from genfromtxt,
#  so let's assume you've extracted a row like the one below...)
rowData = [29, 11, 2000, 13, 17, 56, 2.44, 1.71, 3.12, 9.12, 11.94, 5.03, 12.74, 0.83, 8.95, 15.03, 1.8, 0.86]

# unpack the first six values; int() is needed because genfromtxt returns
# floats and datetime() only accepts integers
day, month, year, hour, minute, sec = (int(v) for v in rowData[:6])
# create a datetime based on the unpacked values
theDate = datetime(year, month, day, hour, minute, sec)

There's no need to convert the data to a string and parse that. It might be good to check out the datetime documentation.
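To do this for every row rather than one, a minimal sketch, assuming `data` is the 2-D float array returned by `np.genfromtxt` in the question (built inline here from the two sample rows), is a list comprehension over the first six columns:

```python
from datetime import datetime

import numpy as np

# Stand-in for the array np.genfromtxt returns: every value is a float
data = np.array([
    [29, 11, 2000, 13, 17, 56, 2.44, 1.71, 3.12, 9.12, 11.94, 5.03, 12.74, 0.83, 8.95, 15.03, 1.80, 0.86],
    [29, 11, 2000, 13, 31, 16, 2.43, 1.74, 4.16, 9.17, 11.30, 4.96, 11.70, 0.84, 8.84, 11.86, 1.80, 0.87],
])

# Columns 0-5 are DD MM YYYY HH MN SS; cast to int before calling datetime()
timestamps = [
    datetime(int(y), int(mo), int(d), int(h), int(mi), int(s))
    for d, mo, y, h, mi, s in data[:, :6]
]
values = data[:, 6:]  # the measurement columns, one row per timestamp

print(timestamps[1] - timestamps[0])  # 0:13:20
```

`timestamps` and `values` line up row by row, so any column of `values` can be plotted against `timestamps`.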

Answer 3

I barely know anything about numpy, but you can use the datetime module to convert the dates into a datetime object:

import datetime
line = "29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86"
times = line.split()[:6]

From here you have two options:

print(':'.join(times))
# 29:11:2000:13:17:56

Or, as I said before, use the datetime module:

mydate = datetime.datetime.strptime(':'.join(times), '%d:%m:%Y:%H:%M:%S')
print(mydate.strftime('%d:%m:%Y:%H:%M:%S'))
# 29:11:2000:13:17:56

Of course, you're probably thinking that the second option is useless, but if you want more information from the dates (e.g. the year), it's probably better to convert them to a datetime object.
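For example, once the dates are `datetime` objects, their fields are plain attributes, and subtracting two of them gives a `timedelta`, which covers the deltas mentioned in the question. A short sketch using the two sample lines (the `lines` and `dates` names are illustrative):

```python
import datetime

lines = [
    "29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86",
    "29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87",
]

# Same parsing as above: join the first six fields and parse with strptime
dates = [
    datetime.datetime.strptime(':'.join(line.split()[:6]), '%d:%m:%Y:%H:%M:%S')
    for line in lines
]

print(dates[0].year, dates[0].month, dates[0].day)  # 2000 11 29
delta = dates[1] - dates[0]   # a datetime.timedelta
print(delta.total_seconds())  # 800.0
```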

Answer 4

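import datetime
import re

import numpy as np

def convert_to_datetime(x):
    # x is a str such as '29:11:2000:13:17:56'; encoding=None below makes
    # genfromtxt pass str (not bytes) to converters
    return datetime.datetime.strptime(x, '%d:%m:%Y:%H:%M:%S')

with open("testfile.txt", 'r') as infile:
    # join the first six fields with ':' so genfromtxt sees them as one column
    lines = (re.sub(r'^(\d+) (\d+) (\d+) (\d+) (\d+) (\d+)', r'\1:\2:\3:\4:\5:\6', line, count=1)
             for line in infile)
    data = np.genfromtxt(lines, skip_header=2, converters={0: convert_to_datetime},
                         encoding=None)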
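The question also mentions plotting against timestamps or deltas. A minimal sketch, assuming you already have a list of `datetime` objects (the hard-coded `timestamps` list below is a hypothetical stand-in): NumPy's `datetime64` type turns it into an array, and dividing the differences by `np.timedelta64(1, "s")` yields elapsed seconds as ordinary floats.

```python
from datetime import datetime

import numpy as np

# Hypothetical input: timestamps as produced by one of the approaches above
timestamps = [datetime(2000, 11, 29, 13, 17, 56), datetime(2000, 11, 29, 13, 31, 16)]

t = np.array(timestamps, dtype="datetime64[s]")  # second-resolution timestamps
elapsed = (t - t[0]) / np.timedelta64(1, "s")    # float seconds since the first sample
print(t)
print(elapsed)
```

Recent matplotlib versions accept either `t` or `elapsed` as the x values in `plot`; `elapsed` is the simpler choice if you only need deltas.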

4 answers

Answer 1: score 2, accepted, 2013-06-26 06:44:08
Answer 2: score 1, 2013-06-26 06:33:59
Answer 3: score 0, 2013-06-26 06:29:26
Answer 4: score 0, 2013-06-26 07:40:07
