How to interpret values in a .txt data file as a time series

I have a data file that has values in it like this:

@ DD MM YYYY HH MN SS Hs Hrms Hma x Tz Ts Tc THmax EP S T0 2 Tp Hrms EPS

29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86
29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87

I use the following to get the data in:

import numpy as np

infile = open("testfile.txt", 'r')
# skip_header is the current name of genfromtxt's old skiprows argument
data = np.genfromtxt(infile, skip_header=2)

which gives me a numpy.ndarray.

I want to be able to interpret columns 0-5 as a timestamp (DD:MM:YYYY:HH:MN:SS), but this is where I get stumped: there seem to be a million ways to do it and I don't know which is best.

I've been looking at dateutil and pandas. I know there is something blindingly obvious I should do, but I'm at a loss. Should I convert to a csv format first? Somehow concatenate the values from each row (cols 0-5) using a for loop?

After this I'll plot values from other columns against the timestamps/deltas.

I'm totally new to Python, so any pointers are appreciated :)

Here's a pandas solution for you:

test.csv:

29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86
29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87

pandas provides a read_csv utility for reading CSV files. To parse your file, pass the following parameters:

  1. delimiter: the default is a comma, so you need to set it to a space
  2. parse_dates: the date columns (order-sensitive)
  3. date_parser: the default is dateutil.parser.parse, but it doesn't seem to work for your case, so you should implement your own parser
  4. header: if your CSV doesn't have column names, set it to None

Finally, here is the sample code:

In [131]: import datetime as dt

In [132]: import pandas as pd

In [133]: pd.read_csv('test.csv', 
                       parse_dates=[[2,1,0,3,4,5]], 
                       date_parser=lambda *arr:dt.datetime(*[int(x) for x in arr]),
                       delimiter=' ', 
                       header=None)
Out[133]:
          2_1_0_3_4_5     6     7     8     9     10    11     12    13    14  \
0 2000-11-29 13:17:56  2.44  1.71  3.12  9.12  11.94  5.03  12.74  0.83  8.95
1 2000-11-29 13:31:16  2.43  1.74  4.16  9.17  11.30  4.96  11.70  0.84  8.84

      15   16    17
0  15.03  1.8  0.86
1  11.86  1.8  0.87
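As far as I know, newer pandas releases deprecate both date_parser and the nested-list form of parse_dates, so the call above may warn or fail there. Here is a minimal sketch of an alternative that reads the columns as plain numbers and assembles the timestamps afterwards with pd.to_datetime. The sample text is inlined through io.StringIO only so the snippet is self-contained, and the names parts, timestamps and ts_df are my own, not from the question:

```python
import io

import pandas as pd

# Two sample rows from the question, inlined so the sketch runs on its own.
sample = io.StringIO(
    "29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86\n"
    "29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87\n"
)

# Whitespace-separated values, no header row.
df = pd.read_csv(sample, sep=r"\s+", header=None)

# The first six columns are DD MM YYYY HH MN SS; give them the names
# that pd.to_datetime expects when it assembles dates from a DataFrame.
parts = df.iloc[:, :6].copy()
parts.columns = ["day", "month", "year", "hour", "minute", "second"]
timestamps = pd.to_datetime(parts)

# Keep the measurement columns and index them by timestamp.
ts_df = df.iloc[:, 6:].copy()
ts_df.index = pd.DatetimeIndex(timestamps, name="timestamp")

print(ts_df.head())
```

With a real file you would pass the file name instead of the StringIO object, and add skiprows for any header lines.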

This is how I would do it:

from datetime import datetime

# assuming you have a row of the data in a list like this
# (also works on rows of a numpy ndarray, but those hold floats,
#  so the first six values are cast to int below)
rowData = [29, 11, 2000, 13, 17, 56, 2.44, 1.71, 3.12, 9.12, 11.94, 5.03, 12.74, 0.83, 8.95, 15.03, 1.8, 0.86]

# unpack the first six values; datetime() only accepts integers
day, month, year, hour, minute, sec = (int(v) for v in rowData[:6])
# create a datetime based on the unpacked values
theDate = datetime(year, month, day, hour, minute, sec)

No need to convert the data to a string and parse that. Might be good to check out the datetime documentation.
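To apply the same idea to every row of the array that np.genfromtxt returns, something like the following sketch should work. The data is inlined through io.StringIO so the snippet runs on its own, and I am assuming column 6 (Hs) is the one you want to plot; the names timestamps and hs are just for illustration:

```python
import io
from datetime import datetime

import numpy as np

# Two sample rows from the question, without the header lines.
sample = io.StringIO(
    "29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86\n"
    "29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87\n"
)
data = np.genfromtxt(sample)  # 2-D float array, one row per record

# Columns are DD MM YYYY HH MN SS, but datetime() wants
# year, month, day, hour, minute, second as integers.
timestamps = [
    datetime(int(row[2]), int(row[1]), int(row[0]),
             int(row[3]), int(row[4]), int(row[5]))
    for row in data
]

hs = data[:, 6]  # assumed column of interest (Hs)

for stamp, value in zip(timestamps, hs):
    print(stamp, value)
```

The timestamps list can then be passed straight to matplotlib as the x values, since it accepts datetime objects.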

I barely know anything about numpy, but you can use the datetime module to convert the dates into a datetime object:

import datetime
line = "29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86"
times = line.split()[:6]

Now from here you have two options:

print(':'.join(times))
# 29:11:2000:13:17:56

Or, as I said before, use the datetime module:

mydate = datetime.datetime.strptime(':'.join(times), '%d:%m:%Y:%H:%M:%S')
print(mydate.strftime('%d:%m:%Y:%H:%M:%S'))
# 29:11:2000:13:17:56

Of course, you're probably thinking that the second option is useless, but if you want more information from the dates (e.g. the year), then it's probably better to convert them to datetime objects.
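For instance, once both sample timestamps are datetime objects you can read off individual fields and subtract one from the other to get the deltas mentioned in the question. A small sketch (the variable names are only illustrative):

```python
from datetime import datetime

fmt = '%d:%m:%Y:%H:%M:%S'
first = datetime.strptime('29:11:2000:13:17:56', fmt)
second = datetime.strptime('29:11:2000:13:31:16', fmt)

# Individual fields are available as attributes.
print(first.year)
# 2000

# Subtracting two datetimes gives a timedelta.
delta = second - first
print(delta.total_seconds())
# 800.0
```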

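import datetime
import re

import numpy as np

def convert_to_datetime(x):
    # some NumPy versions hand converters bytes rather than str
    if isinstance(x, bytes):
        x = x.decode()
    return datetime.datetime.strptime(x, '%d:%m:%Y:%H:%M:%S')

with open("testfile.txt", 'r') as infile:
    # join the first six fields of each data line with ':' so they form one column
    lines = (re.sub(r'^(\d+) (\d+) (\d+) (\d+) (\d+) (\d+)', r'\1:\2:\3:\4:\5:\6', line, 1) for line in infile)
    # skip the two header lines; column 0 is converted to a datetime
    data = np.genfromtxt(lines, skip_header=2, converters={0: convert_to_datetime})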
