
Timestamp from a txt file into an array

I have a txt file with the following structure:

"YYYY/MM/DD HH:MM:SS.SSS val1 val2 val3 val4 val5"

The first line looks like:

"2015/02/18 01:05:46.004   13.737306807  100.526088432   -22.2937   2   5"

I am having trouble putting the timestamp into the array. The time values are used to compare data with the same timestamp from different files, to select the data for a specific time interval, and for plotting.

This is what I have right now, without the time information:

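import numpy as np
dt = np.dtype([('lat', float), ('lon', float), ('height', float), ('Q', int), ('ns', int)])
# usecols skips the date and time columns, which this dtype does not describe yet
a = np.loadtxt('tmp.pos', dtype=dt, usecols=(2, 3, 4, 5, 6))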

Any suggestions on how to extend dt to include the date and time columns? Or is there a better way than using loadtxt from numpy?

An example of the file can be found here: https://www.dropbox.com/s/j69l8oeqdm73q8y/tmp.pos

Edit 1

It turns out that numpy.loadtxt takes a parameter called converters that may do the job:

a = np.loadtxt(fname='tmp.pos', converters={0: strpdate2num('%Y/%m/%d'), 1: strpdate2num('%H:%M:%S.%f')})

This means that the first two columns of a are 'date' and 'time' expressed as floats. To get back the time string, I can do something like this (though perhaps a bit clumsy):

In [441]: [datetime.strptime(num2date(a[i,0]).strftime('%Y-%m-%d')+num2date(a[i,1]).strftime('%H:%M:%S.%f'), '%Y-%m-%d%H:%M:%S.%f') for i in range(len(a[:,0]))]

which gives:

Out[441]: [datetime.datetime(2015, 2, 18, 1, 5, 46)]

However, the decimal part of the seconds is not preserved. What am I doing wrong?

If this is coming from a text file, it may be simpler to parse it as text, unless you want it all to end up in a numpy array. For example:

>>> my_line = "2015/02/18 01:05:46.004   13.737306807  100.526088432   -22.2937   2   5"
>>> datestamp, timestamp, val1, val2, val3, val4, val5 = [v.strip() for v in my_line.split()]
>>> datestamp
'2015/02/18'
>>> timestamp
'01:05:46.004'

So if you want to iterate over a file of these lines and obtain a native datetime object for each line:

from datetime import datetime
with open('path_to_file', 'r') as my_file:
    for line in my_file:
        d_stamp, t_stamp, val1, val2, val3, val4, val5 = [v.strip() for v in line.split()]
        dt_obj = datetime.strptime(' '.join([d_stamp, t_stamp]), '%Y/%m/%d %H:%M:%S.%f')
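If the data should end up in a numpy array after all, the same line-by-line parsing can feed a structured array with a datetime64 field. The following is only a sketch: parse_row is a hypothetical helper, the field name 'time' and the millisecond resolution are assumptions, and the second sample row is made up for illustration.

```python
import io
import numpy as np

# Two sample rows in the question's layout (the second row is invented).
# A real file object from open() can be iterated the same way.
sample = io.StringIO(
    "2015/02/18 01:05:46.004   13.737306807  100.526088432   -22.2937   2   5\n"
    "2015/02/18 01:05:47.004   13.737306810  100.526088440   -22.2940   2   5\n"
)

# The question's dtype, extended with an assumed millisecond timestamp field.
dt = np.dtype([('time', 'datetime64[ms]'), ('lat', float), ('lon', float),
               ('height', float), ('Q', int), ('ns', int)])

def parse_row(line):
    """Hypothetical helper: turn one text line into a tuple matching dt."""
    d_stamp, t_stamp, lat, lon, height, q, ns = line.split()
    # numpy parses ISO 8601 strings, so 2015/02/18 becomes 2015-02-18
    stamp = np.datetime64(d_stamp.replace('/', '-') + 'T' + t_stamp, 'ms')
    return (stamp, float(lat), float(lon), float(height), int(q), int(ns))

a = np.array([parse_row(line) for line in sample if line.strip()], dtype=dt)

# Select a time interval with an ordinary boolean mask.
start = np.datetime64('2015-02-18T01:05:46.500')
later = a[a['time'] >= start]
```

With the timestamp stored as datetime64, comparisons between files and interval selection become plain array operations, and the fractional seconds are kept.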

It is better to convert the time string to a Unix timestamp and pass the value as an integer. Integers will also speed up your comparisons.

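import time
dt, ts = "2015/02/18 01:05:46.004".split()
year, mon, day = [int(d) for d in dt.split('/')]
# int(float(d)) truncates the fractional seconds, so 46.004 becomes 46
hrs, mins, secs = [int(float(d)) for d in ts.split(':')]
# mktime interprets the tuple as local time; the last field is the DST flag
timeStamp = time.mktime((year, mon, day, hrs, mins, secs, 0, 0, time.localtime()[8]))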
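If the milliseconds matter, one possible variant keeps them by counting whole milliseconds since the Unix epoch. This is a sketch under an assumption the question does not confirm: that the timestamps are in UTC. The helper name to_epoch_ms is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_ms(stamp, fmt='%Y/%m/%d %H:%M:%S.%f'):
    """Hypothetical helper: integer milliseconds since the epoch.

    Assumes the timestamp string is expressed in UTC.
    """
    parsed = datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
    # Dividing two timedeltas with // yields an exact integer.
    return (parsed - EPOCH) // timedelta(milliseconds=1)

first = to_epoch_ms("2015/02/18 01:05:46.004")
second = to_epoch_ms("2015/02/18 01:05:46.504")
```

The results are plain Python integers, so timestamps from different files can be compared with == and <, and the two values above differ by 500.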

Pandas is supposed to be good at this sort of thing. I'm no expert and had some trouble with the parse_dates functionality of read_csv, but the following seems to work reasonably well and fast:

import pandas as pd

names = ('date', 'time', 'lat', 'lon', 'height', 'Q', 'ns')
format = '%Y/%m/%d%H:%M:%S.%f'
df = pd.read_csv('tmp.pos', sep=r'\s+', names=names)
df['datetime'] = pd.to_datetime(df['date'] + df['time'], format=format)
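A quick way to check that this approach keeps the fractional seconds, which was the problem in the question, is to run it on the sample line. The sketch below reads from an in-memory buffer instead of tmp.pos and joins date and time with a space; both are choices made for this illustration only.

```python
import io
import pandas as pd

# The first line of the file, as quoted in the question.
sample = io.StringIO(
    "2015/02/18 01:05:46.004   13.737306807  100.526088432   -22.2937   2   5\n"
)

names = ['date', 'time', 'lat', 'lon', 'height', 'Q', 'ns']
df = pd.read_csv(sample, sep=r'\s+', names=names)

# Join date and time with a space and parse both in one pass.
df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'],
                                format='%Y/%m/%d %H:%M:%S.%f')

first = df['datetime'].iloc[0]
print(first.microsecond)  # fractional part of the seconds, in microseconds
```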

If you want to select data based on timestamps, you can set them as the index of the dataframe:

df.index = pd.to_datetime(df['date'] + df['time'], format=format)
print(df.loc['2015-02-18 2:30:00':'2015-02-18 2:30:10'])
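With the timestamp as the index, rows from two files can also be matched on identical timestamps, which is one of the goals stated in the question. The following is a sketch only: load_pos is a hypothetical helper, the suffixes _a and _b are arbitrary, and all sample rows except the first are invented.

```python
import io
import pandas as pd

NAMES = ['date', 'time', 'lat', 'lon', 'height', 'Q', 'ns']

def load_pos(source):
    """Hypothetical helper: read one file and index it by timestamp."""
    df = pd.read_csv(source, sep=r'\s+', names=NAMES)
    df.index = pd.to_datetime(df['date'] + ' ' + df['time'],
                              format='%Y/%m/%d %H:%M:%S.%f')
    return df.drop(columns=['date', 'time'])

# In-memory stand-ins for two files that share one timestamp.
file_a = io.StringIO(
    "2015/02/18 01:05:46.004 13.737306807 100.526088432 -22.2937 2 5\n"
    "2015/02/18 01:05:47.004 13.737306810 100.526088440 -22.2940 2 5\n"
)
file_b = io.StringIO(
    "2015/02/18 01:05:47.004 13.737306900 100.526088500 -22.3000 1 6\n"
    "2015/02/18 01:05:48.004 13.737306910 100.526088510 -22.3010 1 6\n"
)

# An inner join keeps only the timestamps present in both files.
both = load_pos(file_a).join(load_pos(file_b), how='inner',
                             lsuffix='_a', rsuffix='_b')
height_diff = both['height_a'] - both['height_b']
```

Only exact matches survive an inner join, so this fits files sampled on the same clock; timestamps that differ by even a millisecond are treated as different rows.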

You can also set the time column as the index, but slicing directly with only a time does not seem to be supported:

format = '%H:%M:%S.%f'
df.index = pd.to_datetime(df['time'], format=format)
print(df['2:30:00':'2:30:10'])  # prints empty DataFrame

But you can use the following:

print(df.between_time('2:30:00', '2:30:10'))
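between_time selects by time of day and ignores the date part of the index, so it also works on a full date-and-time index. A small sketch with invented values, spanning two days:

```python
import pandas as pd

# Invented timestamps on two different days.
index = pd.to_datetime([
    '2015-02-18 02:29:59.004',
    '2015-02-18 02:30:05.004',
    '2015-02-19 02:30:07.004',
])
df = pd.DataFrame({'height': [-22.29, -22.30, -22.31]}, index=index)

# Rows whose time of day falls inside the window, on any date.
window = df.between_time('2:30:00', '2:30:10')
```

Here window holds the second and third rows, one from each day, which can be handy when the same time interval has to be pulled from several days of data.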

Note: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address.