Reading a timestamp from a txt file into an array

I have a txt file with the following structure:

"YYYY/MM/DD HH:MM:SS.SSS val1 val2 val3 val4 val5'

The first line looks like:

"2015/02/18 01:05:46.004   13.737306807  100.526088432   -22.2937   2   5"

I am having trouble putting the timestamp into the array. The time values are used to compare data with the same timestamp from different files, to parse the data for a specific time interval, and for plotting purposes.

This is what I have right now ... everything except the time information:

import numpy as np
dt = np.dtype([('lat', float), ('lon', float), ('height', float), ('Q', int), ('ns', int)])
a = np.loadtxt('tmp.pos', dtype=dt, usecols=(2, 3, 4, 5, 6))  # skip the date/time columns for now

Any suggestions on how to extend dt to include the date and time columns? Or is there a better way than using loadtxt from numpy?

An example of the file can be found here: https://www.dropbox.com/s/j69l8oeqdm73q8y/tmp.pos
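
For reference, one possible way to extend dt (a sketch only, assuming tmp.pos contains nothing but data lines) is to read the date and time columns as strings with numpy.genfromtxt and build datetime64 values from them afterwards:

import numpy as np

# A sketch: read date and time as strings, everything else as numbers.
dt = np.dtype([('date', 'U10'), ('time', 'U12'),
               ('lat', float), ('lon', float), ('height', float),
               ('Q', int), ('ns', int)])
a = np.genfromtxt('tmp.pos', dtype=dt, encoding=None)  # encoding=None returns str fields

# Combine the two string columns into datetime64 values (millisecond precision).
stamps = np.array([np.datetime64(d.replace('/', '-') + 'T' + t)
                   for d, t in zip(a['date'], a['time'])])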

Edit 1

It turns out that numpy.loadtxt takes a parameter called converters that may do the job:

from matplotlib.dates import strpdate2num
a = np.loadtxt(fname='tmp.pos', converters={0: strpdate2num('%Y/%m/%d'), 1: strpdate2num('%H:%M:%S.%f')})

This means that the first two columns of a are 'date' and 'time' expressed as floats. To get back the time string, I can do something like this (though perhaps a bit clumsy):

In [441]: [datetime.strptime(num2date(a[i,0]).strftime('%Y-%m-%d')+num2date(a[i,1]).strftime('%H:%M:%S.%f'), '%Y-%m-%d%H:%M:%S.%f') for i in range(len(a[:,0]))]

which gives:

Out[441]: [datetime.datetime(2015, 2, 18, 1, 5, 46)]

However, the decimal part of the seconds is not preserved. What am I doing wrong?
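
One likely explanation, though this is an assumption about matplotlib's internals rather than something stated above: strpdate2num parses through time.strptime, which only keeps whole seconds, so the fraction is already lost at load time. A sketch of a small replacement converter based on datetime.strptime, which does honour %f:

from datetime import datetime
from matplotlib.dates import date2num
import numpy as np

def to_num(s, fmt):
    # Hypothetical helper (not part of matplotlib): parse with datetime.strptime
    # so the fractional seconds survive, then convert to matplotlib's float date.
    if isinstance(s, bytes):          # some numpy versions pass bytes to converters
        s = s.decode()
    return date2num(datetime.strptime(s, fmt))

a = np.loadtxt('tmp.pos',
               converters={0: lambda s: to_num(s, '%Y/%m/%d'),
                           1: lambda s: to_num(s, '%H:%M:%S.%f')})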

If this is coming from a text file, it may be simpler to parse this as text unless you want it all to end up in a numpy array. For example:

>>> my_line = "2015/02/18 01:05:46.004   13.737306807  100.526088432   -22.2937   2   5"
>>> datestamp, timestamp, val1, val2, val3, val4, val5 = [v.strip() for v in my_line.split()]
>>> datestamp
'2015/02/18'
>>> timestamp
'01:05:46.004'

So if you want to iterate over a file of these lines and obtain a native datetime object for each line:

from datetime import datetime
with open('path_to_file', 'r') as my_file:
    for line in my_file:
        d_stamp, t_stamp, val1, val2, val3, val4, val5 = [v.strip() for v in line.split()]
        dt_obj = datetime.strptime(' '.join([d_stamp, t_stamp]), '%Y/%m/%d %H:%M:%S.%f')
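
Since the question ultimately wants the data in arrays, here is a minimal sketch that extends this loop (the file name tmp.pos and the requirement that every line holds the same seven fields are assumptions, not taken from the answer):

import numpy as np
from datetime import datetime

timestamps, values = [], []
with open('tmp.pos', 'r') as my_file:
    for line in my_file:
        fields = line.split()
        # first two fields are the date and the time, the rest are numeric
        timestamps.append(datetime.strptime(' '.join(fields[:2]),
                                            '%Y/%m/%d %H:%M:%S.%f'))
        values.append([float(v) for v in fields[2:]])

timestamps = np.array(timestamps)   # native datetime objects
values = np.array(values)           # shape (n_rows, 5): lat, lon, height, Q, ns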

Better to convert the time string into a timestamp and keep the value as an integer. Integers will speed up your comparisons as well.

import time
dt, ts = "2015/02/18 01:05:46.004".split()
year, mon, day = [int(d) for d in dt.split('/')]
hrs, mins, secs = [int(float(d)) for d in ts.split(':')]  # note: int() drops the .004
# mktime wants a 9-tuple; weekday and yearday are ignored, the last field is the DST flag
timeStamp = time.mktime((year, mon, day, hrs, mins, secs, 0, 0, time.localtime()[8]))
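
Note that int(float(d)) throws away the .004. If the fractional seconds matter, one alternative is a sketch using Python 3's datetime.timestamp(), which, like time.mktime, treats a naive datetime as local time:

from datetime import datetime

# Parse the whole string at once, fractional seconds included.
dt_obj = datetime.strptime("2015/02/18 01:05:46.004", '%Y/%m/%d %H:%M:%S.%f')
timeStamp = dt_obj.timestamp()   # float POSIX timestamp, keeps the .004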

Pandas is supposed to be good at this sort of thing. I'm no expert and had some trouble with the parse_dates functionality of read_csv, but the following seems to work reasonably well and fast:

import pandas as pd

names = ('date', 'time', 'lat', 'lon', 'height', 'Q', 'ns')
format = '%Y/%m/%d%H:%M:%S.%f'
df = pd.read_csv('tmp.pos', delim_whitespace=True, names=names)
df['datetime'] = pd.to_datetime(df['date'] + df['time'], format=format)
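
The question also mentions comparing rows that share a timestamp across files; a sketch of how that could look, reusing names, format and df from above (the second file tmp2.pos and its identical layout are assumptions):

# Hypothetical second file with the same layout, aligned on the combined datetime.
df2 = pd.read_csv('tmp2.pos', delim_whitespace=True, names=names)
df2['datetime'] = pd.to_datetime(df2['date'] + df2['time'], format=format)

# An inner merge keeps only the timestamps present in both files.
merged = pd.merge(df, df2, on='datetime', suffixes=('_a', '_b'))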

If you want to select data based on timestamps, you can set the combined datetime as the index of the DataFrame:

df.index = pd.to_datetime(df['date'] + df['time'], format=format)
print(df['2015-02-18 2:30:00':'2015-02-18 2:30:10'])

You can also set the time column as the index, but it seems that slicing directly with only a time of day is not supported:

format = '%H:%M:%S.%f'
df.index = pd.to_datetime(df['time'], format=format)
print(df['2:30:00':'2:30:10'])  # prints an empty DataFrame

But you can use the following:

print(df.between_time('2:30:00', '2:30:10'))
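
For the plotting goal mentioned in the question, a short sketch (it assumes matplotlib is installed and that a datetime or time index has been set as above):

import matplotlib.pyplot as plt

# With a datetime-like index, a column plots directly against time.
df['height'].plot()
plt.ylabel('height')
plt.show()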
