I have a txt file with the following structure:
"YYYY/MM/DD HH:MM:SS.SSS val1 val2 val3 val4 val5'
The first line looks like:
"2015/02/18 01:05:46.004 13.737306807 100.526088432 -22.2937 2 5"
I am having trouble putting the timestamp into the array. The time values are used to compare data with the same timestamp from different files, to extract the data for a specific time interval, and for plotting.
This is what I have right now ... except the time information:
dt = np.dtype([('lat', float), ('lon', float), ('height', float), ('Q', int), ('ns', int)])
a = np.loadtxt('tmp.pos', dt)
Any suggestion on how to extend dt to include the date and time columns? Or is there a better way than using loadtxt from numpy?
An example of the file can be found here: https://www.dropbox.com/s/j69l8oeqdm73q8y/tmp.pos
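One way to extend the dtype is to parse each line yourself and store the combined stamp in a datetime64 field; a minimal sketch, where the parse_line helper and the inline sample line are illustrative, not part of the original code:

```python
import numpy as np
from datetime import datetime

sample = "2015/02/18 01:05:46.004 13.737306807 100.526088432 -22.2937 2 5"

def parse_line(line):
    # Join the first two columns into one datetime, keep the rest numeric.
    parts = line.split()
    stamp = datetime.strptime(parts[0] + ' ' + parts[1], '%Y/%m/%d %H:%M:%S.%f')
    return (stamp, float(parts[2]), float(parts[3]), float(parts[4]),
            int(parts[5]), int(parts[6]))

dt = np.dtype([('t', 'datetime64[ms]'), ('lat', float), ('lon', float),
               ('height', float), ('Q', int), ('ns', int)])
a = np.array([parse_line(sample)], dtype=dt)
```

The 't' field then compares and sorts like any other column, e.g. a[a['t'] >= np.datetime64('2015-02-18T01:00')].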
Edit 1
It turns out that numpy.loadtxt takes a parameter called converters that may do the job (strpdate2num comes from matplotlib.dates):
a = np.loadtxt(fname='tmp.pos', converters={0: strpdate2num('%Y/%m/%d'), 1: strpdate2num('%H:%M:%S.%f')})
This means that the first two columns of a are the date and the time expressed as floats. To get the time string back, I can do something like this (though it is perhaps a bit clumsy):
In [441]: [datetime.strptime(num2date(a[i,0]).strftime('%Y-%m-%d')+num2date(a[i,1]).strftime('%H:%M:%S.%f'), '%Y-%m-%d%H:%M:%S.%f') for i in range(len(a[:,0]))]
which gives:
Out[441]: [datetime.datetime(2015, 2, 18, 1, 5, 46)]
However, the decimal part of the seconds is not preserved. What am I doing wrong?
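One way to narrow down where the milliseconds disappear is to parse the combined string directly with datetime.strptime, which does keep them; if the value below comes out intact, the loss happens in the num2date/strftime round trip rather than in the file:

```python
from datetime import datetime

stamp = datetime.strptime('2015/02/18 01:05:46.004', '%Y/%m/%d %H:%M:%S.%f')
# %f captures the fractional seconds as microseconds (.004 s -> 4000 us)
```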
If this is coming from a text file, it may be simpler to parse this as text unless you want it all to end up in a numpy array. For example:
>>> my_line = "2015/02/18 01:05:46.004 13.737306807 100.526088432 -22.2937 2 5"
>>> datestamp, timestamp, val1, val2, val3, val4, val5 = [v.strip() for v in my_line.split()]
>>> datestamp
'2015/02/18'
>>> timestamp
'01:05:46.004'
So if you want to iterate over a file of these lines and obtain a native datetime object for each line:
from datetime import datetime

with open('path_to_file', 'r') as my_file:
    for line in my_file:
        d_stamp, t_stamp, val1, val2, val3, val4, val5 = [v.strip() for v in line.split()]
        dt_obj = datetime.strptime(' '.join([d_stamp, t_stamp]), '%Y/%m/%d %H:%M:%S.%f')
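Once each line yields a native datetime, selecting a specific time interval (one of the stated goals) is a plain comparison. A minimal sketch, with one in-memory line standing in for the file:

```python
from datetime import datetime

lines = ["2015/02/18 01:05:46.004 13.737306807 100.526088432 -22.2937 2 5"]
rows = []
for line in lines:
    d_stamp, t_stamp = line.split()[:2]
    dt_obj = datetime.strptime(d_stamp + ' ' + t_stamp, '%Y/%m/%d %H:%M:%S.%f')
    rows.append((dt_obj, line))

# Keep only rows whose stamp falls in [start, end).
start, end = datetime(2015, 2, 18, 1, 0), datetime(2015, 2, 18, 2, 0)
selected = [r for r in rows if start <= r[0] < end]
```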
Better to convert the time string into a timestamp and work with it as a number; a numeric timestamp will speed up your comparisons as well.
import time
dt, ts = "2015/02/18 01:05:46.004".split()
year,mon,day = [int(d) for d in dt.split('/')]
hrs,mins,secs = [int(float(d)) for d in ts.split(':')]
timeStamp = time.mktime((year,mon,day,hrs,mins,secs,0,0,time.localtime()[8]))
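Note that int(float(d)) on the seconds field drops the .004; time.mktime only accepts whole seconds, but the fraction can be kept and added back afterwards. A sketch:

```python
import time

dt, ts = "2015/02/18 01:05:46.004".split()
year, mon, day = (int(d) for d in dt.split('/'))
hrs, mins, secs = ts.split(':')
whole = int(float(secs))
frac = float(secs) - whole           # the fractional seconds mktime cannot take
timeStamp = time.mktime((year, mon, day, int(hrs), int(mins), whole,
                         0, 0, -1)) + frac
```

Passing -1 as the last tuple element lets mktime decide whether DST applies, instead of reading time.localtime()[8].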
Pandas is supposed to be good at this sort of thing. I'm no expert and had some trouble with the parse_dates functionality of read_csv, but the following seems to work reasonably well and fast:
import pandas as pd
names = ('date', 'time', 'lat', 'lon', 'height', 'Q', 'ns')
format = '%Y/%m/%d%H:%M:%S.%f'
df = pd.read_csv('tmp.pos', delim_whitespace=True, names=names)
df['datetime'] = pd.to_datetime(df['date'] + df['time'], format=format)
If you want to select data based on timestamps, you can set it as the index of the DataFrame:
df.index = pd.to_datetime(df['date'] + df['time'], format=format)
print(df['2015-02-18 2:30:00':'2015-02-18 2:30:10'])
You can also set the time column alone as the index, but directly slicing with only a time of day is not supported:
format = '%H:%M:%S.%f'
df.index = pd.to_datetime(df['time'], format=format)
print(df['2:30:00':'2:30:10'])  # prints an empty DataFrame
But you can use the following:
print(df.between_time('2:30:00', '2:30:10'))
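Since the original goal includes comparing rows that share a timestamp across files, the datetime index also makes that alignment a simple join; a sketch with two toy frames standing in for two parsed .pos files (df1, df2, and the values are illustrative):

```python
import pandas as pd

idx = pd.to_datetime(['2015-02-18 01:05:46.004', '2015-02-18 01:05:47.004'])
df1 = pd.DataFrame({'height': [-22.2937, -22.2940]}, index=idx)
df2 = pd.DataFrame({'height': [-22.2935, -22.2950]}, index=idx)

# Align rows with identical timestamps, then compare column-wise.
both = df1.join(df2, lsuffix='_a', rsuffix='_b')
diff = both['height_a'] - both['height_b']
```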