简体   繁体   中英

Importing csv into Numpy datetime64

I am trying out the latest version of numpy 2.0 dev:

np.__version__
Out[44]: '2.0.0.dev-aded70c'

I am trying to import CSV data that looks like this:

date,system,pumping,rgt,agt,sps,eskom_import,temperature,wind,pressure,weather
2007-01-01 00:30,481.9,,,,,481.9,15,SW,1040,Fine
2007-01-01 01:00,471.9,,,,,471.9,15,SW,1040,Fine
2007-01-01 01:30,455.9,,,,,455.9,,,,

etc.

by using the following code:

convertdict = {0: lambda s: np.datetime64(s, 'm'), 1: lambda s: float(s or 0), 2: lambda s: float(s or 0), 3: lambda s: float(s or 0), 4: lambda s: float(s or 0), 5: lambda s: float(s or 0), 6: lambda s: float(s or 0), 7: lambda s: float(s or 0), 8: str, 9: str, 10: str}

dt = [('date', np.datetime64),('system', float), ('pumping', float),('rgt', 
float), ('agt', float), ('sps', float) ,('eskom_import', float),('temperature', float), ('wind', str), ('pressure', float), ('weather', str)]

a = np.recfromcsv(fp, dtype=dt, converters=convertdict, usecols=range(0-11), 
names=True)         

The dtype it generates for a.date is 'object':

array([2007-01-01T00:30+0200, 2007-01-01T01:00+0200, 2007-01-01T01:30+0200,
       ..., 2007-12-31T23:00+0200, 2007-12-31T23:30+0200,
       2008-01-01T00:00+0200], dtype=object)

But I need it to be datetime64, like in this example (but including hrs and minutes):

array(['2011-07-11', '2011-07-12', '2011-07-13', '2011-07-14',
       '2011-07-15', '2011-07-16', '2011-07-17'], dtype='datetime64[D]')

It seems that the CSV import creates an embedded object datetype for 'date' rather than a datetime64 data type. Any ideas on how to fix this?

Grové

I think the trick to avoid the generic 'object' type is to avoid using the recfromcsv function. Manually reading in your data file and parsing the information yields the requested dtype='datetime64[m]'

import numpy as np
dt = np.dtype([ ('date',        '<M8[m]'), 
                ('system',      '<f8'), 
                ('pumping',     '<f8'), 
                ('rgt',         '<f8'), 
                ('agt',         '<f8'), 
                ('sps',         '<f8'), 
                ('eskom_import','<f8'), 
                ('temperature', '<f8'), 
                ('wind',        np.str), 
                ('pressure',    '<f8'), 
                ('weather',     np.str) ])
numfields = len(dt.fields.keys())
data = np.zeros(numlines, dtype=dt)         
fid = open('data.csv', 'rb')
count = 0
try:
    fieldnames = fid.readline().strip().split(',') #Header
    for line in fid:
        parsedline = line.strip().split(',')
        data['date'][count]         = np.datetime64(parsedline[0], 'm')
        data['system'][count]       = np.double(parsedline[1])
        data['pumping'][count]      = np.double(parsedline[2])
        data['rgt'][count]          = np.double(parsedline[3])
        data['agt'][count]          = np.double(parsedline[4])
        data['sps'][count]          = np.double(parsedline[5])
        data['eskom_import'][count] = np.double(parsedline[6])
        data['temperature'][count]  = np.double(parsedline[7])
        data['wind'][count]         = np.str(parsedline[8])
        data['pressure'][count]     = np.double(parsedline[9])
        data['weather'][count]      = np.str(parsedline[10])
        count += 1
 finally:
     fid.close()

>>> data['date']
array(['2007-01-01T00:30-0500', '2007-01-01T01:00-0500',
       '2007-01-01T00:30-0500', '2007-01-01T01:00-0500',
       '2007-01-01T00:30-0500', '2007-01-01T01:00-0500',
       '2007-01-01T00:30-0500', '2007-01-01T01:00-0500'], dtype='datetime64[m]')

You could definitely improve upon this code by making use of your "convertdict" and iterating over the parsedline but the idea is the same.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM