Python, parsing data 24 hours at a time out of 263 days

I have an Excel file (to be converted to CSV). The data has 8 columns. The first two are the day of the year and the time, respectively, while the two before the last are the minimum and maximum temperature. For each day I need to find the day's maximum and minimum, subtract them, and save the resulting value for that day.

I ran into two problems: how do I parse 24 lines at a time (there are no missing data lines!), and how do I find the maximum or minimum in each batch?

I have 6312 lines = 24 hr * 263 days

So, to iterate through the lines:

import numpy as np

input_temps='/L7_HW_SASP_w1112.csv'
up_air_min=np.genfromtxt(input_temps,skip_header=1, dtype=float, delimiter=',',usecols=(5))
up_air_max=np.genfromtxt(input_temps,skip_header=1, dtype=float, delimiter=',',usecols=(6))
day_year=np.genfromtxt(input_temps,skip_header=1, dtype=float, delimiter=',',usecols=(0))


dt_per_all_days=[]
for i in range(0, 6312, 1):

  # I get stuck here how to limit the iteration for 24 at a time.
  # if I can do that I think I can get the rest done.


  min_d=[]
  max_d=[]
  min_d.append( up_air_min[i])
  max_d.append( up_air_max[i])
  max_per_day=max(max_d)
  min_per_day=min(min_d)
  dt_d=max_per_day-min_per_day
  dt_per_all_days.append(dt_d)

  del(min_d)
  del(max_d)
  # move to the next batch of 24 lines....

Use the Numpy, Luke, avoid for-loops.

Once you have the up_air_min and up_air_max NumPy arrays, you can easily do what you want with NumPy's element-wise functions.

First, create a 2D array with 263 rows (one per day) and 24 columns, like this:

min_matrix = up_air_min.reshape((263, 24))
max_matrix = up_air_max.reshape((263, 24))

Then use the np.min and np.max functions along axis 1, which reduces each row (one day) to a single value:

min_temperature = np.min(min_matrix, axis=1)
max_temperature = np.max(max_matrix, axis=1)

And find the difference:

dt = max_temperature - min_temperature

dt is an array with the needed values, one per day. Let's save it to foo.csv:

# day_year has one entry per hour, so take every 24th value to get one per day.
np.savetxt('foo.csv', np.column_stack([day_year[::24], dt]), delimiter=',')

And the final code looks like this:

import numpy as np

# This part comes from your question.
input_temps='/L7_HW_SASP_w1112.csv'
up_air_min=np.genfromtxt(input_temps,skip_header=1, dtype=float, delimiter=',',usecols=(5))
up_air_max=np.genfromtxt(input_temps,skip_header=1, dtype=float, delimiter=',',usecols=(6))
day_year=np.genfromtxt(input_temps,skip_header=1, dtype=float, delimiter=',',usecols=(0))

# Reshape each array into a matrix with 263 rows (one per day) and 24 values per row.
min_matrix = up_air_min.reshape((263, 24))
max_matrix = up_air_max.reshape((263, 24))

# Find min temperature for every day. min_temperature is an array with 263 values.
min_temperature = np.min(min_matrix, axis=1)
# The same for max temperature.
max_temperature = np.max(max_matrix, axis=1)

# Subtract min temperature from max.
dt = max_temperature - min_temperature

# Save the result as CSV. day_year has one entry per hour, so keep every 24th value.
np.savetxt('foo.csv', np.column_stack([day_year[::24], dt]), delimiter=',')
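
To try the reshape approach without the original file, here is a minimal sketch on made-up data. The three-day arrays and their values are assumptions for illustration only; `reshape(-1, 24)` asks NumPy to infer the number of days instead of hard-coding 263, and it raises an error if the row count is not a multiple of 24.

```python
import numpy as np

HOURS_PER_DAY = 24
n_days = 3  # hypothetical; the file in the question has 263 days

# Synthetic hourly data: the day number repeated 24 times, plus min/max readings.
day_year = np.repeat(np.arange(1, n_days + 1), HOURS_PER_DAY).astype(float)
hours = np.tile(np.arange(HOURS_PER_DAY), n_days).astype(float)
up_air_min = hours - 5.0       # made-up hourly minimum temperatures
up_air_max = hours + day_year  # made-up hourly maximum temperatures

# One row per day; -1 lets NumPy work out the number of rows.
min_matrix = up_air_min.reshape(-1, HOURS_PER_DAY)
max_matrix = up_air_max.reshape(-1, HOURS_PER_DAY)

# Daily range: highest maximum minus lowest minimum within each row.
dt = max_matrix.max(axis=1) - min_matrix.min(axis=1)

# Every 24th entry of day_year gives one day number per row.
days = day_year[::HOURS_PER_DAY]

result = np.column_stack([days, dt])
print(result)
```

Each row of `result` pairs a day number with that day's temperature range, which is the same layout that gets written to foo.csv.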

A reasonably pythonic way to do this would be to have a function that loops over the rows, gathering them up and spitting out the gathered rows using yield when the day changes. This gives you a generator that generates 263 lists each holding 24 values, which is a bit easier to process.
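
A rough sketch of such a generator might look like the following. The function name `rows_by_day` and the `(day, min_temp, max_temp)` row layout are assumptions made for this example, not the layout of the original file.

```python
def rows_by_day(rows):
    """Yield (day, rows_for_that_day) each time the day value changes.

    `rows` is any iterable of (day, min_temp, max_temp) tuples. This layout
    is hypothetical and chosen only to keep the example short.
    """
    current_day = None
    batch = []
    for row in rows:
        day = row[0]
        if batch and day != current_day:
            # The day changed: hand over the rows gathered so far.
            yield current_day, batch
            batch = []
        current_day = day
        batch.append(row)
    if batch:
        # Emit the final day, which has no following row to trigger it.
        yield current_day, batch


# Made-up sample: two days with three readings each.
sample = [
    (1, 2.0, 5.0), (1, 1.0, 7.0), (1, 3.0, 6.0),
    (2, 0.0, 4.0), (2, -2.0, 9.0), (2, 1.0, 3.0),
]

daily_ranges = []
for day, batch in rows_by_day(sample):
    lowest = min(r[1] for r in batch)
    highest = max(r[2] for r in batch)
    daily_ranges.append((day, highest - lowest))

print(daily_ranges)
```

Because the batches are cut where the day value changes rather than every 24 rows, this version keeps working if a day happens to have fewer readings. The standard library's `itertools.groupby` can do the same grouping on consecutive equal keys.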

If you've definitely not got any missing values, you could use a trivial doubly-nested loop without batching up the elements first. That's a bit more fragile, but it sounds like you might not be planning to re-use the code anyway.
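
The doubly-nested loop could be sketched as below. The function name `daily_temperature_ranges` and the sample values are made up for illustration; the length check is an added safeguard, since this approach silently gives wrong answers if any hour is missing.

```python
HOURS_PER_DAY = 24


def daily_temperature_ranges(mins, maxs, hours_per_day=HOURS_PER_DAY):
    """Return max-minus-min for each consecutive block of `hours_per_day` rows."""
    if len(mins) != len(maxs) or len(mins) % hours_per_day != 0:
        raise ValueError("expected equal-length inputs with whole days only")
    ranges = []
    # Outer loop: one pass per day, stepping 24 rows at a time.
    for start in range(0, len(mins), hours_per_day):
        lowest = mins[start]
        highest = maxs[start]
        # Inner loop: the remaining hours of that day.
        for i in range(start + 1, start + hours_per_day):
            lowest = min(lowest, mins[i])
            highest = max(highest, maxs[i])
        ranges.append(highest - lowest)
    return ranges


# Two made-up days of hourly readings.
mins = [float(h % 24) for h in range(48)]
maxs = [m + 10.0 + (i // 24) for i, m in enumerate(mins)]

result = daily_temperature_ranges(mins, maxs)
print(result)
```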

Here's a somewhat contrived example of how you could chunk things by 24 lines at a time.

from io import StringIO
from random import random as r
import numpy as np

# Build a fake CSV in memory: 10000 rows of three random columns.
s = StringIO()
for x in range(0, 10000):
        s.write('%f,%f,%f\n' % (r(), r()*10, r()*100))
s.seek(0)

data = np.genfromtxt(s, dtype=None, names=['pitch','yaw','thrust'], delimiter=',')

# Step through the rows 24 at a time; each slice is one "day".
# 10000 is not a multiple of 24, so the last slice is shorter.
for x in range(0, len(data), 24):
        print('Acting on hours %d through %d' % (x, x+24))
        one_day = data[x:x+24]
        minimum_yaw = min(one_day['yaw'])
        max_yaw = max(one_day['yaw'])
        print('min', minimum_yaw, 'max', max_yaw, 'one_day', one_day['yaw'])


 