Creating multiple arrays within a for loop (Python)

I'm currently having an issue with Numpy arrays. If this question has already been asked elsewhere, I apologize, but I feel that I have looked everywhere.

My initial issue was that I was attempting to create an array and fill it with multiple sets of station data of different sizes. Since I cannot fill the same array with data sets that vary in size, I decided I need to create a new array for each station data set by defining the array inside the for loop I'm using to iterate through each station data set. The problem with this is that, while looping through, each data set will overwrite the previous data set, returning only the final instance of the for loop.

Then I tried using + and then join to build a new name for each array, but it turns out that is illegal when defining arrays. Below is the version of the program where each data array overwrites the previous one. Note that not all the code is included and that this is part of a function definition.

for k in range(len(stat_id)):

    ## NOTE - more code precedes this final portion of the for loop, but was
    ## not included as it is unrelated to the issue at hand.

    # Bring all the data into one big array.
    metar_dat = np.zeros((len(stat_id),len(temp),7), dtype='object')
    for i in range(len(temp)):
        metar_dat[k,i] = np.dstack((stat_id[k], yr[i], month[i], day[i], time[i], temp[i], dwp[i]))
    #print np.shape(metar_dat[k])
    #print metar_dat[k]

#print np.shape(metar_dat) # Confirm success with shape read.
return metar_dat

When I run this function and print the returned array, I get this (two empty arrays and a final filled array):

[[[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
..., 
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]]

[[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
..., 
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]]

[[\TZR 2015 7 ..., 2342 58 48]
[\TZR 2015 7 ..., 2300 59 47]
[\TZR 2015 7 ..., 2200 60 48]
..., 
[\TZR 2015 7 ..., 0042 56 56]
[\TZR 2015 7 ..., 0022 56 56]
[\TZR 2015 7 ..., 0000 56 56]]]

My question is this:

How can I create an array for each set of station data such that I do not overwrite any previous data?

Or

How can I create a single array that contains data sets with varying numbers of rows?

I am still new to Python (and new to posting here) and any ideas would be much appreciated.

You're resetting your array to zeros inside your k-loop on every iteration. Create it (with np.zeros, or np.empty if every element gets filled, as in your case) once, outside the nested loops, and you should be fine:

metar_dat = np.empty((len(stat_id),len(temp),7), dtype='object')
for k in range(len(stat_id)):
    for i in range(len(temp)):
        metar_dat[k,i] = np.dstack((stat_id[k], yr[i], month[i], day[i], time[i], temp[i], dwp[i]))
return metar_dat

You get a metar_dat array that is mostly 0 because it is the one you created on the last k iteration. It has length len(stat_id) in the first dimension, but you only inserted data for the last k. The results for the earlier k values were thrown away.
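To make the difference concrete, here is a minimal sketch that contrasts allocating inside the loop with allocating once. The station IDs and temperatures are made-up stand-ins for the question's variables, and the array is reduced to two dimensions to keep it short.

```python
import numpy as np

# Hypothetical sample data standing in for the question's variables.
stat_id = ["AAA", "BBB", "CCC"]
temp = [58, 59, 60, 61]

def allocate_inside_loop():
    # The array is re-created on every pass of the k-loop,
    # so only the rows written on the last pass survive.
    for k in range(len(stat_id)):
        out = np.zeros((len(stat_id), len(temp)), dtype=object)
        for i in range(len(temp)):
            out[k, i] = temp[i]
    return out

def allocate_once():
    # The array is created a single time, before the loops,
    # so every station's row is kept.
    out = np.zeros((len(stat_id), len(temp)), dtype=object)
    for k in range(len(stat_id)):
        for i in range(len(temp)):
            out[k, i] = temp[i]
    return out

bad = allocate_inside_loop()
good = allocate_once()
print(bad)   # first two rows are still zeros
print(good)  # every row is filled
```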

I would suggest collecting the data in a dictionary, rather than object array.

metar_dat = dict()  # dictionary rather than object array
for id in stat_id:
    # Bring all the data into one big array.
    data = np.column_stack([yr, month, day, time, temp, dwp])
    # should produce a (len(temp), 6) integer array,
    # or a float array if one or more of the inputs is float
    metar_dat[id] = data

If len(temp) varies for each id, you can't make a meaningful 3D array with shape (len(stat_id), len(temp), 7), unless you pad every one to the same maximum length. When thinking about arrays, think rectangles, not ragged lists.
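If you do want a single rectangular array, padding might look roughly like the sketch below. The station names and readings are invented for illustration, and NaN is just one possible choice of fill value (it requires a float array).

```python
import numpy as np

# Hypothetical per-station temperature readings of different lengths.
readings = {
    "AAA": [58.0, 59.0, 60.0],
    "BBB": [56.0],
    "CCC": [61.0, 62.0],
}

# Pad every station out to the longest record, using NaN as the filler.
max_len = max(len(values) for values in readings.values())
padded = np.full((len(readings), max_len), np.nan)
for row, values in enumerate(readings.values()):
    padded[row, :len(values)] = values

# NaN-aware reductions skip the padding.
col_means = np.nanmean(padded, axis=0)
print(padded)
print(col_means)
```

The cost is that every later computation has to remember the padding, which is why the NaN-aware functions (np.nanmean, np.nansum, and so on) are used here.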

A Python dictionary is a much better way of collecting information by some sort of unique id.

Object arrays let you generalize the concept of numeric arrays, but they don't give much added power compared to lists or dictionaries. You can't, for example, add values across the 'id' dimension.

You need to describe what you hope to do with this data once you collect it. That will help guide our recommendations regarding the data representation.

There are other ways of defining the data structure for each id. It looked like yr, time and temp were equal-length arrays. If they are all numbers, they could be collected into an array with 6 columns. If it is important to keep some columns as integers while others are floats (or even strings), you could use a structured array.
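As a rough sketch of the structured-array option: the field names, field types and sample rows below are assumptions chosen for illustration, not something taken from your data. Storing time as a string is one way to keep leading zeros such as 0042.

```python
import numpy as np

# Hypothetical field layout: integers for the date, a string for the
# time stamp, floats for the measurements.
station_dtype = np.dtype([
    ("yr", "i4"), ("month", "i4"), ("day", "i4"),
    ("time", "U4"), ("temp", "f8"), ("dwp", "f8"),
])

# Made-up sample rows; each tuple is one observation.
records = np.array(
    [(2015, 7, 1, "2342", 58.0, 48.0),
     (2015, 7, 1, "0042", 59.0, 47.0)],
    dtype=station_dtype,
)

# Each field is addressed by name and keeps its own dtype.
mean_temp = records["temp"].mean()
print(records["time"])
print(mean_temp)
```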

Structured arrays are often produced by reading column data from a csv file. Some columns will have string data (ids) others integers or even dates, others float data. np.genfromtxt is a good tool for loading that sort of file.
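For example, something along these lines should work. The CSV text and its column names are invented here, and io.StringIO stands in for a real file on disk.

```python
import io
import numpy as np

# Hypothetical CSV content; in practice this would be a file path.
csv_text = """station,yr,month,day,time,temp,dwp
TZR,2015,7,1,2342,58,48
TZR,2015,7,1,2300,59,47
"""

# names=True takes field names from the header row;
# dtype=None lets genfromtxt pick a type for each column.
data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,
    dtype=None,
    encoding="utf-8",
)

print(data.dtype.names)
print(data["station"], data["temp"])
```

Note that with dtype=None the time column is read as an integer, so a value like 0042 would lose its leading zeros; passing an explicit dtype is one way around that.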

You might also take a look at this post:

How can I make multiple empty arrays in python?

Look up list comprehensions:

listOfLists = [[] for i in range(N)]

Now listOfLists has N empty lists in it.
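A short sketch of how that could be used to hold data sets of different lengths, one list per station. The value of N and the appended numbers are made up. It also shows why the comprehension matters: [[]] * N repeats a reference to one single list.

```python
import numpy as np

N = 3  # hypothetical number of stations

# One independent empty list per station.
listOfLists = [[] for i in range(N)]
listOfLists[0].append(58)
listOfLists[0].append(59)
listOfLists[2].append(56)

# Pitfall: this repeats a reference to the SAME list N times,
# so appending through one slot shows up in all of them.
shared = [[]] * N
shared[0].append(1)

# Each list can be turned into its own array, with its own length.
arrays = [np.array(lst) for lst in listOfLists]

print(listOfLists)
print(shared)
print([a.shape for a in arrays])
```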
