
Running filter over a large amount of data points and a long time period?

I need to apply two running filters on a large amount of data. I have read that creating variables on the fly is not a good idea, but I wonder if it still might be the best solution for me.

My question: Can I create arrays in a loop with the help of a counter (array1, array2, …) and then access them through the counter, with something like 'array'+str(counter) or 'array'+str(counter-1)?

Why I want to do it: The data are 400x700 arrays at 15-minute time steps over a year (so I have about 35,000 arrays of 400x700). Each time step is read into Python individually. I need to apply one running filter that checks whether the last four time steps are equal (element-wise); where they are, all four values are set to zero. The next filter runs on the output of the first filter and checks whether the sum of the last twelve time steps exceeds a certain value. When both filters are done, I want to sum up the values, so that at the end of the year I have one 400x700 array with the filtered, accumulated values.

I do not have enough memory to read in all the data at once. So I thought I could write a loop in which a new variable is created for the 400x700 array of each time step and the two filters are run. The older arrays that have already been filtered could then be added to the yearly sum and deleted, so that I never have more than 16 (4+12) time steps (arrays) in memory at any time.

I don't know if it's appropriate to ask such a question without any code to show, but I would really appreciate the help.

If your question is about the best data structure for keeping a fixed number of arrays in memory, I would suggest using a three-dimensional array. Its shape would be (400, 700, 12), since twelve is how many arrays you need to look back at. The advantage is that your memory use stays constant, because you load each new array into the larger one. The disadvantage is that you need to shift all the arrays manually.
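As a rough illustration of the manual shifting, here is a minimal sketch. The names `push_step`, `WINDOW` and `SHAPE` are made up for this example, and `np.full` merely stands in for however you read one time step from disk. Note that `np.roll` returns a shifted copy, so this keeps the buffer size fixed but briefly holds two buffers during each shift.

```python
import numpy as np

WINDOW = 12          # number of time steps to look back at
SHAPE = (400, 700)   # grid size from the question


def push_step(buffer, new_step):
    """Return the buffer with the oldest slice dropped and new_step appended."""
    # Slice k moves to k-1; the oldest slice wraps around to the last position.
    buffer = np.roll(buffer, -1, axis=-1)
    # Overwrite the wrapped-around (oldest) slice with the newest data.
    buffer[..., -1] = new_step
    return buffer


buffer = np.zeros(SHAPE + (WINDOW,))
for t in range(20):
    step = np.full(SHAPE, float(t))   # placeholder for reading one time step
    buffer = push_step(buffer, step)

# Element-wise sum over the twelve most recent time steps.
window_sum = buffer.sum(axis=-1)
```

After the loop, the buffer holds only the twelve most recent steps, ordered from oldest (index 0) to newest (index -1) along the last axis.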

If you don't want to deal with the shifting yourself, I'd suggest using a collections.deque with a maxlen of 12.
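To show how a deque could drive a running filter, here is a minimal sketch of only the first filter from the question (zero out values that are equal across four consecutive steps) combined with the yearly accumulation. The function name `accumulate` and its arguments are hypothetical. How runs longer than four should be treated is not specified in the question; this sketch simply compares against values that may already have been zeroed. The twelve-step sum check is left out because the question does not say what should happen when the threshold is exceeded.

```python
from collections import deque

import numpy as np


def accumulate(steps, window=4):
    """Sum all time steps after zeroing values equal across `window` steps."""
    recent = deque(maxlen=window)   # never holds more than `window` arrays
    total = None
    for step in steps:
        step = np.array(step, dtype=float)   # private copy we may modify
        if total is None:
            total = np.zeros_like(step)
        if len(recent) == window:
            # The oldest array can no longer be changed by the filter,
            # so add it to the running total and drop it from memory.
            total += recent.popleft()
        recent.append(step)
        if len(recent) == window:
            stacked = np.stack(list(recent))             # (window, rows, cols)
            same = (stacked == stacked[0]).all(axis=0)   # True where all equal
            for arr in recent:
                arr[same] = 0.0                          # edits arrays in the deque
    if total is not None:
        for arr in recent:   # flush the arrays still in the window
            total += arr
    return total


# Tiny example: the first element is constant for four steps, the second is not.
steps = [[[5, 1]], [[5, 2]], [[5, 3]], [[5, 4]], [[1, 5]], [[2, 6]]]
result = accumulate(steps)
```

Only `window` arrays plus the running total are alive at any time, which is the constant-memory behaviour the question asks for. `steps` can be any iterable, for example a generator that reads one file per time step.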

"Can I create arrays in a loop with the help of a counter (array1, array2, …) and then access them through the counter, with something like 'array'+str(counter) or 'array'+str(counter-1)?"

This is a very common question that I think a lot of programmers will face eventually. Two examples for Python on Stack Overflow:

The lesson to learn from this is to not use dynamic variable names, but instead put the pieces of data you want to work with in an encompassing data structure.

The data structure could be, e.g., a list, a dict or a NumPy array. The collections.deque proposed by @Midnighter also seems to be a good candidate for such a running filter.
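For instance, a dict keyed by the counter gives you everything 'array'+str(counter) was meant to provide. This is only a small sketch: the name `arrays`, the tiny 2x3 shape and the choice to keep three steps are arbitrary choices for the example.

```python
import numpy as np

arrays = {}   # maps the counter to its array, replacing array1, array2, ...
previous = None

for counter in range(1, 6):
    # Store the new array under its counter instead of creating a new variable.
    arrays[counter] = np.full((2, 3), counter)

    # Look up the previous step by key, the equivalent of 'array'+str(counter-1).
    if counter - 1 in arrays:
        previous = arrays[counter - 1]

    # Drop the entry that is three steps old; pop() with a default
    # does nothing if that key was never stored.
    arrays.pop(counter - 3, None)

remaining = sorted(arrays)   # keys of the arrays still held in memory
```

Deleting an entry from the dict releases that array once nothing else refers to it, so the memory footprint stays bounded just as it would with deleted variables.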
