Removing elements from the list when looping over it
This part of my code does not scale as the dimension grows. I loop over my data and accumulate it in every dt time window. To do this I compare each entry's time with the lower and upper bounds of the window. When I reach the upper bound, I break out of the for loop for efficiency. The next time I run the for loop, I want to start not from the beginning but from the element where I stopped previously. How can I do that?
I tried to remove/pop elements from the list, but the indexes get messed up. I read that I cannot modify a list while looping over it, but my goal does not seem uncommon, so there has to be a solution. I don't care about the original data list later in my code; I only want to optimize the accumulation.
# Here I generate data for you to show my problem
from random import randint
import numpy as np
dimension = 200
times = [randint(0, 1000) for p in range(0, dimension)]
times.sort()
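# randint is inclusive on both ends, so cap at dimension - 1 to keep every value a valid index into np.zeros(dimension)
values = [randint(0, dimension - 1) for p in range(0, dimension)]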
data = [(values[k], times[k]) for k in range(dimension)]
dt = 50.0
t = min(times)
pixels = []
timestamps = []
# this is my problem
while (t <= max(times)):
    accumulator = np.zeros(dimension)
    for idx, content in enumerate(data):
        # comparing lower bound of the 'time' window
        if content[1] >= t:
            # comparing upper bound of the 'time' window
            if (content[1] < t + dt):
                accumulator[content[0]] += 1
                # if I pop the first element from the list after accumulating, indexes are screwed when looping further
                # data.pop(0)
            else:
                # all further entries are bigger because they are sorted
                break
    pixels.append(accumulator)
    timestamps.append(t)
    t += dt
In a simpler form, I think you are trying to do:
In [158]: times=[0, 4, 6, 10]
In [159]: data=np.arange(12)
In [160]: cnt=[0 for _ in times]
In [161]: for i in range(len(times)-1):
     ...:     for d in data:
     ...:         if d>=times[i] and d<times[i+1]:
     ...:             cnt[i]+=1
     ...:
In [162]: cnt
Out[162]: [4, 2, 4, 0]
And you are trying to make this `data` loop more efficient by breaking from the loop when `d` gets too large, and by starting the next loop after the items which have already been counted.
Adding the break is easy, as you've done:
In [163]: cnt=[0 for _ in times]
In [164]: for i in range(len(times)-1):
     ...:     for d in data:
     ...:         if d>=times[i]:
     ...:             if d<times[i+1]:
     ...:                 cnt[i]+=1
     ...:             else:
     ...:                 break
In [165]: cnt
Out[165]: [4, 2, 4, 0]
One way to skip the items already counted is to replace the `for d in data` with an index loop, and keep track of where we stopped last time around:
In [166]: cnt=[0 for _ in times]
In [167]: start=0
     ...: for i in range(len(times)-1):
     ...:     for j in range(start,len(data)):
     ...:         d = data[j]
     ...:         if d>=times[i]:
     ...:             if d<times[i+1]:
     ...:                 cnt[i]+=1
     ...:             else:
     ...:                 start = j
     ...:                 break
     ...:
In [168]: cnt
Out[168]: [4, 2, 4, 0]
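Applied to the `(value, time)` tuples from the question, the same "remember where we stopped" idea could look like the sketch below. The function name `accumulate_windows` and the parameter `n_pixels` are made up for illustration, it assumes `data` is already sorted by time, and it uses plain lists for the accumulator where the question uses `np.zeros`.

```python
def accumulate_windows(data, dt, n_pixels):
    """Count values per time window of width dt.

    data: list of (value, time) tuples, assumed sorted by time.
    Returns (pixels, timestamps), one accumulator per window.
    """
    pixels, timestamps = [], []
    if not data:
        return pixels, timestamps

    t = data[0][1]       # earliest time, since data is sorted
    t_max = data[-1][1]  # latest time, computed once instead of per window
    start = 0            # index of the first entry not yet counted

    while t <= t_max:
        accumulator = [0] * n_pixels
        j = start
        # Every entry from `start` on has time >= t, so only the
        # upper bound of the window needs checking.
        while j < len(data) and data[j][1] < t + dt:
            accumulator[data[j][0]] += 1
            j += 1
        start = j  # the next window resumes here
        pixels.append(accumulator)
        timestamps.append(t)
        t += dt
    return pixels, timestamps


sample = [(0, 0), (1, 1), (0, 3), (2, 10), (1, 12)]
pixels, timestamps = accumulate_windows(sample, dt=5, n_pixels=3)
print(timestamps)
print(pixels)
```

Each entry is visited once in total, so the work grows with `len(data)` plus the number of windows rather than with their product.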
A `pop` based version requires that I work with a list (my `data` is an array), and requires inserting the value back at the break:
In [186]: datal=data.tolist()
In [187]: cnt=[0 for _ in times]
In [188]: for i in range(len(times)-1):
     ...:     while True:
     ...:         d = datal.pop(0)
     ...:         if d>=times[i]:
     ...:             if d<times[i+1]:
     ...:                 cnt[i]+=1
     ...:             else:
     ...:                 datal.insert(0,d)
     ...:                 break
     ...:
In [189]: cnt
Out[189]: [4, 2, 4, 0]
In [190]: datal
Out[190]: [10, 11]
This isn't perfect, since I still have items on the list at the end (my `times` don't cover the whole `data` range). But it tests the idea.
Here's something closer to your attempt:
In [203]: for i in range(len(times)-1):
     ...:     for d in datal[:]:
     ...:         if d>=times[i]:
     ...:             if d<times[i+1]:
     ...:                 cnt[i]+=1
     ...:                 datal.pop(0)
     ...:             else:
     ...:                 break
     ...:
The key difference is that I iterate on a copy of `datal`. That way the `pop` affects `datal`, but doesn't affect the current iteration. Admittedly there's a cost to the copy, so the speed up might not be significant.
A different approach would be to loop on `data` once, and step the time window forward as the `t` and `t+dt` boundaries are crossed:
In [222]: times=[0, 4, 6, 10,100]
In [223]: cnt=[0 for _ in times]; i=0
In [224]: for d in data:
     ...:     if d>=times[i]:
     ...:         if d<times[i+1]:
     ...:             cnt[i]+=1
     ...:         else:
     ...:             i += 1
     ...:             cnt[i]+=1
     ...:
In [225]: cnt
Out[225]: [4, 2, 4, 2, 0]
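The session above advances `i` by one step per crossing, which works here because no window is empty. A slightly more defensive sketch, assuming sorted input and using the made-up name `count_single_pass`, advances with a `while` so that empty windows are skipped and items past the last edge are ignored. Here too the result has one count per window.

```python
def count_single_pass(data, edges):
    """Single pass over sorted data; edges define half-open windows."""
    counts = [0] * (len(edges) - 1)
    i = 0  # index of the current window
    for d in data:
        if d < edges[0]:
            continue  # before the first window
        # Step forward over as many windows as needed, including empty ones.
        while i < len(counts) and d >= edges[i + 1]:
            i += 1
        if i == len(counts):
            break  # past the last edge; the rest is larger because data is sorted
        counts[i] += 1
    return counts


print(count_single_pass(range(12), [0, 4, 6, 10, 100]))
print(count_single_pass([0, 1, 9], [0, 2, 4, 6, 10]))
```

In the second call the windows `[2, 4)` and `[4, 6)` are empty, which a single `i += 1` would miscount.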