Fastest way to iterate through loops in various scenarios

After extensive profiling of our code base, we found that all of our performance issues arise while looping over some huge lists.

The code passages causing the issues are the following:

#ISSUE 1
myList = [i for j, i in enumerate(myList) if j not in anotherList]

#ISSUE 2
TargetIndex = next((myList.index(n) for n in myList if n > someBoundary), len(myList))

#ISSUE 3
def myFunction():
    for i in myList:
        if abs(i) > someLimit:
            return 0
    return 1

#ISSUE 4
for n,i in enumerate(myList):
    if abs(i) < someLimit:
        myList[n] = 0

I am quite sure that some numpy experts could write four one-liners that would give our application a great performance boost. But perhaps there is an even better way than numpy to handle these looping operations that I am not aware of.

Any suggestions on the topic are highly appreciated.

First issue: do the lookup in a set instead of a list.

anotherSet = set(anotherList)
myList = [i for j, i in enumerate(myList) if j not in anotherSet]
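For illustration, here is the same idea wrapped in a small function and run on made-up data. The function name and the sample values are hypothetical, not from the question. Membership tests on a list scan it element by element, while a set lookup is constant time on average, so this matters most when anotherList is large.

```python
def drop_indices(values, indices_to_drop):
    """Return the items of `values` whose position is not in `indices_to_drop`."""
    drop = set(indices_to_drop)  # built once; average O(1) membership tests
    return [v for i, v in enumerate(values) if i not in drop]


sample = [10, 20, 30, 40, 50]   # made-up data
print(drop_indices(sample, [1, 3]))
```

Building the set costs one pass over anotherList, which is paid back as soon as the comprehension does more than a handful of lookups.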

Second issue: why compute the index of n when you are already iterating over the list? Use enumerate:

TargetIndex = next((i for i,n in enumerate(myList) if n > someBoundary), len(myList))
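A quick sketch of that one-liner as a function, with made-up inputs (the function name is hypothetical). Calling myList.index(n) rescans the list from the start for every match, whereas enumerate hands over the position for free; the default passed to next covers the case where nothing exceeds the boundary.

```python
def first_index_above(values, boundary):
    """Index of the first item greater than `boundary`, or len(values) if none."""
    return next((i for i, v in enumerate(values) if v > boundary), len(values))


print(first_index_above([1, 5, 2, 9], 4))   # first match is at position 1
print(first_index_above([1, 2], 10))        # no match, falls back to the length
```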

For issues 3 and 4, there is not much you can do other than precompute the list of absolute values, so that you don't compute them twice on the same list.

abs_vals = [abs(n) for n in myList]

So, for instance, the fourth snippet becomes:

for index,av in enumerate(abs_vals):
    if av < someLimit:
        myList[index] = 0
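If staying in pure Python, issues 3 and 4 can also be written with built-ins. This is only a sketch under the assumption that returning a new list is acceptable for issue 4; the function names are hypothetical. Whether it is actually faster depends on the data, so it is worth timing on the real lists.

```python
def all_within_limit(values, limit):
    """Issue 3: return 1 if no item exceeds `limit` in absolute value, else 0."""
    # all() stops at the first failing item, like the early return in the loop
    return int(all(abs(v) <= limit for v in values))


def zero_small_values(values, limit):
    """Issue 4: return a new list with items below `limit` (in absolute value) set to 0."""
    return [0 if abs(v) < limit else v for v in values]


data = [0.1, -5, 2, -0.2]   # made-up data
print(all_within_limit(data, 3))
print(zero_small_values(data, 1))
```

To keep the in-place behaviour of the original loop, the result can be assigned back with a slice, as in `myList[:] = zero_small_values(myList, someLimit)`.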

As a warning, you'll have to change a lot more than these snippets if you want to keep your data as numpy arrays, but this is how you fix the issues you have.

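import numpy as np

myArr = np.array(myList)

#1
# keep only the positions that are not listed in anotherList
myArr = myArr[np.isin(np.arange(myArr.size), anotherList, invert=True)]

#2
# first index above the boundary, or the array size if there is none
TargetIndex = next(np.nonzero(myArr > someBoundary)[0].flat, myArr.size)

#3
def myFunction():
    # 1 if every element is within the limit, otherwise 0
    return int((np.abs(myArr) <= someLimit).all())

#4
# zero the small entries in place
myArr[np.abs(myArr) < someLimit] = 0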
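As a sanity check, the four vectorized operations can be run on a tiny made-up array. All values and the snake_case names below are assumptions for illustration; note that issue 3 needs a reduction (np.all) to yield a single 0 or 1 like the original function, and that np.where returns a new array which has to be assigned to be of any use.

```python
import numpy as np

# Made-up sample data standing in for the question's variables
my_arr = np.array([0.5, -3.0, 2.0, -0.1, 4.0])
another_list = [1, 3]
some_boundary = 1.0
some_limit = 1.0

# 1: drop the positions listed in another_list
keep = ~np.isin(np.arange(my_arr.size), another_list)
filtered = my_arr[keep]

# 2: index of the first element above the boundary, else the array size
hits = np.flatnonzero(my_arr > some_boundary)
target_index = int(hits[0]) if hits.size else my_arr.size

# 3: 1 if every element is within the limit, otherwise 0
within = int(np.all(np.abs(my_arr) <= some_limit))

# 4: new array with small entries replaced by 0
cleaned = np.where(np.abs(my_arr) < some_limit, 0, my_arr)

print(filtered, target_index, within, cleaned)
```

Unlike the generator versions, the numpy expressions for issues 2 and 3 evaluate the whole array before reducing, so they do not stop early at the first match.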

