How to make list comprehensions faster?

Is there anything general I could use to speed up nested list comprehensions (sometimes conditional ones)?

I was thinking about numpy or transposing (but how?), but there might be something I've overlooked.

Examples:

nextInt = 1
winPoints = [[max(upSlice[:i]) for i in range(nextInt, len(upSlice) + 1)] for upSlice in ppValues]

or

winningRatio =[ [1 if ratioUp >ratioDown  else 0 if (ratioDown>ratioUp) else
                1 if (pointsUp>pointsDown) else 0 if (pointsDown>pointsUp) else 1   
                for ratioUp,ratioDown,pointsUp,pointsDown                  in  zip(ratioUpSlice,ratioDownSlice,pointsUpSlice,pointsDownSlice)] 
                for ratioUpSlice,ratioDownSlice,pointsUpSlice,pointsDownSlice in  zip(ratios_Up,ratios_Down, pointsUpSlices,pointsDownSlices)]

(The slices in the nested list do NOT all have the same length.)

I'm assuming ppValues is a 2-dimensional list of numbers?

Generally, to speed up an algorithm you want to cut out redundant computation. For example, in your first piece of code the max(upSlice[:i]) expression is evaluated about (n*m) times and performs roughly ((n^2)/2 * m) comparisons in total, where n is the length of an inner list and m is the length of the outer list. Each evaluation recomputes the max of almost the same slice as the previous one (it has just 1 additional element), which means a lot of unnecessary comparisons.
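To make that count concrete, assume nextInt = 1 and that every inner list has the same length n (a simplification, since the real inner lists have different lengths). The slice upSlice[:i] holds i elements, so for one inner list the original comprehension scans

```latex
\sum_{i=1}^{n} i = \frac{n(n+1)}{2} \approx \frac{n^2}{2}
\quad\text{elements per inner list,}\qquad
\approx \frac{n^2 m}{2} \quad\text{elements over all } m \text{ inner lists,}
```

whereas a running maximum looks at each element once, about n·m elements in total.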

To speed it up, we can use dynamic programming: build a slice_max array in which the element at index i is max(upSlice[:i+1]), the largest of the first i+1 values. This can be done efficiently by reusing past computations, because the element at index i is the max of only two values, max(slice_max[i-1], upSlice[i]), which is a single comparison.
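The recurrence slice_max[i] = max(slice_max[i-1], upSlice[i]) is exactly what the standard library's itertools.accumulate computes when max is passed as the combining function. The sketch below uses that as an alternative to a hand-written loop; the helper name prefix_max is hypothetical, and it assumes next_int >= 1, as in the question.

```python
from itertools import accumulate


def prefix_max(up_slice, next_int=1):
    """Hypothetical helper: same result as
    [max(up_slice[:i]) for i in range(next_int, len(up_slice) + 1)],
    computed in a single pass. Assumes next_int >= 1."""
    # accumulate(..., max) yields max(up_slice[:1]), max(up_slice[:2]), ...
    running = list(accumulate(up_slice, max))
    # Drop the prefixes that are shorter than next_int elements.
    return running[next_int - 1:]


print(prefix_max([3, 1, 4, 1, 5, 9, 2, 6]))  # [3, 3, 4, 4, 5, 9, 9, 9]
```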

Here's a simple implementation of the dynamic programming version:

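def get_slice_max(arr, start):
    # result[j] == max(arr[:start + j]); assumes 1 <= start <= len(arr)
    result = [max(arr[:start])]
    for i in range(start, len(arr)):
        # running max: reuse the previous maximum instead of rescanning the slice
        result.append(max(result[-1], arr[i]))
    return result

winPoints = [get_slice_max(upSlice, nextInt) for upSlice in ppValues]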
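A quick randomized check that get_slice_max returns the same lists as the original comprehension. This is a sketch: the naive helper, the random sizes and the seed are arbitrary choices, and start is kept within 1 <= start <= len(arr), the range where both versions agree.

```python
import random


def get_slice_max(arr, start):
    # Dynamic-programming version from the answer: one comparison per element.
    result = [max(arr[:start])]
    for i in range(start, len(arr)):
        result.append(max(result[-1], arr[i]))
    return result


def naive(arr, start):
    # The original comprehension, applied to a single inner list.
    return [max(arr[:i]) for i in range(start, len(arr) + 1)]


random.seed(0)
for _ in range(1000):
    arr = [random.randint(0, 99) for _ in range(random.randint(1, 50))]
    start = random.randint(1, len(arr))
    assert get_slice_max(arr, start) == naive(arr, start)
print("all checks passed")
```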

Here's a comparison:

Generating data

import numpy as np

# generate fake data
np.random.seed(42)
# the variable lengths of the inner lists, each between 1 and 999
inner_sizes = np.random.randint(low=1, high=1000, size=10000)
ppValues = [np.random.randint(1000, size=i) for i in inner_sizes]
# ppValues contains 10000 arrays, each having 1 to 999 elements, each element an integer from 0 to 999

Your version

nextInt=1
winPoints  = [[max(upSlice[:i])  for i in range(nextInt,len(upSlice)+1)] for upSlice in ppValues]

------------------
Wall time: 2min 36s

Simple dynamic programming version

def get_slice_max(arr, start):
    result = [max(arr[:start])]
    for i in range(start, len(arr)):
        result.append(max(result[-1], arr[i]))
    return result
winPoints  = [get_slice_max(upSlice, nextInt) for upSlice in ppValues]

-----------------
Wall time: 1.79 s

You can see a major improvement, from over 2.5 minutes to under 2 seconds.
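Since the question mentions numpy and the benchmark's ppValues are already numpy arrays, another option is numpy's running-maximum ufunc, np.maximum.accumulate, which performs the same recurrence in compiled code. This is a sketch that was not timed here; the function name win_points_numpy is hypothetical, and it assumes next_int >= 1.

```python
import numpy as np


def win_points_numpy(pp_values, next_int=1):
    # Hypothetical helper. Element k of np.maximum.accumulate(a) is max(a[:k + 1]),
    # so slicing from next_int - 1 matches range(next_int, len(a) + 1).
    # Assumes next_int >= 1. Inner lists may have different lengths because
    # each one is processed separately.
    return [np.maximum.accumulate(np.asarray(up_slice))[next_int - 1:]
            for up_slice in pp_values]


result = win_points_numpy([[3, 1, 4], [2, 7, 1, 8]])
print([r.tolist() for r in result])  # [[3, 3, 4], [2, 7, 7, 8]]
```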

As for the second example, you did not provide enough context for me to tackle that problem.
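One observation is possible even without more context: the nested conditional in the second example is a lexicographic comparison. The ratios decide first, the points break a ratio tie, and a full tie yields 1. The sketch below shows two rewrites. The first is pure Python and uses tuple comparison. The second is a numpy version that handles slices of different lengths by concatenating them, applying the rule elementwise, and splitting the result back. The function names are hypothetical. Both versions assume numeric values with no NaN, and the numpy version also assumes the four slices in each group have equal lengths. Neither has been benchmarked here.

```python
import numpy as np


def winning_ratio_tuples(ratios_up, ratios_down, points_up, points_down):
    # Tuples compare lexicographically: ratios decide first, points break a
    # ratio tie, and a full tie gives 1, as in the original conditional.
    return [
        [int((ru, pu) >= (rd, pd)) for ru, rd, pu, pd in zip(ru_s, rd_s, pu_s, pd_s)]
        for ru_s, rd_s, pu_s, pd_s in zip(ratios_up, ratios_down, points_up, points_down)
    ]


def winning_ratio_numpy(ratios_up, ratios_down, points_up, points_down):
    # Assumes equal lengths within each group of four slices (zip would
    # silently truncate; this version does not).
    lengths = [len(s) for s in ratios_up]
    # Flatten the ragged slices into 1-D arrays so numpy can process them at once.
    ru, rd, pu, pd = (np.concatenate(x) for x in (ratios_up, ratios_down, points_up, points_down))
    # Where the ratios differ they decide; otherwise points_up >= points_down wins.
    flat = np.where(ru != rd, ru > rd, pu >= pd).astype(int)
    # Cut the flat result back into pieces matching the original slice lengths.
    return [part.tolist() for part in np.split(flat, np.cumsum(lengths)[:-1])]


ratios_up = [[1, 2], [3], [2, 2]]
ratios_down = [[0, 2], [4], [2, 2]]
points_up = [[0, 5], [9], [1, 4]]
points_down = [[9, 5], [0], [3, 2]]
print(winning_ratio_tuples(ratios_up, ratios_down, points_up, points_down))  # [[1, 1], [0], [0, 1]]
print(winning_ratio_numpy(ratios_up, ratios_down, points_up, points_down))   # [[1, 1], [0], [0, 1]]
```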
