How to speed up nested for loops in Python

Question

I want to calculate the cM between two different windows along a chromosome. My code has three nested loops. As an example, I use random numbers to stand in for the recombination map.

import random

windnr = 54800
w, h   = windnr, windnr
recmatrix = [[0 for x in range(w)] for y in range(h)]

#Generate 54800 random numbers between 10 and 30
rec_map = random.sample(range(0, 30), 54800)

for i in range(windnr):
    for j in range(windnr):
        recmatrix[i][j] = 0.25 * rec_map[i] #mean distance within own window
        if i > j:
            recmatrix[i][j] = recmatrix[i][j] + 0.5 * rec_map[j] #+ mean rdistance final window
            for k in range(i-1,j,-1):
                recmatrix[i][j] = recmatrix[i][j] + rec_map[k] #add all windows between i and j
        if i < j:
            recmatrix[i][j] = recmatrix[i][j] + 0.5 * rec_map[j] #+ mean distance final window
            for k in range(i+1,j):
                recmatrix[i][j] = recmatrix[i][j] + rec_map[k] #add all windows between i and j
        #j += 1
    if i % 10 == 0:
        print("window {}".format(i))
    #i += 1

The calculation takes a lot of time. For my data it would have to run for almost 7 days.
Can I speed up the nested for loops so that it finishes within 10 hours? How can I improve the performance?

Although the 2D array has 3 billion items (~96 GB as floats), I would rule out hard disk swapping issues, since the server that does the computation has 200 GB of RAM.

Answer 1

Using NumPy will make your application much faster. It's written in C/C++, so it does not suffer from slow loops in Python.

I'm doing my tests on an old Intel Xeon X5550 with 2 sockets, 8 cores and 96 GB of triple-channel RAM. I don't have much experience with NumPy, so bear with me if the code below is not optimal.

Array initialization

Even the initialization is much faster:

recmatrix = [[0 for x in range(w)] for y in range(h)]

needs 24 GB of RAM (integers) and takes 3:28 minutes on my PC, whereas

recmatrix = np.zeros((windnr, windnr), dtype=int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent

is finished after 50 ms. But since you need floats anyway, start with floats from the beginning:

recmatrix = np.zeros((windnr, windnr), dtype=float)  # likewise, the builtin float replaces the removed np.float
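
The memory needed by a dense array can be estimated before allocating anything. A minimal sketch, assuming the default 8-byte float64 element size; the helper name `dense_matrix_gb` is made up for this illustration:

```python
import numpy as np

def dense_matrix_gb(n, dtype=float):
    """Estimated size in GB (10**9 bytes) of a dense n x n NumPy array."""
    return n * n * np.dtype(dtype).itemsize / 1e9

windnr = 54800
print(f"float64: {dense_matrix_gb(windnr):.1f} GB")
print(f"float32: {dense_matrix_gb(windnr, np.float32):.1f} GB")

# Cross-check the formula against a real (small) allocation
small = np.zeros((100, 100), dtype=float)
assert small.nbytes == 100 * 100 * 8
```

If single precision is accurate enough for the cM values, a float32 array would halve the footprint.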

Random samples

The code

#Generate 54800 random numbers between 10 and 30
rec_map = random.sample(range(0, 30), 54800)

did not work for me, so I replaced it and increased k for more stable measurements:

rec_map = random.choices(range(0, 30), k=5480000)

which runs in 2.5 seconds. The NumPy replacement

rec_map = np.random.choice(np.arange(0, 30), size=5480000)

is done in 0.1 seconds.
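
random.sample draws without replacement, so asking for 54800 values from a population of only 30 raises a ValueError, which is probably why the line from the question failed. A minimal sketch that also matches the comment's range of 10 to 30, assuming NumPy's Generator API (NumPy 1.17 or later); the seed is arbitrary:

```python
import numpy as np

windnr = 54800
rng = np.random.default_rng(seed=42)  # arbitrary seed, only for reproducibility

# Integers in [10, 30): the upper bound is exclusive by default.
# Converting to float up front avoids mixed-type arithmetic later.
rec_map_np = rng.integers(10, 30, size=windnr).astype(float)
print(rec_map_np[:5], rec_map_np.min(), rec_map_np.max())
```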

The loop

The loop will need the most work, since in NumPy you avoid Python loops whenever possible.

For example, if you have an array and want to multiply all elements by 2, you would not write a loop but simply multiply the whole array:

import numpy as np

single = np.random.choice(np.arange(0, 10), size=100)
doubled = single * 2
print(single, "\r\n", doubled)

I don't fully understand what the code does, but let's apply that strategy to the first part of the loop. The original is

for i in range(windnr):
    for j in range(windnr):
        recmatrix[i][j] = 0.25 * rec_map[i] #mean distance within own window

and it takes 18.5 seconds with a reduced windnr = 5480. The NumPy equivalent should be

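rec_map_np = np.asarray(rec_map[:windnr], dtype=float)  # assumed: NumPy copy of the first windnr map values
column = 0.25 * rec_map_np
# repeat each value windnr times, then reshape so that row i holds 0.25 * rec_map[i]
recmatrix = np.repeat(column, windnr).reshape(windnr, windnr)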

and is done within 0.25 seconds. Also note: since we're assigning the variable here, we don't need the zero initialization at all.
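
The same row pattern can also be produced by broadcasting, which avoids building the repeated values explicitly. A small sketch that checks the result against the double loop from the question; `rec_map_np` and the tiny `windnr` are stand-ins chosen for this illustration:

```python
import numpy as np

windnr = 6  # tiny size so the pure-Python loop stays cheap
rec_map_np = np.arange(10, 10 + windnr, dtype=float)  # stand-in recombination map

# Reference: the double loop from the question
expected = np.zeros((windnr, windnr))
for i in range(windnr):
    for j in range(windnr):
        expected[i][j] = 0.25 * rec_map_np[i]

# Broadcasting: a (windnr, 1) column stretched across all columns
column = 0.25 * rec_map_np[:, np.newaxis]
recmatrix = np.broadcast_to(column, (windnr, windnr)).copy()  # copy() makes it writable

assert np.array_equal(recmatrix, expected)
```

The copy() call matters because np.broadcast_to returns a read-only view, and the later steps modify recmatrix in place.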

For the if i>j: and if i<j: parts, I see that the first line is identical:

recmatrix[i][j] = recmatrix[i][j] + 0.5 * rec_map[j]

That means this calculation is applied to all elements except the ones on the diagonal. You can use a mask for that:

mask = np.ones((windnr, windnr), dtype=bool)
np.fill_diagonal(mask, False)
# element (i, j) needs 0.5 * rec_map[j], so broadcast the map along the rows
rec_map_2d = np.broadcast_to(0.5 * rec_map_np, (windnr, windnr))
recmatrix[mask] += rec_map_2d[mask]

This took only 1:20 minutes for all 54800 elements, but reached my RAM limit at 93 GB.
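
The innermost loop only sums the windows strictly between i and j, so the whole matrix can be expressed with a prefix sum. This is a different technique from the mask approach above, sketched under the assumption that the loops in the question define the intended result. The function names are made up for this illustration, and the loop version is kept only as a reference for small inputs:

```python
import numpy as np

def recmatrix_loops(rec_map):
    """Reference implementation: the triple loop from the question."""
    n = len(rec_map)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = 0.25 * rec_map[i]          # own window
            if i != j:
                out[i, j] += 0.5 * rec_map[j]      # final window
                for k in range(min(i, j) + 1, max(i, j)):
                    out[i, j] += rec_map[k]        # windows in between
    return out

def recmatrix_cumsum(rec_map):
    """Same result without Python loops, using a prefix sum."""
    r = np.asarray(rec_map, dtype=float)
    n = r.size
    c = np.concatenate(([0.0], np.cumsum(r)))      # c[k] = r[0] + ... + r[k-1]
    i = np.arange(n)[:, np.newaxis]
    j = np.arange(n)[np.newaxis, :]
    lo, hi = np.minimum(i, j), np.maximum(i, j)
    between = c[hi] - c[lo + 1]                    # sum of r[lo+1 .. hi-1]
    off_diagonal = 0.5 * r[np.newaxis, :] + between
    # The diagonal keeps only the own-window term
    return 0.25 * r[:, np.newaxis] + np.where(i != j, off_diagonal, 0.0)

rng = np.random.default_rng(0)
rec_map = rng.integers(10, 30, size=40)
assert np.allclose(recmatrix_loops(rec_map), recmatrix_cumsum(rec_map))
```

For the full 54800 windows, the index arrays and temporaries in this sketch would need several times the memory of the result itself, so processing blocks of rows at a time would probably be necessary.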

Answer 2

Looping in Python usually takes a lot of time. So if possible, use map in your case; this can save you a lot of time. Since you are iterating over a list, it should work well for this script.

Example:

def func(x):
    # your code goes here; doubling x is only a placeholder
    return x * 2

nu = (1, 2, 3, 4)
output = list(map(func, nu))  # map is lazy in Python 3, so materialize it before printing
print(output)

Answer 1: score 2, accepted, 2020-07-16 18:35:31
Answer 2: score -1, 2020-07-16 14:27:48
