How to make Python code with two for loops run faster (is there a Python way of doing Mathematica's Parallelize)?

Question

I am completely new to Python, and to programming languages of this kind in general, though I have some experience with Mathematica. I have a mathematical problem that Mathematica can solve with its own 'Parallelize' methods, but doing so uses all the cores and leaves the system quite exhausted; I can barely use the machine during the run. Hence I was looking for a coding alternative and found Python fairly easy to learn and use. So, without further ado, here are the mathematical problem and the issues with my Python code. As the full code is too long, I will give an outline.

1. Numerically solve a differential equation of the form y''(t) + f(t) y(t) = 0 to get y(t) over some range, say C <= t <= D.

2. Next, interpolate the numerical result over some desired range to get the function w(t), say for A <= t <= B.

3. Using w(t), solve another differential equation of the form z''(t) + [a + b w(t)] z(t) = 0 for some range of a and b, for which I am using the loop.

4. Define F = 1 + sol1[157], to make a list like {a, b, F}. Here is a prototype loop, as this takes most of the computation time.

for q in np.linspace(0.0, 4.0, 100):
    for a in np.linspace(-2.0, 7.0, 100):
        print('Solving for q = {}, a = {}'.format(q,a))
        sol1 = odeint(fun, [1, 0], t, args=( a, q))[..., 0]
        print(t[157])
        F = 1 + sol1[157]                    
        f1.write("{}  {} {} \n".format(q, a, F))            
f1.close()
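
For reference, steps 1 to 4 above might look roughly like the following in SciPy. This is only a minimal sketch and not the original code: the coefficient function f(t), the ranges, the initial conditions, and the use of CubicSpline for the interpolation are all assumptions made for illustration. The second parameter, called b in the equations, plays the role of q in the prototype loop.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.interpolate import CubicSpline


def f(t):
    # Hypothetical coefficient function, chosen only for illustration
    return 1.0 + 0.5 * np.cos(t)


def first_ode(state, t):
    # y'' + f(t) y = 0 rewritten as a first-order system, state = [y, y']
    y, dy = state
    return [dy, -f(t) * y]


# Step 1: solve for y(t) on C <= t <= D (assumed here: C = 0, D = 12)
t_full = np.linspace(0.0, 12.0, 1201)
y = odeint(first_ode, [1.0, 0.0], t_full)[:, 0]

# Step 2: interpolate the numerical result to get a callable w(t)
w = CubicSpline(t_full, y)


def fun(state, t, a, b):
    # Step 3: z'' + [a + b w(t)] z = 0 as a first-order system, state = [z, z']
    z, dz = state
    return [dz, -(a + b * float(w(t))) * z]


# Solve on A <= t <= B (assumed here: A = 0, B = 10) for one (a, b) pair
t = np.linspace(0.0, 10.0, 201)
sol1 = odeint(fun, [1.0, 0.0], t, args=(0.5, 1.0))[:, 0]

# Step 4: the quantity that is written out for each parameter pair
F = 1 + sol1[157]
print(t[157], F)
```

Solving the first equation on a slightly wider range than the second one keeps the interpolated w(t) well defined even if the integrator evaluates the right-hand side a little past the last requested time point.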

Now, the real loop takes about 4 hours and 30 minutes to complete (with a built-in functional form of w(t), it takes about 2 minutes). When I applied numba/autojit before the definition of fun in my code (without properly understanding what it does or how!), the run time improved significantly, to about 2 hours and 30 minutes. Writing the two loops as a single itertools/product loop reduces the run time further, but only by about 2 minutes. However, Mathematica, when I let it use all 4 cores, finishes the task within 30 minutes.

So, is there a way to improve the run time in Python?

Answer 1

To speed up Python, you have three options:

1. Deal with specific bottlenecks in the program (as suggested in @LutzL's comment).
2. Try to speed up the code by compiling it to C using Cython (or by including C code using weave or similar techniques). Since the time-consuming computations in your case are not in the Python code proper but in SciPy modules (at least I believe they are), this would not help you very much here.
3. Implement multiprocessing, as you suggested in your original question. If you have X cores, this can make your code up to (slightly less than) X times faster. Unfortunately, this is rather complicated in Python.
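
For the first option, one way to find out where the time actually goes is to profile a handful of solves before changing anything. The following is a minimal sketch using the standard-library cProfile module; the function fun is the example function from the SciPy docs for odeint, standing in for the real one, and the parameter values are arbitrary.

```python
import cProfile
import io
import pstats

import numpy as np
from scipy.integrate import odeint


def fun(y, t, b, c):
    # Stand-in right-hand side (example function from the SciPy odeint docs)
    theta, omega = y
    return [omega, -b * omega - c * np.sin(theta)]


t = np.linspace(0, 10, 201)

# Profile a small sample of the parameter grid, not the full 100 x 100 loop
profiler = cProfile.Profile()
profiler.enable()
for a in np.linspace(-2.0, 7.0, 10):
    odeint(fun, [1, 0], t, args=(a, 1.0))
profiler.disable()

# Collect the five entries with the largest cumulative time into a string
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The number of calls reported for fun shows how often the integrator calls back into Python code, which indicates how much there is to gain from speeding up that function alone.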

Implementing multiprocessing: an example using the prototype loop from the original question

I assume that the computations inside the nested loops of your prototype code are independent of one another. Since the prototype code is incomplete, I cannot be sure that this is the case; if it is not, this approach will of course not work. For the fun function, the example uses not your differential equation problem but a stand-in with the same signature (input and output variables).

import numpy as np
import scipy.integrate
import multiprocessing as mp

def fun(y, t, b, c):
    # replace this function with whatever function you want to work with
    #    (this one is the example function from the scipy docs for odeint)
    theta, omega = y
    dydt = [omega, -b*omega - c*np.sin(theta)]
    return dydt

#definitions of work thread and write thread functions

def run_thread(input_queue, output_queue):
    # run threads will pull tasks from the input_queue, push results into output_queue
    while True:
        try:
            queueitem = input_queue.get(block = False)
            if len(queueitem) == 3:
                a, q, t = queueitem
                sol1 = scipy.integrate.odeint(fun, [1, 0], t, args=( a, q))[..., 0]
                F = 1 + sol1[157]
                output_queue.put((q, a, F))
        except Exception as e:
            print(str(e))
            print("Queue exhausted, terminating")
            break

def write_thread(queue):
    # write thread will pull results from output_queue, write them to outputfile.txt
    with open("outputfile.txt", "w") as f1:
        while True:
            # a blocking get() waits for the next result without busy-waiting
            queueitem = queue.get()
            if queueitem[0] == "TERMINATE":
                break
            q, a, F = queueitem
            print("{}  {} {}".format(q, a, F))
            f1.write("{}  {} {} \n".format(q, a, F))

# define time point sequence            
t = np.linspace(0, 10, 201)

# prepare input and output Queues
mpM = mp.Manager()
input_queue = mpM.Queue()
output_queue = mpM.Queue()

# prepare tasks, collect them in input_queue
for q in np.linspace(0.0, 4.0, 100):
    for a in np.linspace(-2.0, 7.0, 100):
        # Your computations as commented here will now happen in run_threads as defined above and created below
        # print('Solving for q = {}, a = {}'.format(q,a))
        # sol1 = scipy.integrate.odeint(fun, [1, 0], t, args=( a, q))[..., 0]
        # print(t[157])
        # F = 1 + sol1[157]    
        input_tupel = (a, q, t)
        input_queue.put(input_tupel)

# create threads
thread_number = mp.cpu_count()
procs_list = [mp.Process(target = run_thread , args = (input_queue, output_queue)) for i in range(thread_number)]         
write_proc = mp.Process(target = write_thread, args = (output_queue,))

# start threads
for proc in procs_list:
    proc.start()
write_proc.start()

# wait for run_threads to finish
for proc in procs_list:
    proc.join()

# terminate write_thread
output_queue.put(("TERMINATE",))
write_proc.join()

Explanation

We define the individual problems (or rather their parameters) before commencing computation and collect them in an input Queue.
We define a function (run_thread) that is run in the threads (despite the name, each one is a separate process created with mp.Process). This function computes individual problems until none are left in the input Queue; it pushes the results into an output Queue.
We start as many such threads as we have CPUs.
We start an additional thread (write_thread) that collects the results from the output Queue and writes them to a file.

Caveats

For smaller problems, you can run multiprocessing without Queues. However, if the number of individual computations is large, you will exceed the maximum number of threads the kernel allows, after which the kernel kills your program.
Operating systems differ in how multiprocessing works. The example above will work on Linux (and perhaps on other Unix-like systems such as Mac and BSD), but not on Windows. The reason is that Windows does not have a fork() system call. (I do not have access to a Windows machine and therefore cannot try to implement it for Windows.)
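
A more compact variant that keeps the number of processes bounded is a worker pool. The following is a minimal sketch using multiprocessing.Pool, not a replacement for the example above: the names solve_one and main, the output file name, the reduced demonstration grid, and the choice to leave one core free are all assumptions made for illustration. The `if __name__ == "__main__":` guard is what the multiprocessing documentation asks for on platforms that start workers by spawning a fresh interpreter instead of forking; the sketch is untested on Windows.

```python
import multiprocessing as mp

import numpy as np
from scipy.integrate import odeint


def fun(y, t, b, c):
    # Same stand-in function as in the example above
    theta, omega = y
    return [omega, -b * omega - c * np.sin(theta)]


# Time points, defined at module level so that every worker has them
T = np.linspace(0, 10, 201)


def solve_one(params):
    # One independent task: solve the ODE for a single (q, a) pair
    q, a = params
    sol1 = odeint(fun, [1, 0], T, args=(a, q))[..., 0]
    return q, a, 1 + sol1[157]


def main(n_q=4, n_a=5, outfile="pool_outputfile.txt"):
    # Small demonstration grid; use n_q=100, n_a=100 for the full problem
    tasks = [(q, a)
             for q in np.linspace(0.0, 4.0, n_q)
             for a in np.linspace(-2.0, 7.0, n_a)]
    # Leave one core free so that the machine stays usable during the run
    workers = max(1, mp.cpu_count() - 1)
    with mp.Pool(processes=workers) as pool, open(outfile, "w") as f1:
        # imap yields the results in task order while the workers keep running
        for q, a, F in pool.imap(solve_one, tasks, chunksize=5):
            f1.write("{}  {} {} \n".format(q, a, F))


if __name__ == "__main__":
    main()
```

Because the pool hands tasks to a fixed number of workers, the number of processes does not grow with the number of computations, and the parent process does the writing, so no separate writer process or termination message is needed.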

Answer 1 (score 2, accepted, 2017-05-12 22:50:00)
