mpi4py with processes and threads

Question

Hi. This is a pretty specific question, so I hope StackOverflow is meant for all programming languages and not just JavaScript/HTML.

I am writing a multi-process program with MPICH2 (a popular implementation of the Message Passing Interface). My program is written in Python, so I use the MPI4Py Python bindings. MPI is best suited to situations with no shared memory, so it is not ideal for multicore programming. To use all 4 cores on each node of my 5-node cluster, I am additionally using threads. However, I have noticed that using threads actually slows my simulation down. My program is several tens of thousands of lines of code, so I cannot post it all, but here is the snippet that is causing problems:

from threading import Thread
...
threadIndeces = [[0, 10], [11, 20], [21, 30], [31, 40]]  # subset for each thread
for indeces in threadIndeces:
    # one thread per subset; foo does the per-subset work
    t = Thread(target=foo, args=(indeces,))
    t.start()

Also, I make sure to join the threads later. If I run it with no threads and just call foo with all the indices, it is about 10-15 times faster. When I time the multithreaded version, creating the threads with t=Thread(target=foo,args=(indeces,)) takes around 0.05 seconds and the join similarly takes 0.05 seconds, but the t.start() calls take a whopping 0.2 seconds.
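
For reference, per-phase timings like the ones quoted above could be collected along these lines. This is only a sketch: `foo` here is a hypothetical stand-in for the real work, and `time_phases` is an illustrative helper name, not something from the original program.

```python
import time
from threading import Thread


def foo(indeces):
    # Hypothetical stand-in for the real work: a small CPU-bound loop.
    lo, hi = indeces
    total = 0
    for i in range(lo, hi + 1):
        total += i * i
    return total


def time_phases(index_ranges):
    """Return seconds spent constructing, starting, and joining the threads."""
    timings = {"create": 0.0, "start": 0.0, "join": 0.0}
    threads = []
    for indeces in index_ranges:
        t0 = time.perf_counter()
        t = Thread(target=foo, args=(indeces,))
        t1 = time.perf_counter()
        t.start()
        t2 = time.perf_counter()
        timings["create"] += t1 - t0
        timings["start"] += t2 - t1
        threads.append(t)
    for t in threads:
        t0 = time.perf_counter()
        t.join()
        timings["join"] += time.perf_counter() - t0
    return timings


if __name__ == "__main__":
    print(time_phases([[0, 10], [11, 20], [21, 30], [31, 40]]))
```

Note that the time attributed to start() includes any time the calling thread spends waiting while the newly started thread is already running, so it is not purely the cost of launching a thread.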

Is start() an expensive call? Should I be changing my approach? I thought about keeping a pool of threads rather than creating new ones every iteration, but it does not seem like t=Thread(target=foo,args=(indeces,)) is what's causing the slowdown.
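
The thread-pool idea mentioned here might look roughly like the following, using the standard library's `concurrent.futures.ThreadPoolExecutor`. The body of `foo` and the helper `run_iterations` are assumptions made for illustration. Reusing workers removes the per-iteration thread creation, but it would not by itself change how CPU-bound Python code is scheduled across threads.

```python
from concurrent.futures import ThreadPoolExecutor


def foo(indeces):
    # Hypothetical stand-in for the real per-subset work.
    lo, hi = indeces
    return sum(range(lo, hi + 1))


thread_indeces = [[0, 10], [11, 20], [21, 30], [31, 40]]


def run_iterations(n_iterations, pool):
    """Reuse the same worker threads for every iteration."""
    results = []
    for _ in range(n_iterations):
        # Collecting map() into a list waits for all subsets, like start + join.
        results.append(list(pool.map(foo, thread_indeces)))
    return results


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(run_iterations(2, pool)[0])
```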

Also, in case anyone wants to know the complexity of foo, here is one of the functions that gets called i times for the indices in every iteration (non-discrete time):

def HD_training_firing_rate(HD_cell):
    """During training, the firing rate is governed by the difference between the 
       current heading direction and the preferred heading direction. This is made
       to resemble a Gaussian distribution
    """
    global fabs
    global exp
    global direction

    #loop over twice due to concurrent CW and CCW HD training
    for c in [0,1]:
        d=direction[c]
        dp=HD_cell.dp  #directional preference
        s_d=20.0  #standard deviation
        s_i=min(fabs(dp-d),360-fabs(dp-d)) #circular deviation from preferred dir.

        HD_cell.r[c]=exp(-s_i*s_i/(2*s_d*s_d))  #normal distribution
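
To make the arithmetic in the function above concrete, here is a small standalone sketch of the same two formulas. The helper names `circular_deviation` and `firing_rate` are illustrative and not part of the original program; the default `s_d=20.0` is taken from the snippet.

```python
from math import exp, fabs


def circular_deviation(dp, d):
    """Smallest angular distance in degrees between dp and d on a 360-degree circle."""
    return min(fabs(dp - d), 360 - fabs(dp - d))


def firing_rate(dp, d, s_d=20.0):
    """Gaussian-shaped tuning curve with peak value 1 when dp == d."""
    s_i = circular_deviation(dp, d)
    return exp(-s_i * s_i / (2 * s_d * s_d))


if __name__ == "__main__":
    # A preferred direction of 350 degrees and a heading of 10 degrees
    # are 20 degrees apart once wrap-around is taken into account.
    print(circular_deviation(350.0, 10.0))
    print(firing_rate(350.0, 10.0))
```

With these inputs the deviation equals one standard deviation, so the rate is exp(-0.5).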

Answer 1

If you need threads, Python may not be your best option because of the Global Interpreter Lock (GIL), which prevents threads from executing Python bytecode in parallel. See also Dave Beazley's disturbing talk.
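
One way to see the effect for yourself is to run the same CPU-bound loop serially and in four threads and compare the wall-clock times. This is a sketch only: `busy`, `run_serial`, and `run_threaded` are made-up names, and the loop is a stand-in for real work. On a standard CPython build with the GIL, the threaded run is generally not expected to be faster.

```python
import time
from threading import Thread


def busy(n, out, slot):
    # Pure-Python, CPU-bound loop; its result is stored in out[slot].
    total = 0
    for i in range(n):
        total += i * i
    out[slot] = total


def run_serial(n, parts):
    out = [None] * parts
    t0 = time.perf_counter()
    for slot in range(parts):
        busy(n, out, slot)
    return out, time.perf_counter() - t0


def run_threaded(n, parts):
    out = [None] * parts
    threads = [Thread(target=busy, args=(n, out, slot)) for slot in range(parts)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out, time.perf_counter() - t0


if __name__ == "__main__":
    serial_out, serial_s = run_serial(200_000, 4)
    threaded_out, threaded_s = run_threaded(200_000, 4)
    print(f"serial:   {serial_s:.3f} s")
    print(f"threaded: {threaded_s:.3f} s")
```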

You might be better off just running 20 processes to keep your 4 cores and 5 nodes busy, and using MPI for all communication.
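
A rough sketch of that layout with mpi4py follows: each process works on its own block of indices and the partial results are combined with a reduction. The names `my_slice`, `foo`, and `N_CELLS` are assumptions for illustration, and the fallback branch exists only so the sketch also runs where mpi4py is not installed.

```python
try:
    from mpi4py import MPI  # normally used when launched under an MPI runtime
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    comm, rank, size = None, 0, 1  # fallback: behave like a single process


def my_slice(n_items, size, rank):
    """Contiguous block of item indices owned by `rank` out of `size` ranks."""
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def foo(indices):
    # Hypothetical stand-in for the real per-cell update.
    return sum(indices)


N_CELLS = 41  # indices 0..40, matching the thread subsets in the question
local = foo(my_slice(N_CELLS, size, rank))

# Combine the per-process results on every rank; no threads are involved.
total = comm.allreduce(local, op=MPI.SUM) if comm is not None else local
print(f"rank {rank}/{size}: local={local} total={total}")
```

Saved under a hypothetical name such as script.py, it could be launched with something like `mpiexec -n 20 python script.py`, with the host list configured so that four processes land on each node.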

Python incurs a lot of overhead on big iron; you may want to think about C or C++ (or dare I say Fortran?) if you're really committed to a joint threads/message-passing approach.

Answer 1 (4 votes, accepted, 2011-04-07 22:53:16)