mpi4py with processes and threads

Hi. This is a pretty specific question, so I hope StackOverflow is meant for all programming languages and not just JavaScript/HTML.

I am writing a multi-process program with MPICH2 (a popular implementation of the Message Passing Interface). My program is written in Python, so I use the mpi4py Python bindings. MPI is best suited to situations with no shared memory, so it is not ideal for multicore programming. To use all 4 cores on each node of my 5-node cluster, I am additionally using threads. However, I have noticed that using threads actually slows my simulation down. My program is several tens of thousands of lines of code, so I cannot post it all, but here is the snippet that is causing problems:

from threading import Thread
...
threadIndeces=[[0,10],[11,20],[21,30],[31,40]] #subset for each thread
for indeces in threadIndeces:
  t=Thread(target=foo,args=(indeces,))
  t.start()

Also, I make sure to join the threads later. If I run it with no threads, and just call foo with all the indeces, it is about 10-15 times faster. When I record the times of the multithreaded version, creating the threads in the call t=Thread(target=foo,args=(indeces,)) takes around 0.05 seconds and the join similarly takes 0.05 seconds, but the t.start() call takes a whopping 0.2 seconds.

Is start() an expensive call? Should I be changing my approach? I thought about keeping a pool of threads rather than creating new ones every iteration, but it does not seem like t=Thread(target=foo,args=(indeces,)) is what is causing the slowdown.

Also, in case anyone wants to know the complexity of foo, here is one of the functions that gets called i times for the indeces on every iteration (non-discrete time):

def HD_training_firing_rate(HD_cell):
    """During training, the firing rate is governed by the difference between the 
       current heading direction and the preferred heading direction. This is made
       to resemble a Gaussian distribution
    """
    global fabs
    global exp
    global direction

    #loop over twice due to concurrent CW and CCW HD training
    for c in [0,1]:
        d=direction[c]
        dp=HD_cell.dp  #directional preference
        s_d=20.0  #standard deviation
        s_i=min(fabs(dp-d),360-fabs(dp-d)) #circular deviation from preferred dir.

        HD_cell.r[c]=exp(-s_i*s_i/(2*s_d*s_d))  #normal distribution

If you need threads, Python may not be your best option because of the Global Interpreter Lock (GIL), which prevents threads from executing Python bytecode in parallel. See also David Beazley's disturbing talk.
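To see the effect for yourself, here is a minimal sketch that runs the same CPU-bound work serially and then split across four threads. The function names (cpu_bound, run_serial, run_threaded) and the workload size are made up for illustration and are not from the question. On a standard CPython build with the GIL, the threaded run is usually no faster than the serial one for pure-Python arithmetic like this, though exact timings depend on the machine and interpreter version.

```python
import time
from threading import Thread


def cpu_bound(n):
    """Pure-Python arithmetic loop (hypothetical stand-in for foo)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_serial(chunks):
    """Process every chunk one after another in the calling thread."""
    return [cpu_bound(n) for n in chunks]


def run_threaded(chunks):
    """Process each chunk in its own thread and collect the results."""
    results = [None] * len(chunks)

    def worker(slot, n):
        results[slot] = cpu_bound(n)

    threads = [Thread(target=worker, args=(k, n)) for k, n in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def timed(fn, chunks):
    """Return (result, elapsed seconds) for fn(chunks)."""
    t0 = time.perf_counter()
    out = fn(chunks)
    return out, time.perf_counter() - t0


if __name__ == "__main__":
    chunks = [200_000] * 4  # assumed workload: four equal pieces
    _, serial_s = timed(run_serial, chunks)
    _, threaded_s = timed(run_threaded, chunks)
    print(f"serial:   {serial_s:.3f} s")
    print(f"threaded: {threaded_s:.3f} s")
```

Both versions compute the same results. Only the wall-clock time differs, and that is the number to compare on your own cluster nodes.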

You might be better off just running 20 processes (4 per node across your 5 nodes) to keep every core busy, and using MPI for all communication.
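A rough sketch of what that could look like with mpi4py, assuming the work can be split by index the way the question's thread subsets are. The helper block_bounds, the body of foo, and the item count of 41 (indices 0 to 40) are assumptions for illustration. The mpi4py calls used are MPI.COMM_WORLD, Get_rank, Get_size and gather.

```python
def block_bounds(n_items, n_procs, rank):
    """Return the half-open range [start, stop) of items owned by `rank`.

    Items are spread as evenly as possible; the first
    (n_items % n_procs) ranks get one extra item each.
    """
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def foo(indices):
    """Hypothetical placeholder for the per-subset work in the question."""
    return sum(i * i for i in indices)


def main():
    # Imported here so the helpers above can be used without MPI installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # this process's id, 0 .. size-1
    size = comm.Get_size()   # total number of processes, e.g. 20

    # Each process works only on its own slice of the indices.
    start, stop = block_bounds(41, size, rank)
    partial = foo(range(start, stop))

    # Collect every process's partial result on rank 0.
    partials = comm.gather(partial, root=0)
    if rank == 0:
        print("total:", sum(partials))

# In a real script, call main() at the bottom and launch with something like:
#   mpiexec -n 20 python your_script.py
```

Each process has its own interpreter and its own GIL, so the 4 processes on a node can really run on 4 cores at once, at the cost of passing data through MPI instead of sharing memory.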

Python incurs a lot of overhead on the big iron. You may want to think about C or C++ (or dare I say Fortran?) if you're really committed to a joint threads/message-passing approach.


 