
python multiprocessing why much slower

For a map task from a list src_list to dest_list, where len(src_list) is on the order of thousands:

def my_func(elem):
    # some complex work, for example a minimizing task
    return new_elem

dest_list[i] = my_func(src_list[i])

I use multiprocessing.Pool:

pool = Pool(4)
# took 543 seconds
dest_list = list(pool.map(my_func, src_list, chunksize=len(src_list)/8))

# took 514 seconds
dest_list = list(pool.map(my_func, src_list, chunksize=4))

# took 167 seconds
dest_list = [my_func(elem) for elem in src_list]

I am confused. Can someone explain why the multiprocessing version runs even slower?

And I wonder what the considerations are for the choice of chunksize, and for the choice between multi-threading and multi-processing, especially for my problem. Also, currently I measure time by summing all the time spent in the my_func method, because directly using

t = time.time()
dest_list = pool.map...
print time.time() - t

doesn't work. However, the documentation here says that map() blocks until the result is ready, which seems different from my result. Is there another way rather than simply summing the time? I have tried pool.close() with pool.join(), which does not work.
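
Concretely, by "summing the time" I mean something roughly like this (just a sketch, where timed_my_func is only an illustrative wrapper around my real my_func):

import time
from multiprocessing import Pool

def timed_my_func(elem):
    # run my_func once and also report how long this single call took
    start = time.time()
    result = my_func(elem)
    return result, time.time() - start

pool = Pool(4)
pairs = pool.map(timed_my_func, src_list, chunksize=4)
dest_list = [result for result, elapsed in pairs]
# total seconds spent inside my_func across all calls
print sum(elapsed for result, elapsed in pairs)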

src_list is of length around 2000. time.time() - t doesn't work because it does not sum up all the time spent in my_func inside pool.map. And a strange thing happened when I used timeit:

def wrap_func(src_list):
    pool = Pool(4)
    dest_list = list(pool.map(my_func, src_list, chunksize=4))

print timeit("wrap_func(src_list)", setup="import ...")

It ran into

OS Error Cannot allocate memory

I guess I have used timeit in a wrong way...
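
If I had to guess: timeit repeats the statement 1000000 times by default, and wrap_func creates a fresh Pool(4) on every repetition without ever closing it, so the worker processes pile up. Something like the following, with an explicit number=1 and a close/join, is roughly what I should have done (the setup string here is just one possible import, not my actual code):

from multiprocessing import Pool
from timeit import timeit

def wrap_func(src_list):
    pool = Pool(4)
    try:
        return list(pool.map(my_func, src_list, chunksize=4))
    finally:
        pool.close()  # let the 4 worker processes exit
        pool.join()

# run the wrapped call once instead of timeit's default 1000000 repetitions
print timeit("wrap_func(src_list)",
             setup="from __main__ import wrap_func, src_list",
             number=1)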

I use Python 2.7.6 under Ubuntu 14.04.

Thanks!

Multiprocessing requires overhead to pass the data between processes, because processes do not share memory. Any object passed between processes must be pickled (represented as a string) and unpickled. This includes the objects passed to the function from your list src_list and any object returned to dest_list. This takes time. To illustrate this, you might try timing the following function in a single process and in parallel.

def NothingButAPickle(elem):
    return elem

If you loop over your src_list in a single process this should be extremely fast, because Python only has to make one copy of each object in the list in memory. If instead you call this function in parallel with the multiprocessing package, it has to (1) pickle each object to send it from the main process to a subprocess as a string, (2) unpickle each object in the subprocess to go from the string representation back to an object in memory, (3) pickle the resulting object to send it back to the main process as a string, and then (4) unpickle it to represent it in memory in the main process. Without seeing your data or the actual function, this overhead cost typically exceeds the multiprocessing gains only if the objects you are passing are extremely large and/or the function is actually not that computationally intensive.
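
For example, you could compare the two with something along these lines (a rough sketch; the list of 2000 small integers just mimics the size mentioned in the question):

import time
from multiprocessing import Pool

def NothingButAPickle(elem):
    # no computation at all, so any slowdown is pure pickling/IPC overhead
    return elem

src_list = range(2000)  # roughly the length mentioned in the question

t = time.time()
serial = [NothingButAPickle(elem) for elem in src_list]
print 'single process:', time.time() - t

pool = Pool(4)  # create the workers before timing so only map() is measured
t = time.time()
parallel = pool.map(NothingButAPickle, src_list)
print 'Pool of 4:', time.time() - t
pool.close()
pool.join()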
