
Python multiprocessing slower than single thread

I have been playing around with a multiprocessing problem and noticed that my algorithm is slower when I parallelize it than when it runs on a single thread.

In my code I don't share memory. And I'm pretty sure my algorithm (see code), which is just nested loops, is CPU bound.

However, no matter what I do, the parallel code runs 10-20% slower on all my computers.

I also ran this on a 20-CPU virtual machine and single thread beats multithread every time (it was actually even slower there than on my computer).

from multiprocessing.dummy import Pool as ThreadPool
from multi import chunks
from random import random
import time

## Produce two sets of stuff we can iterate over
S = []
for x in range(100000):
  S.append({'value': x*random()})
H = []
for x in range(255):
  H.append({'value': x*random()})

# the function for each thread
# just nested iteration
def doStuff(HH):
  R = []
  for k in HH['S']:
    for h in HH['H']:
      R.append(k['value'] * h['value'])
  return R

# we will split the work between the worker threads,
# giving each task 5 items of H
# to iterate over against the big list
HChunks = chunks(H, 5)
XChunks = []

# turn them into dictionaries, so I can pass in both
# the S and H lists.
# Note: I do this because I'm not sure whether using the global
# S would spend too much time on cache synchronization or not;
# the idea is that I don't want the threads to share anything.
for x in HChunks:
  XChunks.append({'H': x, 'S': S})

print("Process")
t0 = time.time()
pool = ThreadPool(4)
R = pool.map(doStuff, XChunks)
pool.close()
pool.join()

t1 = time.time()

# the measured time for 4 threads is slower
# than when this code just calls
# doStuff(..) in a non-parallel way.
# Why!?

total = t1-t0
print("Took", total, "secs")

There are many related questions already open, but many are geared toward code being structured incorrectly, e.g. each worker being IO bound and such.

You are using multithreading, not multiprocessing. While many languages allow threads to run in parallel, Python does not. A thread is just a separate state of control, i.e. it holds its own stack, current function, etc. The Python interpreter just switches between executing each stack every now and then.

Basically, all threads are running on a single core. They will only speed up your program when you are not CPU bound.
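
For example, threads do help when each task spends its time waiting rather than computing. A small sketch, using time.sleep to stand in for IO (sleeping releases the GIL):

from multiprocessing.dummy import Pool as ThreadPool
import time

def io_task(_):
  time.sleep(0.5)  # simulated IO wait; releases the GIL

t0 = time.time()
pool = ThreadPool(4)
pool.map(io_task, range(8))
pool.close()
pool.join()
print("4 threads:", time.time() - t0)  # roughly 1s, vs roughly 4s done sequentially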

multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.
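
You can see this for yourself: with multiprocessing.dummy every worker reports the same process id, while the real multiprocessing.Pool spawns separate processes. A quick sketch:

import os
from multiprocessing.dummy import Pool as DummyPool
from multiprocessing import Pool

def worker_pid(_):
  return os.getpid()

if __name__ == '__main__':
  with DummyPool(4) as tp:
    print(set(tp.map(worker_pid, range(8))))  # one pid: threads in a single process
  with Pool(4) as pp:
    print(set(pp.map(worker_pid, range(8))))  # several pids: separate processes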

Multithreading is usually slower than single threading if you are CPU bound. This is because the work and the processing resources stay the same, but you add the overhead of managing the threads, e.g. switching between them.
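
A minimal timing sketch of that effect on pure-Python, CPU-bound work (exact numbers will vary by machine):

import time
from multiprocessing.dummy import Pool as ThreadPool

def burn(n):
  # pure-Python CPU work; holds the GIL while computing
  total = 0
  for i in range(n):
    total += i * i
  return total

work = [200000] * 20

t0 = time.time()
for n in work:
  burn(n)
print("single thread:", time.time() - t0)

t0 = time.time()
pool = ThreadPool(4)
pool.map(burn, work)
pool.close()
pool.join()
print("4 threads:", time.time() - t0)  # typically no faster, often a bit slower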

How to fix this: instead of using from multiprocessing.dummy import Pool as ThreadPool, do from multiprocessing import Pool as ThreadPool.
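
Applied to the question's code, that is a one-line import change; the only other thing multiprocessing needs is the __main__ guard, because each worker process re-imports the module. A sketch:

from multiprocessing import Pool as ThreadPool

if __name__ == '__main__':
  pool = ThreadPool(4)  # 4 worker processes, each with its own interpreter and GIL
  R = pool.map(doStuff, XChunks)
  pool.close()
  pool.join()

Note that each chunk, including its full copy of S, now has to be pickled and sent to a worker process, so for this particular workload some of the gain will be eaten by serialization overhead.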


You might want to read up on the GIL, the Global Interpreter Lock. It is what prevents threads from running in parallel (that, and its implications for single-threaded performance). Python interpreters other than CPython may not have a GIL and may be able to run multithreaded code on several cores.
