Python threading vs. multiprocessing in Linux

Question

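Based on this question, I assumed that creating a new process should be almost as fast as creating a new thread in Linux. However, a small test showed a very different result. Here is my code: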

from multiprocessing import Process, Pool
from threading import Thread

times = 1000

def inc(a):
    b = 1
    return a + b

def processes():
    for i in xrange(times):
        p = Process(target=inc, args=(i, ))
        p.start()
        p.join()

def threads():
    for i in xrange(times):
        t = Thread(target=inc, args=(i, ))
        t.start()
        t.join()

Test:

>>> timeit processes() 
1 loops, best of 3: 3.8 s per loop

>>> timeit threads() 
10 loops, best of 3: 98.6 ms per loop

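So processes are almost 40 times slower to create! Why does this happen? Is it specific to Python or to these libraries? Or did I just misinterpret the answer above?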

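UPD 1. To make it clearer: I understand that this piece of code does not actually introduce any concurrency. The goal here is to test the time needed to create a process and a thread. To get real concurrency in Python, one can use something like this: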

def pools():
    pool = Pool(10)
    pool.map(inc, xrange(times))

It really does run much faster than the threaded version.

UPD 2. I have added a version using os.fork():

import os

times = 1000

for i in xrange(times):
    child_pid = os.fork()
    if child_pid:
        os.waitpid(child_pid, 0)
    else:
        exit(-1)

The results are:

$ time python test_fork.py 

real    0m3.919s
user    0m0.040s
sys     0m0.208s

$ time python test_multiprocessing.py 

real    0m1.088s
user    0m0.128s
sys     0m0.292s

$ time python test_threadings.py

real    0m0.134s
user    0m0.112s
sys     0m0.048s

Answer 1

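The question you linked to compares the cost of just calling fork(2) vs. pthread_create(3), whereas your code does quite a bit more, e.g. using join() to wait for the processes/threads to terminate.

If, as you say...

The goal here is to test the time needed to create a process and a thread.

...then you shouldn't be waiting for them to complete. You should be using test programs more like these...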

fork.py

import os
import time

def main():
    for i in range(100):
        pid = os.fork()
        if pid:
            #print 'created new process %d' % pid
            continue
        else:
            time.sleep(1)
            return

if __name__ == '__main__':
    main()

thread.py

import thread
import time

def dummy():
    time.sleep(1)

def main():
    for i in range(100):
        tid = thread.start_new_thread(dummy, ())
        #print 'created new thread %d' % tid

if __name__ == '__main__':
    main()

...giving the following results...

$ time python fork.py
real    0m0.035s
user    0m0.008s
sys     0m0.024s

$ time python thread.py
real    0m0.032s
user    0m0.012s
sys     0m0.024s

...so there is not much difference between the creation time of threads and processes.
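For readers on Python 3, where the thread module used above no longer exists under that name, the same creation-only measurement could be sketched as follows. This is a minimal sketch under stated assumptions, not the code that produced the timings above: the function names time_fork_creation and time_thread_creation are made up for illustration, and it assumes a Unix system where os.fork() is available. The timer stops before any waiting or joining, so only creation is measured.

```python
import os
import threading
import time


def time_fork_creation(n):
    """Return seconds spent creating n child processes (hypothetical helper)."""
    pids = []
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: leave immediately, skipping cleanup handlers
        pids.append(pid)
    elapsed = time.perf_counter() - start
    for pid in pids:
        os.waitpid(pid, 0)  # reap the children outside the timed region
    return elapsed


def time_thread_creation(n):
    """Return seconds spent starting n threads (hypothetical helper)."""
    # Thread objects are built up front so that only start() is timed.
    threads = [threading.Thread(target=lambda: None) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    elapsed = time.perf_counter() - start
    for t in threads:
        t.join()  # join outside the timed region
    return elapsed


if __name__ == "__main__":
    print("fork:   %.4f s" % time_fork_creation(100))
    print("thread: %.4f s" % time_thread_creation(100))
```

Absolute numbers will vary with the machine and Python version, so the two figures are best compared with each other rather than with the timings quoted above.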

Answer 2

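Yes, it is true. Starting a new process (called a heavyweight process) is costly.

As an overview...

The OS has to (in the Linux case) fork the first process, set up the accounting for the new process, set up the new stack, do the context switch, copy any memory that gets changed, and tear all of that down when the new process returns.

A thread just allocates a new stack and thread structure, does the context switch, and returns when the work is done.

...that's why we use threads.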
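The "copy any memory that gets changed" step has a visible consequence: a forked child works on its own copy of the parent's data, while a thread shares it. A small sketch of that difference, assuming a Unix system with os.fork(); the function name shared_vs_copied is hypothetical and chosen only for this illustration:

```python
import os
import threading


def shared_vs_copied():
    """Return the counter as seen by the parent after a thread, then a forked child, bumps it."""
    counter = [0]

    def bump():
        counter[0] += 1

    # A thread runs in the same address space, so the parent sees the change.
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    after_thread = counter[0]

    # A forked child gets its own copy-on-write view of memory;
    # its increment never reaches the parent.
    pid = os.fork()
    if pid == 0:
        bump()
        os._exit(0)
    os.waitpid(pid, 0)
    after_fork = counter[0]

    return after_thread, after_fork


if __name__ == "__main__":
    print(shared_vs_copied())  # (1, 1)
```

The second value stays at 1 because the child's write lands in a private copy of the page, which is part of the extra work a process incurs compared with a thread.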

Answer 3

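In my experience there is a significant difference between creating a thread (with pthread_create) and forking a process.

For example, I created a C test similar to your Python test, with thread code like this: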

pthread_t thread; 
pthread_create(&thread, NULL, &test, NULL); 
void *res;
pthread_join(thread, &res);

and forking code like this:

pid_t pid = fork();
if (!pid) {
  test(NULL);
  exit(0);
}         
int res;
waitpid(pid, &res, 0);

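On my system the forking code took about 8 times as long to execute as the thread code.

However, it is worth noting that the Python implementation is even slower: for me it was about 16 times as slow. I suspect that is because, in addition to the regular overhead of creating a new process, there is also more Python overhead associated with the new process.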
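One way to probe that suspicion is to time the same create-and-wait loop twice, once with a bare os.fork() and once through multiprocessing.Process. The following is a rough sketch rather than a rigorous benchmark: the helper names are invented for this example, and it assumes a Unix system and explicitly selects the fork start method, which is what multiprocessing used on Linux at the time of this question.

```python
import multiprocessing
import os
import time


def noop():
    """Trivial workload so that process creation dominates the timing."""
    pass


def time_raw_fork(n):
    """Seconds for n sequential fork-and-wait cycles using os.fork() directly."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            noop()
            os._exit(0)  # child exits without running interpreter cleanup
        os.waitpid(pid, 0)
    return time.perf_counter() - start


def time_multiprocessing(n):
    """Seconds for n sequential start-and-join cycles using multiprocessing.Process."""
    # Assumption: use the fork start method so both loops create children the same way.
    ctx = multiprocessing.get_context("fork")
    start = time.perf_counter()
    for _ in range(n):
        p = ctx.Process(target=noop)
        p.start()
        p.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 200
    print("os.fork:         %.3f s" % time_raw_fork(n))
    print("multiprocessing: %.3f s" % time_multiprocessing(n))
```

Any gap between the two figures would point at bookkeeping done by the multiprocessing module on top of the fork itself, though the size of that gap will depend on the Python version and the machine.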


Answer 1
5 Accepted 2013-07-02 14:11:06

Answer 2
2 2013-07-02 13:04:00

Answer 3
1 2013-07-02 13:30:45
