简体   繁体   English

python进程/线程映射在Windows中如何工作?为什么线程比进程更快?

[英]How is python process/thread map working in windows?Why thread works faster than process?

I am trying to find a faster way to run numpy/sklearn to do some task on Lists of Data. 我试图找到一种更快的方式来运行numpy / sklearn在数据列表上执行某些任务。 I got some books which suggest me to use Process rather than Thread in Heavy data computing jobs. 我得到一些建议我在重型数据计算工作中使用进程而不是线程的书。 While doing this I find that threads run faster than Process. 在执行此操作时,我发现线程的运行速度比Process快。 Why is that? 这是为什么? Which way should I choose? 我应该选择哪种方式?

# -*- coding: utf-8 -*-
"""
Created on Tue Apr  2 10:20:19 2019

@author: Simon
"""
import time
import numpy as np

from sklearn import linear_model
from concurrent.futures import ProcessPoolExecutor as Pool
from concurrent.futures import ThreadPoolExecutor as Pool

xx, yy = np.meshgrid(np.linspace(0,10,1000), np.linspace(10,100,1000))
zz = 1.0 * xx + 3.5 * yy + np.random.randint(0,100,(1000,1000))

X, Z = np.column_stack((xx.flatten(),yy.flatten())), zz.flatten()


regr = linear_model.LinearRegression()


def regwork(t):
    X=t[0]
    Z=t[1]
    regr.fit(X, Z)
    a, b = regr.coef_, regr.intercept_
    return a

def numpywork(t):
    X=t[0]
    Z=t[1]
    for i in range(1):
        r=np.sum(X,axis=1)+np.log(Z)
    return np.sum(r)

if __name__=="__main__":
    r=regx((X,Z))
    rlist=[[X,Z]]*500



    start=time.clock()
    pool = Pool(max_workers=2)
    results = pool.map(numpywork, rlist)

    for ret in results:
        print(ret)
    print(time.clock()-start)

Run on Win7-4 Real Core-I5-4700 with python 3.6. 使用python 3.6在Win7-4 Real Core-I5-4700上运行。 Here is the output: 这是输出:

Ways|Workerjob|Process Num showed in taskmgr|Cpu loads while working|Time cost 方式| Workerjob | taskmgr中显示的进程数|工作时的Cpu负载|时间成本

2threads|numpy |1 process|100%|9s 2threads | numpy | 1个进程| 100%| 9s

2threads|sklearn|1 process|100%|35s 2线程| sklearn | 1进程| 100%| 35s

2process|numpy |3 process|100%|36s 2进程| numpy | 3进程| 100%| 36s

2process|sklearn|3 process|100%|77s 2进程| sklearn | 3进程| 100%| 77s

Why process cost more time? 为什么处理会花费更多时间? How to find a better way to lower the time cost and make full use of the multi-core OS? 如何找到更好的方法来降低时间成本并充分利用多核OS?

OK. 好。 I have got it. 我知道了 For those modules that could release GIL like numpy, Using Thread backend will save time by reducing the Np object copy cost from main process to sub-process. 对于那些可以释放GIL的模块(例如numpy),使用线程后端将通过减少从主进程到子进程的Np对象复制成本来节省时间。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM