
How does Python's process/thread pool map work on Windows? Why do threads run faster than processes?

I am trying to find a faster way to run numpy/sklearn tasks over lists of data. Some books I have read suggest using processes rather than threads for heavy computation. When I tried it, though, threads ran faster than processes. Why is that? Which should I choose?

# -*- coding: utf-8 -*-
"""
Created on Tue Apr  2 10:20:19 2019

@author: Simon
"""
import time
import numpy as np

from sklearn import linear_model
# Choose one backend; keep the other import commented out.
# from concurrent.futures import ProcessPoolExecutor as Pool
from concurrent.futures import ThreadPoolExecutor as Pool

xx, yy = np.meshgrid(np.linspace(0,10,1000), np.linspace(10,100,1000))
zz = 1.0 * xx + 3.5 * yy + np.random.randint(0,100,(1000,1000))

X, Z = np.column_stack((xx.flatten(),yy.flatten())), zz.flatten()


regr = linear_model.LinearRegression()


def regwork(t):
    X=t[0]
    Z=t[1]
    regr.fit(X, Z)
    a, b = regr.coef_, regr.intercept_
    return a

def numpywork(t):
    X=t[0]
    Z=t[1]
    for i in range(1):
        r=np.sum(X,axis=1)+np.log(Z)
    return np.sum(r)

if __name__=="__main__":
    r=regwork((X,Z))
    rlist=[[X,Z]]*500



    start=time.perf_counter()  # time.clock() was removed in Python 3.8
    pool = Pool(max_workers=2)
    results = pool.map(numpywork, rlist)  # swap in regwork for the sklearn test

    for ret in results:
        print(ret)
    print(time.perf_counter()-start)

Run on Windows 7 with 4 physical cores (Core i5-4700) and Python 3.6. Here is the output:

Method      | Worker job | Processes shown in Task Manager | CPU load while working | Time taken
2 threads   | numpy      | 1                               | 100%                   | 9s
2 threads   | sklearn    | 1                               | 100%                   | 35s
2 processes | numpy      | 3                               | 100%                   | 36s
2 processes | sklearn    | 3                               | 100%                   | 77s

Why do processes take more time? How can I lower the run time and make full use of a multi-core machine?

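OK, I figured it out. With a process pool, each task's arguments are pickled in the main process and copied to a sub-process, and here every task carries large NumPy arrays. For modules that release the GIL during heavy computation, such as numpy, a thread pool saves time because threads share the main process's memory and skip that copy.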
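To get a rough feel for that copy cost, here is a minimal sketch that pickles and unpickles one task from the question and compares that with the time numpywork itself takes. The helper name pickle_roundtrip is hypothetical and introduced only for this illustration. It is an approximation: a real process pool also has to push the pickled bytes through a pipe to the worker, so the true overhead is likely higher than what this measures. The raw array data alone is 24 MB per task (16 MB for X plus 8 MB for Z, both float64), and the question submits 500 such tasks.

```python
import pickle
import time

import numpy as np

# Same payload as in the question: X is (1_000_000, 2), Z is (1_000_000,).
xx, yy = np.meshgrid(np.linspace(0, 10, 1000), np.linspace(10, 100, 1000))
zz = 1.0 * xx + 3.5 * yy + np.random.randint(0, 100, (1000, 1000))
X, Z = np.column_stack((xx.flatten(), yy.flatten())), zz.flatten()


def numpywork(t):
    # Same computation as the question's numpywork, without the one-pass loop.
    X, Z = t
    return np.sum(np.sum(X, axis=1) + np.log(Z))


def pickle_roundtrip(task):
    """Hypothetical helper: serialize and deserialize one task, which a
    process pool must do before a worker can touch the data.
    Returns (copy, size_in_bytes, seconds)."""
    start = time.perf_counter()
    blob = pickle.dumps(task, protocol=pickle.HIGHEST_PROTOCOL)
    copy = pickle.loads(blob)
    return copy, len(blob), time.perf_counter() - start


task = [X, Z]
copy, nbytes, copy_seconds = pickle_roundtrip(task)

start = time.perf_counter()
result = numpywork(task)
work_seconds = time.perf_counter() - start

print(f"payload per task: {nbytes / 1e6:.1f} MB")
print(f"pickle round trip: {copy_seconds:.4f} s, numpywork: {work_seconds:.4f} s")
```

If the round trip takes a similar amount of time to the work itself, or longer, shipping the data to a sub-process cannot pay off for this workload, which would be consistent with the timings in the table. A thread pool passes only a reference to the same arrays.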

