Python Multiprocess/Threading loop.

What I am trying to do is find out which parallelization approach (multiprocessing or threading) works best for my data. I tried to parallelize this loop:

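import math
import numpy as np

def __pure_calc(args):
    # unpack the single packed argument: query points, reference points,
    # output list, and the cKDTree (presumably built from point_array)
    j, point_array, empty, tree = args

    for i in j:
        # p = (distance, index) of the nearest neighbour of i in the tree
        p = tree.query(i)
        nearest = point_array[p[1]]

        euc_dist = math.sqrt(np.sum((nearest - i) ** 2))

        # add one row at a time to the output list
        empty.append([i[0], i[1], i[2], euc_dist, nearest[0], nearest[1], nearest[2]])

    return empty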

The plain function on its own takes 6.52 s.

My first approach was Pool.map from multiprocessing:

from multiprocessing import Pool 

def __multiprocess(las_point_array, point_array, empty, tree):

    pool = Pool(os.cpu_count()) 

    for j in las_point_array:
        args=[j, point_array, empty, tree]
        results = pool.map(__pure_calc, args)

    #close the pool and wait for the work to finish 
    pool.close() 
    pool.join() 

    return results

From other answers on how to parallelize a function, it should be as easy as map(function, inputs) and done. But for some reason my multiprocessing version does not accept my inputs and raises an error saying that the scipy.spatial.ckdtree.cKDTree object is not subscriptable.

So I tried apply_async with a ThreadPool:

import os
from multiprocessing.pool import ThreadPool

def __multiprocess(arSegment, wires_point_array, ptList, tree):

    pool = ThreadPool(os.cpu_count())

    # pack everything into one list; it is passed on as a single argument
    args = [arSegment, wires_point_array, ptList, tree]

    result = pool.apply_async(__pure_calc, [args])

    results = result.get()

    return results

It runs without problems. For my test data the calculation takes 6.42 s.

Why does apply_async accept the cKDTree without any problem while pool.map does not? What do I need to change to make it run?

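pool.map(function, iterable) has basically the same signature as the built-in map: each item of the iterable becomes the single argument of one call to __pure_calc. In your first version the iterable is args itself, so __pure_calc is called separately with j, with point_array, with empty and with tree. The call that receives the tree fails at args[0], because a cKDTree cannot be indexed, which is the error you see. apply_async(__pure_calc, [args]) works because [args] is the list of positional arguments, so the whole args list arrives as one argument.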
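The difference between the two calls can be seen in a minimal sketch. The function describe and the placeholder values in args are hypothetical stand-ins (the dict merely plays the role of the tree); ThreadPool is used here only so the snippet runs anywhere without a main guard, and its map and apply_async take the same arguments as those of Pool.

```python
from multiprocessing.pool import ThreadPool

def describe(arg):
    # Report the type of whatever a single worker call actually received.
    return type(arg).__name__

def demo():
    # Hypothetical stand-ins for [j, point_array, empty, tree].
    args = [[1.0, 2.0], "points", [], {"stand-in": "tree"}]
    with ThreadPool(2) as pool:
        # map: one call per element of args, each element is the argument.
        per_item = pool.map(describe, args)
        # apply_async: one call; [args] is the list of positional
        # arguments, so the whole args list arrives as one argument.
        whole = pool.apply_async(describe, [args]).get()
    return per_item, whole
```

Calling demo() shows four separate calls for map (one of them receiving only the tree stand-in) and a single call receiving the complete list for apply_async. Because a single apply_async call submits only one task, it runs on one worker, which would be consistent with the 6.42 s timing being so close to the sequential 6.52 s.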

In this case you could change it to something like this:

def __multiprocess(las_point_array, point_array, empty, tree):

    pool = Pool(os.cpu_count()) 

    args_list = [
        [j, point_array, empty, tree]
        for j in las_point_array
    ]

    results = pool.map(__pure_calc, args_list)

    #close the pool and wait for the work to finish 
    pool.close() 
    pool.join() 

    return results
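The same pattern, one packed argument per task and one result list per task, can be sketched in a self-contained form. Everything named here is an assumption for illustration: nearest is a brute-force stand-in for tree.query, and calc_chunk, split and run_chunks are hypothetical helpers. The sketch also assumes that the query points are first split into a few chunks, so that each task loops over many points, and that each task fills its own local list instead of a shared one.

```python
import math
from multiprocessing.pool import ThreadPool

def nearest(point, reference):
    # Hypothetical stand-in for tree.query: brute-force nearest neighbour.
    # Returns (distance, index), like a single-point cKDTree query.
    dists = [math.dist(point, ref) for ref in reference]
    idx = min(range(len(reference)), key=dists.__getitem__)
    return dists[idx], idx

def calc_chunk(args):
    # One task: args is a single tuple, unpacked here.
    chunk, reference = args
    rows = []  # local to this task, not a shared list
    for point in chunk:
        dist, idx = nearest(point, reference)
        rows.append([*point, dist, *reference[idx]])
    return rows

def split(points, n_chunks):
    # Split points into at most n_chunks contiguous, roughly equal parts.
    size = max(1, math.ceil(len(points) / n_chunks))
    return [points[k:k + size] for k in range(0, len(points), size)]

def run_chunks(points, reference, n_workers=4):
    # One args tuple per chunk; map makes one call per tuple.
    args_list = [(chunk, reference) for chunk in split(points, n_workers)]
    with ThreadPool(n_workers) as pool:
        per_chunk = pool.map(calc_chunk, args_list)
    # Flatten the per-chunk lists into one list of rows, in input order.
    return [row for rows in per_chunk for row in rows]
```

For example, run_chunks([(0, 0, 0), (5, 5, 5), (1, 1, 1)], [(0, 0, 1), (5, 5, 4)], n_workers=2) returns one row per query point in the original order. Swapping ThreadPool for Pool should keep the same structure, but then the arguments have to be picklable and, on platforms that spawn worker processes, the pool has to be created under an if __name__ == "__main__": guard.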


 