
How to pass parameters other than data through the pool.imap() function for multiprocessing in Python?

I am working on hyperspectral images. To reduce the noise in the image I am using a wavelet transform from the pywt package. When I do this normally (serial processing) it works smoothly. But when I try to implement parallel processing across multiple cores for the wavelet transform of the image, I have to pass certain parameters, namely:

  1. wavelet family
  2. thresholding value
  3. thresholding technique (hard/soft)

But I am not able to pass these parameters through the pool object: with pool.imap() I can pass only the data as an argument. When I use pool.apply_async() instead, it takes much more time and the order of the output is not the same. Here is the code for reference:

import matplotlib.pyplot as plt
import numpy as np
import multiprocessing as mp
import os
import time
from math import log10, sqrt
import pywt
import tifffile

def spec_trans(d,wav_fam,threshold_val,thresh_type):
  # decomposition, thresholding and reconstruction are helper functions
  # not shown in the post (see the sketch below the snippet)
  data=np.array(d,dtype=np.float64)
  data_dec=decomposition(data,wav_fam)                      # wavelet decomposition
  data_t=thresholding(data_dec,threshold_val,thresh_type)   # threshold the coefficients
  data_rec=reconstruction(data_t,wav_fam)                   # inverse transform
  return data_rec

if __name__ == '__main__':
    
    #input
    X=tifffile.imread('data/Classification/university.tif')
    #take parameters
    threshold_val=float(input("Enter the value for image thresholding: "))
    print("The available wavelet functions:",pywt.wavelist())
    wav_fam=input("Choose a wavelet function for transformation: ")
    threshold_type=['hard','soft']
    print("The available thresholding techniques:",threshold_type)
    thresh_type=input("Choose a type for thresholding technique: ")

    start=time.time()
    p = mp.Pool(4)
    jobs=[]
    # NOTE: 'xmp' is not defined in the snippet -- presumably an iterable over the image bands
    for dataBand in xmp:
      jobs.append(p.apply_async(spec_trans,args=(dataBand,wav_fam,threshold_val,thresh_type)))
    transformedX=[]
    for jobBit in jobs:
      transformedX.append(jobBit.get())
    end=time.time()
    p.close()
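
(The helpers decomposition, thresholding and reconstruction are not included in the post. As a point of reference, here is a minimal sketch of what they might look like with pywt for a single 2-D band; the decomposition level and the choice to threshold only the detail coefficients are assumptions, not the asker's actual code.)

def decomposition(data, wav_fam, level=2):
    # multilevel 2-D DWT: [cA_n, (cH_n, cV_n, cD_n), ..., (cH_1, cV_1, cD_1)]
    return pywt.wavedec2(data, wav_fam, level=level)

def thresholding(coeffs, threshold_val, thresh_type):
    # threshold only the detail coefficients; keep the approximation untouched
    out = [coeffs[0]]
    for details in coeffs[1:]:
        out.append(tuple(pywt.threshold(c, threshold_val, mode=thresh_type)
                         for c in details))
    return out

def reconstruction(coeffs, wav_fam):
    # inverse multilevel 2-D DWT
    return pywt.waverec2(coeffs, wav_fam)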


Also, when I use the 'soft' technique for thresholding, I get the following warning:

C:\Users\Sawon\anaconda3\lib\site-packages\pywt\_thresholding.py:25: RuntimeWarning: invalid value encountered in multiply
  thresholded = data * thresholded

The results of serial and parallel execution should be more or less the same, but here I am getting slightly different results. Any suggestion to modify the code would be helpful. Thanks.
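
(A note on that warning: pywt's soft thresholding multiplies the data array by a scaling factor, and "invalid value encountered in multiply" usually points to NaN or Inf values in the band or its coefficients. A minimal guard, offered as a suggestion rather than a confirmed fix, would be to sanitize each band before transforming:)

data = np.array(d, dtype=np.float64)
if not np.isfinite(data).all():
    # replace NaN/Inf with finite values; this replacement policy is an assumption
    data = np.nan_to_num(data)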

[This isn't a direct answer to the question, but a clearer follow-up query than trying the below via the small comment box]

As a quick check, pass an iteration counter into spec_trans and return it (along with your result), push it into a separate list, transformedXseq or similar, and then compare it to your input sequence, i.e.:

def spec_trans(d,wav_fam,threshold_val,thresh_type, iCount):

    data=np.array(d,dtype=np.float64)
    data_dec=decomposition(data,wav_fam)
    data_t=thresholding(data_dec,threshold_val,thresh_type)
    data_rec=reconstruction(data_t,wav_fam)

    return data_rec, iCount

and then within main:

jobs=[]
iJobs = 0
for dataBand in xmp:
   jobs.append(p.apply_async(spec_trans,args=(dataBand,wav_fam,threshold_val,thresh_type, iJobs)))
   iJobs = iJobs + 1 

transformedX=[]
transformedXseq=[]
for jobBit in jobs:
    res = jobBit.get()
    transformedX.append(res[0])
    transformedXseq.append(res[1])

... and check the list transformedXseq to see whether you've gathered the jobs back in the sequence you submitted them. It should match: calling .get() on the jobs in the order you appended them returns results in submission order, regardless of the order in which they completed.

Assuming wav_fam, threshold_val and thresh_type do not vary from call to call, first arrange for these arguments to be the first arguments to worker function spec_trans:

def spec_trans(wav_fam, threshold_val, thresh_type, d):

Now I don't see where in your pool-creation block you have defined xmp, but presumably this is an iterable. You need to modify this code as follows:

from functools import partial

def compute_chunksize(pool_size, iterable_size):
    # same heuristic Pool.map() uses internally: aim for ~4 chunks per worker
    chunksize, remainder = divmod(iterable_size, 4 * pool_size)
    if remainder:
        chunksize += 1
    return chunksize

if __name__ == '__main__':

    X=tifffile.imread('data/Classification/university.tif')
    #take parameters
    threshold_val=float(input("Enter the value for image thresholding: "))
    print("The available wavelet functions:",pywt.wavelist())
    wav_fam=input("Choose a wavelet function for transformation: ")
    threshold_type=['hard','soft']
    print("The available thresholding techniques:",threshold_type)
    thresh_type=input("Choose a type for thresholding technique: ")

    start=time.time()
    p = mp.Pool(4)
    # first 3 arguments to spec_trans will be wav_fam, threshold_val and thresh_type 
    worker = partial(spec_trans, wav_fam, threshold_val, thresh_type)
    suitable_chunksize = compute_chunksize(4, len(xmp))
    transformedX = list(p.imap(worker, xmp, chunksize=suitable_chunksize))
    end=time.time()

To obtain improved performance over apply_async, you must use a "suitable chunksize" value with imap. The function compute_chunksize can compute such a value from the size of your pool (4) and the size of the iterable being passed to imap (len(xmp)). If xmp is small enough that the computed chunksize is 1, I don't really see how imap would be significantly more performant than apply_async.
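
For example (illustrative numbers only), with a pool of 4 workers and 103 bands:

>>> compute_chunksize(4, 103)   # divmod(103, 4 * 4) == (6, 7); nonzero remainder, so 6 + 1
7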

Of course, you might as well just use:

    transformedX = p.map(worker, xmp)

And let the pool compute its own suitable chunksize. imap has an advantage over map when the iterable is very large and not already a list: for map to compute a suitable chunksize, it would first have to convert the iterable to a list just to get its length, which could be memory inefficient. But if you know the length (or approximate length) of the iterable, then with imap you can explicitly set a chunksize without converting the iterable to a list. The other advantage of imap (and imap_unordered) over map is that you can process the results of individual tasks as they become available, whereas with map you only get results when all the submitted tasks are complete.
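
(For completeness, a minimal sketch of consuming results as they finish with imap_unordered while still restoring the original band order; indexed_trans and the tuple packing are illustrative assumptions, not part of the original answer. The function must be defined at module level so it can be pickled.)

def indexed_trans(task):
    # unpack (index, band, fixed parameters); runs in a worker process
    i, band, wav_fam, threshold_val, thresh_type = task
    return i, spec_trans(wav_fam, threshold_val, thresh_type, band)

# inside the __main__ block:
tasks = [(i, band, wav_fam, threshold_val, thresh_type)
         for i, band in enumerate(xmp)]
transformedX = [None] * len(tasks)
for i, result in p.imap_unordered(indexed_trans, tasks, chunksize=suitable_chunksize):
    transformedX[i] = result  # completion order varies; the index restores band order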

Update

If you want to catch possible exceptions thrown by individual tasks submitted to your worker function, then stick with using imap, and use the following code to iterate the results returned by imap:

    #transformedX = list(p.imap(worker, xmp, chunksize=suitable_chunksize))
    transformedX = []
    results = p.imap(worker, xmp, chunksize=suitable_chunksize)
    import traceback
    while True:
        try:
            result = next(results)
        except StopIteration: # no more results
            break
        except Exception as e:
            print('Exception occurred:', e)
            traceback.print_exc() # print stacktrace
        else:
            transformedX.append(result)
