python multiprocessing with multiple arguments

I'm trying to multiprocess a function that performs multiple actions on a large file, but I'm getting the well-known pickling error even though I'm using partial.

The function looks something like this:

def process(r,intermediate_file,record_dict,record_id):

    res=0

    record_str = str(record_dict[record_id]).upper()
    start = record_str[0:100]                  # first 100 characters of the record
    end = record_str[len(record_str)-100:]     # last 100 characters

    print record_id
    if r=="1":

        if something:
            res = something...
            intermediate_file.write("...")

        if something:
            res = something
            intermediate_file.write("...")



    if r == "2":
        if something:
            res = something...
            intermediate_file.write("...")

        if something:
            res = something
            intermediate_file.write("...")

    return res

I'm calling it like this, from another function:

def call_func():
    intermediate_file = open("inter.txt","w")
    record_dict = get_record_dict()                 ### get infos about each record as a dict based on the record_id
    results_dict = {}  
    pool = Pool(10)
    for a in ["a","b","c",...]:

        if not results_dict.has_key(a):
            results_dict[a] = {}

        for b in ["1","2","3",...]:

            if not results_dict[a].has_key(b):
                results_dict[a][b] = {}


            results_dict[a][b]['res'] = []

            infile = open(a+b+".txt","r")
            ### parse the file and collect the values in a list called "record_ids"

            ### now call the function based on for each record_id in record_ids
            if b=="1":
                func = partial(process,"1",intermediate_file,record_dict)
                res=pool.map(func, record_ids)
                ## append the results for each pair (a,b) for EACH RECORD in the results_dict 
                results_dict[a][b]['res'].append(res)

            if b=="2":
                func = partial(process,"2",intermediate_file,record_dict)
                res = pool.map(func, record_ids)
                ## append the results for each pair (a,b) for EACH RECORD in the results_dict
                results_dict[a][b]['res'].append(res) 

    ... do something with results_dict...

The idea is that for each record in record_ids, I want to save the results for each pair (a, b).

I'm not sure what is giving me this error:

  File "/code/Python/Python-2.7.9/Lib/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/code/Python/Python-2.7.9/Lib/multiprocessing/pool.py", line 558, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

func is not defined at the top level of the code, so it can't be pickled. You can use pathos.multiprocessing, which is not a standard module, but it will work.
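A minimal sketch of that route, assuming pathos is installed (pip install pathos); the process body and the record_dict below are placeholders for the asker's real function and data. pathos serializes with dill instead of pickle, so the partial gets through to the workers:

from functools import partial
from pathos.multiprocessing import ProcessingPool as Pool

def process(r, record_dict, record_id):
    # placeholder for the real per-record work
    return (record_id, str(record_dict[record_id]).upper()[:10])

record_dict = {"x": "acgt" * 50, "y": "ttga" * 50}   # hypothetical data
func = partial(process, "1", record_dict)

pool = Pool(nodes=4)
print pool.map(func, ["x", "y"])

Note the open intermediate_file handle is left out on purpose: file objects can't be shipped to worker processes, so it is safer to return the results and do the writing in the parent.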

Or, use something different from Pool.map, maybe a Queue of workers? https://docs.python.org/2/library/queue.html

At the end of that page there is an example you can use; it's for threading, but it is very similar to multiprocessing, which also has Queues:

https://docs.python.org/2/library/multiprocessing.html#pipes-and-queues
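A minimal worker/Queue sketch in the same spirit, again with a placeholder process and a hypothetical record_dict; every function is defined at the top level, so nothing unpicklable has to cross the process boundary:

from multiprocessing import Process, Queue

def process(r, record_dict, record_id):
    # placeholder for the real per-record work
    return (record_id, str(record_dict[record_id]).upper()[:10])

def worker(tasks, results, record_dict):
    # pull (r, record_id) pairs until the 'STOP' sentinel arrives
    for r, record_id in iter(tasks.get, 'STOP'):
        results.put(process(r, record_dict, record_id))

if __name__ == '__main__':
    tasks, results = Queue(), Queue()
    record_dict = {"x": "acgt" * 50, "y": "ttga" * 50}   # hypothetical data

    for record_id in record_dict:
        tasks.put(("1", record_id))

    workers = [Process(target=worker, args=(tasks, results, record_dict))
               for _ in range(4)]
    for w in workers:
        w.start()
        tasks.put('STOP')            # one sentinel per worker

    for _ in record_dict:
        print results.get()
    for w in workers:
        w.join()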
