python multiprocessing with multiple arguments
I'm trying to multiprocess a function that performs multiple actions on a large file, but I'm getting the well-known pickling error even though I'm using partial.
The function looks something like this:
def process(r, intermediate_file, record_dict, record_id):
    res = 0
    record_str = str(record_dict[record_id]).upper()
    start = record_str[0:100]
    end = record_str[len(record_str)-100:len(record_str)]
    print sample, record_id
    if r == "1":
        if something:
            res = something...
            intermediate_file.write("...")
        if something:
            res = something
            intermediate_file.write("...")
    if r == "2":
        if something:
            res = something...
            intermediate_file.write("...")
        if something:
            res = something
            intermediate_file.write("...")
    return res
The way I'm calling it, in another function, is the following:
def call_func():
    intermediate_file = open("inter.txt", "w")
    record_dict = get_record_dict()  ### get infos about each record as a dict based on the record_id
    results_dict = {}
    pool = Pool(10)
    for a in ["a", "b", "c", ...]:
        if not results_dict.has_key(a):
            results_dict[a] = {}
        for b in ["1", "2", "3", ...]:
            if not results_dict[a].has_key(b):
                results_dict[a][b] = {}
                results_dict[a][b]['res'] = []
            infile = open(a+b+".txt", "r")
            # ...parse the file and return values in a list called "record_ids"...
            ### now call the function for each record_id in record_ids
            if b == "1":
                func = partial(process, "1", intermediate_file, record_dict)
                res = pool.map(func, record_ids)
                ## append the results for each pair (a,b) for EACH RECORD in the results_dict
                results_dict[a][b]['res'].append(res)
            if b == "2":
                func = partial(process, "2", intermediate_file, record_dict)
                res = pool.map(func, record_ids)
                ## append the results for each pair (a,b) for EACH RECORD in the results_dict
                results_dict[a][b]['res'].append(res)
    # ... do something with results_dict ...
The idea is that for each record inside record_ids, I want to save the results for each pair (a, b).
I'm not sure what is giving me this error:
File "/code/Python/Python-2.7.9/Lib/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
File "/code/Python/Python-2.7.9/Lib/multiprocessing/pool.py", line 558, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
func is not defined at the top level of the code, so it can't be pickled. (Note also that the partial carries the open file handle intermediate_file, which cannot be pickled either, so it can't be sent to the worker processes.) You can use pathos.multiprocessing, which is not a standard module, but it will work.
Or, use something different from Pool.map, maybe a queue of workers? https://docs.python.org/2/library/queue.html
At the end of that page there is an example you can use; it's for threading, but it is very similar to multiprocessing, where there are also Queues: https://docs.python.org/2/library/multiprocessing.html#pipes-and-queues