Multiprocessing with return values
I have a problem using multiprocessing to speed up some processing of files stored on S3 that need to be checked. Since I'm new to multiprocessing, I'm not sure what exactly is wrong with my code; it runs without issues when I just use a for loop.
```
import multiprocessing as mp
from tqdm import tqdm

def read_json(file):
    file_key = file["Key"]
    file_key_split = file_key.split("/")
    document = get_json_details(file_key)
    type = file_key_split[2]
    return document, type

document_list = []
document_type_list = []

mgr = mp.Manager()
nodes = mgr.list()
pool_size = mp.cpu_count()
pool = mp.Pool(processes=pool_size)
# mp.freeze_support()

for file in tqdm(get_all_s3_objects(s3, Bucket=docbucket, Prefix=prefix)):
    document_list, document_type_list = zip(*pool.map(read_json, file))

pool.close()
pool.join()
```
The error I get is the following:
"""
Traceback (most recent call last):
File "C:\Users\tobia\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\tobia\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "c:\GIT\BMWJPSI-BI\03_Lambda_Functions\RegoOCRCheck.py", line 118, in read_json
file_key = file["Key"]
TypeError: string indices must be integers
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:/GIT/BMWJPSI-BI/03_Lambda_Functions/RegoOCRCheck.py", line 151, in <module>
document_list, document_type_list = zip(pool.map(read_json, file))
File "C:\Users\tobia\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\tobia\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 771, in get
raise self._value
TypeError: string indices must be integers```
Thanks for your help.
Sorry for the delayed response. I think the issue is that you're passing a single dictionary object into `pool.map`, which then iterates over the dictionary's keys instead of passing the dictionary object itself to `read_json`. Instead of iterating over each individual file and calling `pool.map` inside the loop, try passing the entire `get_all_s3_objects(s3, Bucket=docbucket, Prefix=prefix)` iterable into `pool.map`; it will be iterated in parallel and return a list of tuples, where each tuple is `(document, type)`:

```
document_list, document_type_list = zip(*pool.map(read_json, get_all_s3_objects(s3, Bucket=docbucket, Prefix=prefix)))
```
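As a minimal sketch of why the original call failed: iterating over a dict yields its keys, so `map` hands `read_json` strings rather than whole S3 object dicts. The `s3_object` below is a made-up stand-in for one entry of the kind `get_all_s3_objects` would return:

```python
def get_key(obj):
    # Same access pattern as the first line of read_json
    return obj["Key"]

# Hypothetical S3 listing entry, shaped like one item from get_all_s3_objects
s3_object = {"Key": "prefix/folder/doctype/file.json", "Size": 123}

# Iterating over the dict yields its keys ("Key", "Size"), so get_key
# receives a string and obj["Key"] raises the TypeError from the traceback:
try:
    list(map(get_key, s3_object))
except TypeError as e:
    print(e)

# Mapping over a list of dicts passes each whole dict instead:
print(list(map(get_key, [s3_object])))  # ['prefix/folder/doctype/file.json']
```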
Let me know if you still run into any issues.
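On the return values themselves: the `zip(*...)` in the corrected line is what splits the list of `(document, type)` tuples returned by `pool.map` into two parallel sequences. A small sketch with made-up results:

```python
# Stand-in for what pool.map(read_json, ...) would return:
# one (document, type) tuple per S3 object.
results = [("doc-a", "invoice"), ("doc-b", "receipt"), ("doc-c", "invoice")]

# zip(*results) transposes the list of tuples into two tuples.
document_list, document_type_list = zip(*results)
print(document_list)       # ('doc-a', 'doc-b', 'doc-c')
print(document_type_list)  # ('invoice', 'receipt', 'invoice')
```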