
Python: multiprocessing.map fails with queue.FULL

I am using the map() function from

 from concurrent.futures import ProcessPoolExecutor

in order to do simple data parallelization.

I want to process 400 files, using map() to call a processing function on them.

  infiles = glob.glob(os.path.join(input_path, '**/*.xls'), recursive=True) + glob.glob(os.path.join(input_path, '**/*.xlsx'), recursive=True) 
  outfiles = [os.path.join(os.path.dirname(infile), os.path.basename(infile).split('.')[0]+'.csv') for infile in infiles]

  with ProcessPoolExecutor(max_workers=None) as executor:
      executor.map(excel2csv, infiles, outfiles)

so excel2csv() should be called for each file, passing its desired input and output path. It processes each file independently, writes its results to disk, and returns nothing.

After about 100 files, the application throws an exception, complaining about a full queue.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/concurrent/futures/process.py", line 295, in _queue_management_worker
    shutdown_worker()
  File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/concurrent/futures/process.py", line 253, in shutdown_worker
    call_queue.put_nowait(None)
  File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/multiprocessing/queues.py", line 129, in put_nowait
    return self.put(obj, False)
  File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/multiprocessing/queues.py", line 83, in put
    raise Full
queue.Full

The most similar problem I found is discussed here.

But in my case the data passed to the worker function is minimal (two strings per call). I also checked the default queue size limit (_multiprocessing.SemLock.SEM_VALUE_MAX), which is far bigger than 400.

Any ideas? Thank you.

I found the error to be caused by an exception raised in the worker function called by executor.map().

It seems that exceptions are silently consumed by executor.map(), and I guess this somehow filled up the queue.
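A worker exception is in fact only re-raised in the parent when the corresponding result is retrieved from the iterator that executor.map() returns; if the results are never iterated, the failure goes unnoticed. A minimal sketch of this behavior, using a hypothetical work() function that fails on one input (the behavior is the same for ThreadPoolExecutor and ProcessPoolExecutor):

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def work(x):
    # hypothetical worker: fails on one specific input to show where
    # the exception actually surfaces
    if x == 2:
        raise ValueError("bad input: %r" % (x,))
    return x * x

def run_batch(items, executor_cls=ProcessPoolExecutor):
    """Iterate over map()'s results so any worker exception is re-raised
    here, in the parent, instead of being silently swallowed."""
    collected, error = [], None
    with executor_cls(max_workers=2) as executor:
        try:
            for result in executor.map(work, items):
                collected.append(result)  # a worker exception surfaces here
        except ValueError as exc:
            error = str(exc)
    return collected, error

if __name__ == "__main__":
    print(run_batch([1, 2, 3]))
```

Iterating (or simply calling list() on) the map() result is therefore enough to surface worker failures early instead of letting them pile up.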

My solution is to handle the issue inside excel2csv() itself: a generic try/except around its body means no exception ever propagates back through the executor, so the queue does not fill up.
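One way to sketch that pattern: a module-level wrapper that converts any exception into a return value. The excel2csv() body below is a stand-in for the author's real conversion function, which is not shown; it raises unconditionally here just to simulate an unparsable file.

```python
import traceback

def excel2csv(infile, outfile):
    # stand-in for the author's real conversion function (not shown);
    # raises to simulate a file that cannot be parsed
    raise ValueError("cannot parse %s" % infile)

def excel2csv_safe(infile, outfile):
    """Module-level wrapper (so it stays picklable for ProcessPoolExecutor):
    every exception is turned into a return value instead of propagating
    back through the executor's internal queues."""
    try:
        excel2csv(infile, outfile)
        return (infile, None)
    except Exception:
        return (infile, traceback.format_exc())
```

Passing excel2csv_safe instead of excel2csv to executor.map() then yields (path, error) pairs, and the failed files can be collected from the pairs whose error string is not None.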


 