
Python: multiprocessing.map fails with queue.Full

I am using the map() function from

 from concurrent.futures import ProcessPoolExecutor

to do some simple data parallelization.

I want to process 400 files, using map() to call a processing function on them.

  import glob
  import os

  # Collect all .xls/.xlsx files below input_path and derive a .csv path next to each one.
  infiles = glob.glob(os.path.join(input_path, '**/*.xls'), recursive=True) + glob.glob(os.path.join(input_path, '**/*.xlsx'), recursive=True)
  outfiles = [os.path.join(os.path.dirname(infile), os.path.basename(infile).split('.')[0] + '.csv') for infile in infiles]

  with ProcessPoolExecutor(max_workers=None) as executor:
      executor.map(excel2csv, infiles, outfiles)

So excel2csv() should be called once per file, with its input and output path. It processes each file independently, writes the results to disk, and returns nothing.
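For illustration, a minimal pandas-based stand-in with the same signature (a sketch only; my real excel2csv() is not shown here, and pandas is just an assumption):

  import pandas as pd

  def excel2csv(infile, outfile):
      # Read the workbook and write it back out as CSV (simplified stand-in).
      pd.read_excel(infile).to_csv(outfile, index=False)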

After about 100 files, the application raises an exception, complaining about a full queue.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/concurrent/futures/process.py", line 295, in _queue_management_worker
    shutdown_worker()
  File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/concurrent/futures/process.py", line 253, in shutdown_worker
    call_queue.put_nowait(None)
  File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/multiprocessing/queues.py", line 129, in put_nowait
    return self.put(obj, False)
  File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/multiprocessing/queues.py", line 83, in put
    raise Full
queue.Full

The most similar problem I found is discussed here.

But in my case the data passed to the worker function is minimal (just two strings), and the default queue size (taken from _multiprocessing.SemLock.SEM_VALUE_MAX) is far bigger than 400.
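For reference, this is how I checked that limit (the exact value is platform-dependent):

  import _multiprocessing

  # Default maxsize used by multiprocessing queues; e.g. 2147483647 on Linux.
  print(_multiprocessing.SemLock.SEM_VALUE_MAX)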

Any ideas? Thank you

I found the error to be caused by exceptions raised inside the worker function that executor.map() calls.

It seems that exceptions are swallowed (?) by executor.map(), and I guess this somehow caused the queue to fill up.
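One way to make such worker exceptions visible is to actually consume the iterator that executor.map() returns; an exception raised in a worker is re-raised in the parent process when the corresponding result is retrieved. A small debugging sketch (not part of my original code):

  with ProcessPoolExecutor(max_workers=None) as executor:
      for _ in executor.map(excel2csv, infiles, outfiles):
          pass  # retrieving each result re-raises any exception from that worker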

My solution was to handle the issue inside excel2csv() itself, adding a generic try/except so that no exception escapes the worker and the queue no longer fills up.
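Roughly like this (a sketch; do_conversion() is just a placeholder for the real xls/xlsx to csv logic):

  import traceback

  def excel2csv(infile, outfile):
      try:
          do_conversion(infile, outfile)  # placeholder for the real conversion code
      except Exception:
          # Log and swallow the error so it never propagates back into the pool.
          print('Failed to convert {}'.format(infile))
          traceback.print_exc()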
