I am using map() from concurrent.futures.ProcessPoolExecutor to do some simple data parallelization. I want to process 400 files, using map() to call a processing function on each of them.
infiles = glob.glob(os.path.join(input_path, '**/*.xls'), recursive=True) \
        + glob.glob(os.path.join(input_path, '**/*.xlsx'), recursive=True)
outfiles = [os.path.join(os.path.dirname(infile),
                         os.path.basename(infile).split('.')[0] + '.csv')
            for infile in infiles]

with ProcessPoolExecutor(max_workers=None) as executor:
    executor.map(excel2csv, infiles, outfiles)
so excel2csv() should be called for each file, passing its input and output paths. It processes each file independently, writes its results to disk, and returns nothing.
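One detail worth knowing here: executor.map() returns a lazy iterator, and an exception raised inside a worker is only re-raised in the parent when that task's result is consumed. A minimal sketch (with a hypothetical excel2csv that always fails, standing in for the real conversion) shows where the exception would resurface if the results were iterated:

```python
from concurrent.futures import ProcessPoolExecutor

def excel2csv(infile, outfile):
    # Hypothetical stand-in that always fails, to show where the
    # worker's exception resurfaces in the parent process.
    raise RuntimeError("could not convert %s" % infile)

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        results = executor.map(excel2csv, ["a.xls"], ["a.csv"])
        try:
            # Iterating the results forces the worker's exception
            # to be re-raised here in the parent.
            for _ in results:
                pass
        except RuntimeError as e:
            print("worker failed:", e)
```

Since the snippet above never iterates over map()'s results, any exception raised in a worker goes unnoticed.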
After about 100 files, the application throws an exception complaining about a full queue:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/concurrent/futures/process.py", line 295, in _queue_management_worker
shutdown_worker()
File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/concurrent/futures/process.py", line 253, in shutdown_worker
call_queue.put_nowait(None)
File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/multiprocessing/queues.py", line 129, in put_nowait
return self.put(obj, False)
File "/home/mapa17/miniconda3/envs/pygng/lib/python3.5/multiprocessing/queues.py", line 83, in put
raise Full
queue.Full
The most similar problem I found is discussed here. But in my case the data passed to the worker function is minimal (two strings). I checked the default queue size (from _multiprocessing.SemLock.SEM_VALUE_MAX), which is far bigger than 400.
Any ideas? Thank you.
I found the error to be caused by exceptions raised in the worker function called by executor.map().
It seems that exceptions are consumed by executor.map(), and I guess this filled up the queue somehow.
My solution is to handle the issue inside excel2csv(), adding a generic try/except so that no exception escapes the worker and the queue does not fill up.
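A minimal sketch of that workaround (the convert() body here is a hypothetical stand-in for the real xls-to-csv conversion): catch everything inside the worker and return a status instead of letting the exception escape:

```python
import traceback
from concurrent.futures import ProcessPoolExecutor

def convert(infile, outfile):
    # Hypothetical conversion body; assume it may raise on a bad file.
    raise ValueError("corrupt workbook: %s" % infile)

def excel2csv(infile, outfile):
    # Generic try/except: no exception ever escapes the worker, so
    # the executor's internal machinery is never disturbed.
    try:
        convert(infile, outfile)
        return (infile, "ok")
    except Exception:
        return (infile, "failed:\n" + traceback.format_exc())

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        for infile, status in executor.map(excel2csv, ["a.xls"], ["a.csv"]):
            print(infile, status.splitlines()[0])
```

Returning the traceback as a string (rather than re-raising) also makes it easy to log which of the 400 files failed after the run completes.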