
python multiprocessing pool and logging

My application uses multiprocessing.Pool to parallelize its calculations. Now I would like to add logging. The code (unfortunately) needs to run on Windows. I found one related post on Stack Overflow, but it is not working; I suppose the multiprocessing_logging package doesn't support pools.

Here is my code:

import datetime
import logging
import multiprocessing
import os
from multiprocessing import Pool

from multiprocessing_logging import install_mp_handler

logger = logging.getLogger(__name__)

def main(): # main function
    filename = "XXX" + datetime.datetime.now().strftime('%Y-%m-%d-%H.%M.%S') + ".log"

    log_file = os.path.abspath(os.path.join('logs', filename))
    multiprocessing.freeze_support() # support multiprocessing

    logging.basicConfig(filename=log_file,
                        filemode='a',
                        format='%(asctime)s:%(msecs)d (%(processName)s) %(levelname)s %(name)s \t %(message)s',
                        datefmt='%H:%M:%S',
                        level=logging.DEBUG)

    logger.info("Start application")

def run(): # main execution
    logger.info("Generate outputs for every metric")
    num_cores = multiprocessing.cpu_count()
    logger.info("Output generation executes on " + str(num_cores) + " cores")

    pool = Pool(num_cores, initializer=install_mp_handler)
    processed_metrics = pool.map(_generate_outputs, metrics_list)
    pool.close()
    pool.join()
    map(_create_report, processed_metrics)

The implementations of the helper functions _generate_outputs and _create_report are irrelevant to the problem. When I execute the code, the logs generated by modules in the main process are correctly stored, but not those from the child processes.

[EDIT]
I changed my code according to the comments. Now, my code looks like this:

    num_cores = multiprocessing.cpu_count()
    logger.info("Output generation executes on " + str(num_cores) + " cores")
    install_mp_handler()
    pool = Pool(num_cores, initializer=install_mp_handler)
    processed_metrics = pool.map(_generate_outputs, metrics_list)
    pool.close()
    pool.join()
    map(_create_report, processed_metrics)

But logs from the child processes are still not captured. After the program terminates, I see this error:

Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\multiprocessing_logging.py", line 64, in _receive
    record = self.queue.get(timeout=0.2)
  File "C:\Python27\lib\multiprocessing\queues.py", line 131, in get
    if not self._poll(timeout):
IOError: [Errno 109] The pipe has been ended
Exception in thread mp-handler-0:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 801, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Python27\lib\site-packages\multiprocessing_logging.py", line 62, in _receive
    while not (self._is_closed and self.queue.empty()):
  File "C:\Python27\lib\multiprocessing\queues.py", line 146, in empty
    return not self._poll()
IOError: [Errno 109] The pipe has been ended

The key requirement is that the program must run on Windows.

You need to call install_mp_handler() before the Pool() instantiation.

...
install_mp_handler()
pool = Pool(num_cores, initializer=install_mp_handler)
...
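
For context, a minimal sketch of the full ordering this implies: configure the root logger first, wrap its handlers with install_mp_handler(), and only then create the pool. The _generate_outputs and metrics_list names stand in for the question's own; the file name and values are placeholders:

import logging
import multiprocessing
from multiprocessing import Pool

from multiprocessing_logging import install_mp_handler

def _generate_outputs(metric):
    # Worker-side logging; records are forwarded to the parent's handlers.
    logging.getLogger(__name__).info("processing %s", metric)
    return metric

if __name__ == '__main__':
    logging.basicConfig(filename='app.log', level=logging.DEBUG)
    install_mp_handler()  # wrap the root logger's handlers BEFORE creating the pool
    metrics_list = ['a', 'b', 'c']
    pool = Pool(multiprocessing.cpu_count(), initializer=install_mp_handler)
    processed_metrics = pool.map(_generate_outputs, metrics_list)
    pool.close()
    pool.join()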

In the end it all comes down to the log records being transferred over a queue to a centralized log handler. Take a look at multiprocessing_logging.py; it gives a clear picture of the technique.
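
As an illustration of that technique (not of multiprocessing_logging itself), here is a minimal sketch using the Python 3 standard library's QueueHandler/QueueListener; the names _worker_init and _work are made up for the example:

import logging
import logging.handlers
import multiprocessing

def _worker_init(queue):
    # Runs once per pool worker: replace its handlers with one that
    # pushes every record onto the shared queue.
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    root.handlers = [logging.handlers.QueueHandler(queue)]

def _work(item):
    logging.getLogger(__name__).info("processing %s", item)
    return item * 2

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    file_handler = logging.FileHandler('app.log')
    file_handler.setFormatter(logging.Formatter(
        '%(asctime)s (%(processName)s) %(levelname)s %(name)s: %(message)s'))
    # A listener thread in the parent drains the queue and writes to the file.
    listener = logging.handlers.QueueListener(queue, file_handler)
    listener.start()

    pool = multiprocessing.Pool(2, initializer=_worker_init, initargs=(queue,))
    results = pool.map(_work, range(4))
    pool.close()
    pool.join()
    listener.stop()

The listener in the parent process is the single writer, so no two processes ever contend for the log file handle, which is exactly what makes the approach viable on Windows.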
