I can't get logging to a single file working with multiprocessing.Pool.apply_async. I'm trying to adapt this example from the Logging Cookbook, but it only works for multiprocessing.Process. Passing the logging queue into apply_async doesn't seem to have any effect. I would like to use a Pool so that I can easily manage the number of simultaneous worker processes.
The following adapted example with multiprocessing.Process works OK for me, except that I am not getting log messages from the main process, and I don't think it will work well when I have 100 large jobs.
import logging
import logging.handlers
import numpy as np
import time
import multiprocessing
import pandas as pd

log_file = 'PATH_TO_FILE/log_file.log'

def listener_configurer():
    root = logging.getLogger()
    h = logging.FileHandler(log_file)
    f = logging.Formatter('%(asctime)s %(processName)-10s %(name)s %(levelname)-8s %(message)s')
    h.setFormatter(f)
    root.addHandler(h)

# This is the listener process top-level loop: wait for logging events
# (LogRecords) on the queue and handle them, quit when you get a None for a
# LogRecord.
def listener_process(queue, configurer):
    configurer()
    while True:
        try:
            record = queue.get()
            if record is None:  # We send this as a sentinel to tell the listener to quit.
                break
            logger = logging.getLogger(record.name)
            logger.handle(record)  # No level or filter logic applied - just do it!
        except Exception:
            import sys, traceback
            print('Whoops! Problem:', file=sys.stderr)
            traceback.print_exc(file=sys.stderr)

def worker_configurer(queue):
    h = logging.handlers.QueueHandler(queue)  # Just the one handler needed
    root = logging.getLogger()
    root.addHandler(h)
    # send all messages, for demo; no other level or filter logic applied.
    root.setLevel(logging.DEBUG)

# This is the worker process top-level loop, which just logs ten events with
# random intervening delays before terminating.
# The print messages are just so you know it's doing something!
def worker_function(sleep_time, name, queue, configurer):
    configurer(queue)
    start_message = 'Worker {} started and will now sleep for {}s'.format(name, sleep_time)
    logging.info(start_message)
    time.sleep(sleep_time)
    success_message = 'Worker {} has finished sleeping for {}s'.format(name, sleep_time)
    logging.info(success_message)

def main_with_process():
    start_time = time.time()
    single_thread_time = 0.
    queue = multiprocessing.Queue(-1)
    listener = multiprocessing.Process(target=listener_process,
                                       args=(queue, listener_configurer))
    listener.start()
    workers = []
    for i in range(10):
        name = str(i)
        sleep_time = np.random.randint(10) / 2
        single_thread_time += sleep_time
        worker = multiprocessing.Process(target=worker_function,
                                         args=(sleep_time, name, queue, worker_configurer))
        workers.append(worker)
        worker.start()
    for w in workers:
        w.join()
    queue.put_nowait(None)
    listener.join()
    end_time = time.time()
    final_message = "Script execution time was {}s, but single-thread time was {}s".format(
        (end_time - start_time),
        single_thread_time
    )
    print(final_message)

if __name__ == "__main__":
    main_with_process()
But I can't get the following adaptation to work:
def main_with_pool():
    start_time = time.time()
    queue = multiprocessing.Queue(-1)
    listener = multiprocessing.Process(target=listener_process,
                                       args=(queue, listener_configurer))
    listener.start()
    pool = multiprocessing.Pool(processes=3)
    job_list = [np.random.randint(10) / 2 for i in range(10)]
    single_thread_time = np.sum(job_list)
    for i, sleep_time in enumerate(job_list):
        name = str(i)
        pool.apply_async(worker_function,
                         args=(sleep_time, name, queue, worker_configurer))
    queue.put_nowait(None)
    listener.join()
    end_time = time.time()
    print("Script execution time was {}s, but single-thread time was {}s".format(
        (end_time - start_time),
        single_thread_time
    ))

if __name__ == "__main__":
    main_with_pool()
I've tried many slight variations, using multiprocessing.Manager, multiprocessing.Queue, multiprocessing.get_logger, apply_async.get(), but haven't gotten any to work.
I would think there would be an off-the-shelf solution for this. Should I try Celery instead?
thanks
Consider using two queues. The first queue is where you put the data for the workers. Each worker after job completion pushes the results to the second queue. Now consume this second queue to write the log to the file.
There are actually two separate problems here, which are intertwined:
1. You cannot pass a multiprocessing.Queue() object as an argument to a Pool-based function (you can pass it to the worker you start directly, but not any "further in" as it were).
2. You must wait for all the asynchronous workers to complete before you send the None through to your listener process.
To fix the first one, replace:
queue = multiprocessing.Queue(-1)
with:
queue = multiprocessing.Manager().Queue(-1)
as a manager-managed Queue() instance can be passed through.
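A small sketch of why the plain queue fails: Pool has to pickle every task argument, and a multiprocessing.Queue() refuses to be pickled outside of process start-up. The helper name try_pickle is made up for this illustration. As far as I can tell, apply_async stores that error on the returned AsyncResult, so it only surfaces when get() is called, which would explain why the original call appeared to do nothing.

```python
import multiprocessing
import pickle


def try_pickle(obj):
    """Return 'ok' if obj can be pickled, else a short description of the error."""
    try:
        pickle.dumps(obj)
    except RuntimeError as exc:
        return "RuntimeError: {}".format(exc)
    return "ok"


# A plain multiprocessing.Queue can only be handed to a child at start-up
# (as a Process argument); pickling it anywhere else raises RuntimeError.
print(try_pickle(multiprocessing.Queue()))

if __name__ == "__main__":
    # A manager queue is a proxy object, and proxies are picklable,
    # so this one can travel inside a Pool task.
    with multiprocessing.Manager() as manager:
        print(try_pickle(manager.Queue()))
```

The first line printed starts with "RuntimeError:", while the manager proxy pickles without complaint.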
To fix the second, either collect the result of each asynchronous call, or close the pool and wait for it, e.g.:
pool.close()
pool.join()
queue.put_nowait(None)
or the more complex:
getters = []
for i, sleep_time in enumerate(job_list):
    name = str(i)
    getters.append(
        pool.apply_async(worker_function,
                         args=(sleep_time, name, queue, worker_configurer))
    )
while len(getters):
    getters.pop().get()
# optionally, close and join pool here (generally a good idea anyway)
queue.put_nowait(None)
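Putting both fixes together, a minimal end-to-end sketch could look like the following. It is not the asker's exact script: LOG_FILE and run_pool are names invented here, the listener writes through the file handler directly instead of the root logger, and the configurer looks specifically for an existing QueueHandler so that a reused pool worker does not add a second one.

```python
import logging
import logging.handlers
import multiprocessing
import os
import tempfile
import time

# Hypothetical log location, chosen only for this sketch.
LOG_FILE = os.path.join(tempfile.gettempdir(), "pool_logging_demo.log")


def listener_process(queue, log_file):
    """Write every LogRecord from the queue to log_file until None arrives."""
    handler = logging.FileHandler(log_file, mode="w")
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(processName)-10s %(levelname)-8s %(message)s"))
    while True:
        record = queue.get()
        if record is None:  # sentinel: every job has finished
            break
        handler.handle(record)
    handler.close()


def worker_configurer(queue):
    """Attach a single QueueHandler to the root logger of this process."""
    root = logging.getLogger()
    if not any(isinstance(h, logging.handlers.QueueHandler)
               for h in root.handlers):
        root.addHandler(logging.handlers.QueueHandler(queue))
    root.setLevel(logging.DEBUG)


def worker_function(sleep_time, name, queue):
    worker_configurer(queue)
    logging.info("Worker %s started and will sleep for %ss", name, sleep_time)
    time.sleep(sleep_time)
    logging.info("Worker %s has finished sleeping", name)
    return name


def run_pool(job_list, log_file=LOG_FILE, processes=3):
    with multiprocessing.Manager() as manager:
        queue = manager.Queue()  # a proxy, so it can be sent with Pool tasks
        listener = multiprocessing.Process(
            target=listener_process, args=(queue, log_file))
        listener.start()
        with multiprocessing.Pool(processes=processes) as pool:
            pending = [
                pool.apply_async(worker_function, (sleep_time, str(i), queue))
                for i, sleep_time in enumerate(job_list)
            ]
            # get() blocks until the job ends and re-raises any worker error.
            finished = [result.get() for result in pending]
        queue.put(None)  # only now is it safe to stop the listener
        listener.join()
    return finished


if __name__ == "__main__":
    print(run_pool([0.2, 0.1, 0.3, 0.1]))
```

Calling get() on every result has the side benefit that an exception inside a worker is raised in the parent instead of being silently dropped.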
(You should also consider replacing your put_nowait with the blocking put, and not using unbounded queues.)
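To illustrate the difference, here is a small sketch with a bounded queue. The helper fill_without_waiting and the maxsize of 2 are arbitrary choices for the demonstration. Once the queue is full, put_nowait raises queue.Full immediately, whereas put waits for a consumer to make room, which matters for the None sentinel: if it is dropped, the listener never stops.

```python
import multiprocessing
import queue as stdlib_queue  # multiprocessing reuses the stdlib Full/Empty exceptions


def fill_without_waiting(q, items):
    """Try to enqueue items with put_nowait; return how many were accepted."""
    accepted = 0
    for item in items:
        try:
            q.put_nowait(item)
        except stdlib_queue.Full:
            break  # the queue is full; a blocking put() would wait here instead
        accepted += 1
    return accepted


if __name__ == "__main__":
    bounded = multiprocessing.Queue(maxsize=2)
    print(fill_without_waiting(bounded, ["a", "b", "c"]))
    # Drain the queue so that the blocking put below has room to succeed.
    print(bounded.get(timeout=5), bounded.get(timeout=5))
    bounded.put(None, timeout=5)  # waits up to 5 s for a free slot instead of failing at once
    print(bounded.get(timeout=5))
```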
[ADDENDUM] Regarding maxtasksperchild=1: you don't really need it. The repeated messages appeared because you were adding another QueueHandler to the root logger of a child process every time a task ran in it. The following code checks whether any handlers exist before adding another:
def worker_configurer(queue):
    root = logging.getLogger()
    # print(f'{root.handlers=}')
    if len(root.handlers) == 0:
        h = logging.handlers.QueueHandler(queue)
        root.addHandler(h)
    root.setLevel(logging.DEBUG)
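The effect can be reproduced in a single process, without a pool. The sketch below uses throwaway named loggers and an in-process queue.Queue rather than the root logger, purely to keep the demonstration isolated; configure_always, configure_once and count_records are illustrative names. Configuring three times stands in for a pool worker that runs three tasks.

```python
import logging
import logging.handlers
import queue


def configure_always(logger, q):
    """Original behaviour: a new QueueHandler on every call."""
    logger.addHandler(logging.handlers.QueueHandler(q))
    logger.setLevel(logging.DEBUG)


def configure_once(logger, q):
    """Guarded behaviour: add a handler only if none is attached yet."""
    if len(logger.handlers) == 0:
        logger.addHandler(logging.handlers.QueueHandler(q))
    logger.setLevel(logging.DEBUG)


def count_records(configure, calls):
    """Configure a fresh logger `calls` times, log once, count queued records."""
    q = queue.Queue()
    logger = logging.getLogger("demo.{}.{}".format(configure.__name__, calls))
    logger.propagate = False   # keep the demo away from the root logger
    logger.handlers.clear()    # start from a clean logger on every run
    for _ in range(calls):
        configure(logger, q)
    logger.info("one message")
    return q.qsize()


print(count_records(configure_always, 3))  # → 3, one copy per handler
print(count_records(configure_once, 3))    # → 1
```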