
Does Python logging support multiprocessing?

I have been told that logging cannot be used with multiprocessing, and that you have to handle the concurrency control yourself in case multiprocessing garbles the log.

But I did some tests, and it seems there is no problem using logging with multiprocessing:

import time
import logging
from multiprocessing import Process, current_process


# setup log
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                    datefmt='%a, %d %b %Y %H:%M:%S',
                    filename='/tmp/test.log',
                    filemode='w')


def func(the_time, logger):
    # busy-wait until the agreed moment so every process logs at the same time
    proc = current_process()
    while True:
        if time.time() >= the_time:
            logger.info('proc name %s id %s' % (proc.name, proc.pid))
            return



if __name__ == '__main__':

    the_time = time.time() + 5

    for x in range(1, 10):
        proc = Process(target=func, name=str(x), args=(the_time, logger))
        proc.start()

As you can see from the code, I deliberately let the subprocesses write their log entries at the same moment (5 s after start) to increase the chance of a conflict. But there was no conflict at all.

So my question is: can we use logging with multiprocessing? Why do so many posts say we cannot?

As Matino correctly explained: logging in a multiprocessing setup is not safe, as multiple processes (which know nothing about each other) write into the same file and can potentially interfere with one another.

What happens is that every process holds an open file handle and does an "append write" into that file. The question is under what circumstances the append write is "atomic" (that is, cannot be interrupted by, e.g., another process writing to the same file and intermingling its output). This problem applies to every programming language, since in the end they all make a syscall to the kernel. This answer explains under which circumstances a shared log file is OK.

It comes down to your pipe buffer size. On Linux it is defined in /usr/include/linux/limits.h and is 4096 bytes. For other OSes you can find a good list elsewhere.
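If you want to check that limit from Python itself, here is a minimal POSIX-only sketch (PC_PIPE_BUF is the standard pathconf key; /tmp is just an example path):

import os

# Ask the OS for the PIPE_BUF limit of the filesystem holding the log file.
# Writes of at most this many bytes to a pipe are guaranteed to be atomic.
print(os.pathconf('/tmp', 'PC_PIPE_BUF'))  # typically 4096 on Linux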

That means: if your log line is less than 4096 bytes (on Linux), the append is safe, provided the disk is directly attached (i.e. no network in between). For more details please check the first link in my answer. To test this you can do logger.info('proc name %s id %s %s' % (proc.name, proc.pid, str(proc.name)*5000)) with different lengths. With 5000, for instance, I already got mixed-up log lines in /tmp/test.log.
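As a self-contained reproduction, here is a minimal sketch (assuming the fork start method, i.e. the Linux default, so children inherit the logging configuration; the file name and the 5000-character padding are arbitrary illustrative choices):

import logging
import time
from multiprocessing import Process, current_process

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(message)s',
                    filename='/tmp/mix_test.log',
                    filemode='w')
logger = logging.getLogger(__name__)


def write_long_line(start_at, padding):
    # Busy-wait until the agreed moment so all processes append at once,
    # then write one line that is longer than the pipe buffer size.
    proc = current_process()
    while time.time() < start_at:
        pass
    logger.info('proc name %s id %s %s', proc.name, proc.pid, 'x' * padding)


if __name__ == '__main__':
    start_at = time.time() + 5
    # Try padding values below and above your buffer size (e.g. 1000 vs 5000).
    procs = [Process(target=write_long_line, args=(start_at, 5000)) for _ in range(10)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()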

This question already has quite a few solutions, so I won't add my own solution here.

Update: Flask and multiprocessing

Web frameworks like Flask will be run with multiple workers if hosted by uwsgi or nginx. In that case, multiple processes may write into one log file. Will that cause problems?

Error handling in Flask is done via stdout/stderr, which is then caught by the webserver (uwsgi, nginx, etc.), which needs to take care that logs are written in the correct fashion (see e.g. this flask+nginx example: http://flaviusim.com/blog/Deploying-Flask-with-nginx-uWSGI-and-Supervisor/ ), probably also adding process information so you can associate error lines with processes. From the Flask docs:

By default as of Flask 0.11, errors are logged to your webserver's log automatically. Warnings, however, are not.

So you'd still have this issue of intermingled log files if you use warn and the message exceeds the pipe buffer size.

It is not safe to write to a single file from multiple processes.

According to https://docs.python.org/3/howto/logging-cookbook.html#logging-to-a-single-file-from-multiple-processes :

Although logging is thread-safe, and logging to a single file from multiple threads in a single process is supported, logging to a single file from multiple processes is not supported, because there is no standard way to serialize access to a single file across multiple processes in Python.

One possible solution would be to have each process write to its own file. You can achieve this by writing your own handler that appends the process pid to the end of the file name:

import logging.handlers
import os


class PIDFileHandler(logging.handlers.WatchedFileHandler):

    def __init__(self, filename, mode='a', encoding=None, delay=0):
        # each process gets its own file: e.g. bar.log becomes bar-<pid>.log
        filename = self._append_pid_to_filename(filename)
        super(PIDFileHandler, self).__init__(filename, mode, encoding, delay)

    def _append_pid_to_filename(self, filename):
        pid = os.getpid()
        path, extension = os.path.splitext(filename)
        return '{0}-{1}{2}'.format(path, pid, extension)

Then you just need to call addHandler:

logger = logging.getLogger('foo')
fh = PIDFileHandler('bar.log')
logger.addHandler(fh)
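For context, a minimal end-to-end sketch of how this plays out with multiprocessing (the worker function and file name are illustrative and assume the PIDFileHandler class above is defined in the same module): each process builds its own handler, so process 1234 writes to bar-1234.log, process 1235 to bar-1235.log, and so on.

import logging
from multiprocessing import Process, current_process


def worker():
    # Each process constructs its own PIDFileHandler, so the pid baked into
    # the file name differs per process and the files never collide.
    logger = logging.getLogger('foo')
    logger.setLevel(logging.INFO)
    logger.addHandler(PIDFileHandler('bar.log'))
    logger.info('hello from %s', current_process().name)


if __name__ == '__main__':
    procs = [Process(target=worker) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()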

Use a queue to handle concurrency correctly, and at the same time recover from errors, by feeding everything to the parent process via a pipe.

from logging.handlers import RotatingFileHandler
import multiprocessing, threading, logging, sys, traceback

class MultiProcessingLog(logging.Handler):
    def __init__(self, name, mode, maxsize, rotate):
        logging.Handler.__init__(self)

        # the actual file writing is done by a single RotatingFileHandler,
        # owned by the parent process; children only put records on the queue
        self._handler = RotatingFileHandler(name, mode, maxsize, rotate)
        self.queue = multiprocessing.Queue(-1)

        # a background thread in the parent drains the queue and emits records
        t = threading.Thread(target=self.receive)
        t.daemon = True
        t.start()

    def setFormatter(self, fmt):
        logging.Handler.setFormatter(self, fmt)
        self._handler.setFormatter(fmt)

    def receive(self):
        while True:
            try:
                record = self.queue.get()
                self._handler.emit(record)
            except (KeyboardInterrupt, SystemExit):
                raise
            except EOFError:
                break
            except:
                traceback.print_exc(file=sys.stderr)

    def send(self, s):
        self.queue.put_nowait(s)

    def _format_record(self, record):
        # ensure that exc_info and args
        # have been stringified.  Removes any chance of
        # unpickleable things inside and possibly reduces
        # message size sent over the pipe
        if record.args:
            record.msg = record.msg % record.args
            record.args = None
        if record.exc_info:
            dummy = self.format(record)
            record.exc_info = None

        return record

    def emit(self, record):
        try:
            s = self._format_record(record)
            self.send(s)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)

    def close(self):
        self._handler.close()
        logging.Handler.close(self)

The handler does all the file writing from the parent process and uses just one thread to receive messages passed from child processes.
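A minimal usage sketch (my addition, not part of the original recipe; the file name, sizes and the worker function are illustrative, and it assumes the fork start method so that children inherit the handler, whose emit() then only puts records on the shared queue):

import logging
from multiprocessing import Process


def worker(n):
    # In the children, emit() just enqueues the record; the parent writes it.
    logging.getLogger(__name__).info('message %d from child', n)


if __name__ == '__main__':
    mp_handler = MultiProcessingLog('combined.log', mode='a',
                                    maxsize=1024 * 1024, rotate=5)
    mp_handler.setFormatter(logging.Formatter(
        '%(asctime)s %(processName)s %(levelname)s %(message)s'))
    root = logging.getLogger()
    root.addHandler(mp_handler)
    root.setLevel(logging.INFO)

    procs = [Process(target=worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()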

QueueHandler is native in Python 3.2+ and safely handles multiprocessing logging.

The Python docs have two complete examples: Logging to a single file from multiple processes

For those using Python < 3.2, just copy QueueHandler into your own code from https://gist.github.com/vsajip/591589 , or alternatively import logutils.

Each process (including the parent process) puts its logging on the Queue, and then a listener thread or process (one example of each is provided) picks those up and writes them all to a file, with no risk of corruption or garbling.
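As a condensed illustration of that pattern, here is a minimal sketch using only the standard library (Python 3.2+; the worker function and the listener.log file name are illustrative, and the cookbook examples linked above are the complete versions):

import logging
import logging.handlers
import multiprocessing


def worker(queue):
    # In each child, the only handler is a QueueHandler: records are put on
    # the shared queue instead of being written to the file directly.
    root = logging.getLogger()
    root.addHandler(logging.handlers.QueueHandler(queue))
    root.setLevel(logging.INFO)
    root.info('hello from %s', multiprocessing.current_process().name)


if __name__ == '__main__':
    queue = multiprocessing.Queue(-1)

    # The QueueListener thread in the parent is the only writer to the file.
    file_handler = logging.FileHandler('listener.log')
    file_handler.setFormatter(
        logging.Formatter('%(asctime)s %(processName)s %(levelname)s %(message)s'))
    listener = logging.handlers.QueueListener(queue, file_handler)
    listener.start()

    procs = [multiprocessing.Process(target=worker, args=(queue,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    listener.stop()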

Note: this question is basically a duplicate of How should I log while using multiprocessing in Python?, so I've copied my answer from that question, as I'm pretty sure it's currently the best solution.
