
Logging worker processes with Parallel Python

I've inherited the maintenance of some scientific computing code that uses Parallel Python on a cluster. With Parallel Python, jobs are submitted to a ppserver, which (in this case) talks to already-running ppserver processes on other computers, dishing tasks out to ppworker processes.

I'd like to use the standard library logging module to log errors and debugging information from the functions that get submitted to a ppserver. Since these ppworkers run as separate processes (on separate computers), I'm not sure how to properly structure the logging. Must I log to a separate file for each process? Maybe there's a log handler that would make it all better?

Also, I want reports on which process on which computer has hit an error, but the code I'm writing the logging in probably isn't aware of these things; maybe that should be happening at the ppserver level?

(A version of this question is cross-posted on the Parallel Python Forums; I'll post an answer here if I get something there about this from a non-SO user.)

One way to solve your problem is to do the following:

  1. In each worker process, use a logging.handlers.SocketHandler to send events from the worker to a dedicated logger process.
  2. Create a dedicated logger process which listens for logging events on a socket, based on the working example given in the docs at https://docs.python.org/3/howto/logging-cookbook.html#sending-and-receiving-logging-events-across-a-network
  3. Profit ;-)
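
Steps 1 and 2 can be sketched together as follows. This is a minimal illustration adapted from the idea of the cookbook's receiver example, not the cookbook code itself. The names `LogRecordStreamHandler`, `LogRecordReceiver` and `make_worker_logger`, the logger names, and the ephemeral localhost port are all assumptions made for the demo. In a real deployment the receiver would run as its own process on a known host and port (the standard library offers `logging.handlers.DEFAULT_TCP_LOGGING_PORT` as a conventional default).

```python
import logging
import logging.handlers
import pickle
import socketserver
import struct
import threading
import time


class LogRecordStreamHandler(socketserver.StreamRequestHandler):
    """Read the length-prefixed pickled records that SocketHandler sends."""

    def handle(self):
        while True:
            header = self.rfile.read(4)
            if len(header) < 4:
                break  # the worker closed the connection
            (length,) = struct.unpack(">L", header)
            payload = self.rfile.read(length)
            if len(payload) < length:
                break
            # NOTE: unpickling is only acceptable on a trusted network.
            record = logging.makeLogRecord(pickle.loads(payload))
            self.server.sink.handle(record)


class LogRecordReceiver(socketserver.ThreadingTCPServer):
    """Hypothetical collector: passes every received record to `sink`."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address, sink):
        super().__init__(address, LogRecordStreamHandler)
        self.sink = sink  # a logging.Logger with the real handlers attached


def make_worker_logger(name, host, port):
    """Worker side: a logger whose records go to the collector over TCP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.handlers.SocketHandler(host, port))
    return logger


if __name__ == "__main__":
    # Demo only: collector and "worker" share one process here.
    sink = logging.getLogger("collector")
    sink.addHandler(logging.StreamHandler())
    server = LogRecordReceiver(("127.0.0.1", 0), sink)  # port 0 = ephemeral
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address

    log = make_worker_logger("ppworker.demo", host, port)
    log.info("hello from a worker")

    time.sleep(0.5)  # crude: give the receiver thread time to emit the record
    server.shutdown()
    server.server_close()
```

The collector's `sink` logger can carry whatever handlers you like (a single file, a rotating file, and so on), so all workers end up in one log without each process writing its own file.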

If you catch exceptions in your worker functions and log them, then you should be able to get visibility of errors across all workers in one place.
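
As a rough sketch of that pattern, a worker function might look like the following. The function `run_task`, its arguments, and the logger name `ppworker.task` are hypothetical. Importing `logging` inside the function is an assumption made so that the function stays self-contained when it is shipped to a remote worker.

```python
def run_task(numerator, denominator):
    """Hypothetical task function of the kind submitted to a ppserver."""
    import logging  # assumption: imported here so the function is self-contained

    log = logging.getLogger("ppworker.task")
    try:
        return numerator / denominator
    except Exception:
        # Logs at ERROR level and attaches the current traceback.
        log.exception("run_task(%r, %r) failed", numerator, denominator)
        return None
```

If the `ppworker.task` logger (or one of its ancestors) has a SocketHandler attached, the traceback is sent along as text, because SocketHandler formats the exception before pickling the record.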

I'd use Python's logging and socket APIs. Just follow the example here.

Simply start a ppworker dedicated to logging somewhere, and create a new logging.Logger in each of the other workers with a logging.handlers.SocketHandler specifying the hostname and port of the machine running the logging ppworker.
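
To see which process on which computer produced a record, one option is to stamp each record on the worker side before it leaves. Every LogRecord already carries the process ID in its `process` attribute; the host name is not included by default. The sketch below adds it with a filter. `HostnameFilter`, the `hostname` attribute, and `FORMAT` are illustrative names, not part of the logging module.

```python
import logging
import socket


class HostnameFilter(logging.Filter):
    """Hypothetical filter: tags every record with this machine's host name."""

    def __init__(self):
        super().__init__()
        self.hostname = socket.gethostname()

    def filter(self, record):
        record.hostname = self.hostname  # extra attribute, travels with the record
        return True


# A format string for the collecting side; %(process)d is a standard
# LogRecord attribute, %(hostname)s is the one added by the filter above.
FORMAT = "%(asctime)s %(hostname)s pid=%(process)d %(name)s %(levelname)s %(message)s"
```

Attach the filter to the worker's handler (`handler.addFilter(HostnameFilter())`) rather than to one logger, so it applies to every record passing through that handler. Since SocketHandler pickles the record's attribute dictionary, the extra `hostname` attribute should be available to a formatter on the receiving side.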

If you have a syslog server running, you can also use Python's syslog module, which is documented here.
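
A sketch of the syslog route is shown below. Note the substitution: rather than calling the `syslog` module directly, it uses `logging.handlers.SysLogHandler`, which speaks the same protocol but plugs into the logging setup discussed above. The helper name `make_syslog_logger` and the message format are assumptions; the host and port must be whatever your syslog server actually listens on.

```python
import logging
import logging.handlers


def make_syslog_logger(name, host, port=logging.handlers.SYSLOG_UDP_PORT):
    """Hypothetical helper: a logger that sends records to a syslog server via UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    # Include the logger name and process ID so entries can be told apart.
    handler.setFormatter(
        logging.Formatter("%(name)s[%(process)d]: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Because the default transport is UDP, delivery is not guaranteed and an unreachable server usually fails silently; whether that is acceptable depends on how much you rely on the logs.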
