
Detect if a process is already running and collaborate with it

Posted 2010-01-08 22:12:53. Tags: python, multiprocessing

I'm trying to create a program that starts a process pool of, say, 5 processes, performs some operation, and then quits, but leaves the 5 processes open. Later the user can run the program again, and instead of starting new processes it uses the existing 5. Basically it's a producer-consumer model where:

The number of producers varies.
The number of consumers is constant.
The producers can be started at different times by different programs or even different users.

I'm using the builtin multiprocessing module, currently in Python 2.6.4, but with the intent to move to 3.1.1 eventually.

Here's a basic usage scenario:

1. Beginning state - no processes running.
2. User starts program.py operation - one producer, five consumers running.
3. Operation completes - five consumers running.
4. User starts program.py operation - one producer, five consumers running.
5. User starts program.py operation - two producers, five consumers running.
6. Operation completes - one producer, five consumers running.
7. Operation completes - five consumers running.
8. User starts program.py stop and it completes - no processes running.
9. User starts program.py start and it completes - five consumers running.
10. User starts program.py operation - one producer, five consumers running.
11. Operation completes - five consumers running.
12. User starts program.py stop and it completes - no processes running.

The problem I have is that I don't know where to start on:

1. Detecting that the consumer processes are running.
2. Gaining access to them from a previously unrelated program.
3. Doing 1 and 2 in a cross-platform way.

Once I can do that, I know how to manage the processes. There has to be some reliable way to detect existing processes since I've seen Firefox do this to prevent multiple instances of Firefox from running, but I have no idea how to do that in Python.

3 Answers

There are a couple of common ways to do your item #1 (detecting running processes), but using them first requires that you slightly tweak your mental picture of how these background processes are started by the first invocation of the program.

Think of the first program not as starting the five processes and then exiting, but rather as detecting that it is the first instance started and not exiting. It can create a file lock (one of the common approaches for preventing multiple occurrences of an application from running), or merely bind to some socket (another common approach). Either approach will raise an exception in a second instance, which then knows that it is not the first and can refocus its attention on contacting the first instance.
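The socket variant of this idea can be sketched roughly as follows. This is a minimal illustration, not a hardened implementation: the function name claim_instance and the choice of a fixed localhost TCP port are assumptions made for the example.

```python
import socket

def claim_instance(port, host="127.0.0.1"):
    """Return a listening socket if we are the first instance, else None.

    The port number is a hypothetical, application-chosen constant.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # bind() raises OSError if another process already holds the port.
        sock.bind((host, port))
    except OSError:
        sock.close()
        return None
    sock.listen()
    return sock
```

A caller that gets None back knows another instance is probably running and can switch to contacting it. The returned socket has to be kept open for as long as the instance should count as "the first".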

If you're using multiprocessing, you should be able simply to use the Manager support, which involves binding to a socket to act as a server.

The first program starts the processes, creates Queues, proxies, or whatever. It creates a Manager to allow access to them, possibly allowing remote access.

Subsequent invocations first attempt to contact said server/Manager on the predefined socket (or use other techniques to discover the socket it's on). Instead of doing a serve_forever() call they connect() and communicate using the usual multiprocessing mechanisms.
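A rough sketch of that split, using multiprocessing.managers.BaseManager, might look like the following. The class names, the "get_queue" method name, and the AUTHKEY value are all hypothetical. Two manager classes are used so that one module can hold both roles without the registrations overwriting each other.

```python
import queue
from multiprocessing.managers import BaseManager

AUTHKEY = b"change-me"      # hypothetical shared secret
job_queue = queue.Queue()   # lives in the first (server) instance

class QueueServer(BaseManager):
    """Manager used by the first instance, which owns the queue."""

class QueueClient(BaseManager):
    """Manager used by later instances, which only connect."""

# The server side knows how to produce the queue; the client side only
# needs the name of the registered method.
QueueServer.register("get_queue", callable=lambda: job_queue)
QueueClient.register("get_queue")

def make_server(address):
    """First instance: bind to address; call serve_forever() on the result."""
    return QueueServer(address=address, authkey=AUTHKEY).get_server()

def connect_client(address):
    """Later instances: connect to the running manager instead of serving."""
    manager = QueueClient(address=address, authkey=AUTHKEY)
    manager.connect()
    return manager
```

The first invocation would call make_server(...).serve_forever(). Later invocations would call connect_client(...).get_queue() and put work on the returned proxy.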

Take a look at these different Service Discovery mechanisms: http://en.wikipedia.org/wiki/Service_discovery

The basic idea is that the consumers would each register a service when they start. The producer would go through the discovery process when starting. If it finds the consumers, it binds to them. If it doesn't find them, it starts up new consumers. In almost all of these systems, services can also publish properties, so you can have each consumer uniquely identify itself and give other information to the discovering producer.

Bonjour/zeroconf is pretty well supported cross-platform. You can even configure Safari to show you the zeroconf services on your local network, so you can use that to debug the service advertisement for the consumers. One side advantage of this kind of approach is that you could easily run the producers on different machines than the consumers.

You need a client-server model on a local system. You could do this using TCP/IP sockets to communicate between your clients and servers, but it's faster to use local named pipes if you don't have the need to communicate over a network.

The basic requirements for you, if I understood correctly, are these:
1. A producer should be able to spawn consumers if none exist already.
2. A producer should be able to communicate with consumers.
3. A producer should be able to find pre-existing consumers and communicate with them.
4. Even if a producer completes, consumers should continue running.
5. More than one producer should be able to communicate with the consumers.

Let's tackle these one by one:

(1) is a simple process-creation problem, except that consumer (child) processes should continue running, even if the producer (parent) exits. See (4) below.

(2) A producer can communicate with consumers using named pipes. See os.mkfifo() and the Unix man page of mkfifo() to create named pipes.

(3) You need to create named pipes from the consumer processes in a well-known path when they start running. The producer can find out if any consumers are running by looking for the well-known pipe(s) in that location. If the pipe(s) do not exist, no consumers are running, and the producer can spawn them.
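The check described in (3) could be sketched like this on a Unix system. The function names are made up for the example, and the caller chooses the well-known path.

```python
import os
import stat

def consumer_pipe_exists(path):
    """Return True if a named pipe (FIFO) exists at the given path."""
    try:
        return stat.S_ISFIFO(os.stat(path).st_mode)
    except FileNotFoundError:
        return False

def create_consumer_pipe(path):
    """Called by a consumer at startup to advertise itself (hypothetical helper)."""
    os.mkfifo(path, 0o600)
```

One caveat with this sketch: a pipe file left behind by a crashed consumer still passes the check, so the existence test alone is only a hint that a consumer is running.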

(4) You'll need to use os.setsid() for this, and make the consumer processes act like a daemon. See the Unix man page of setsid().
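For (4), the relevant call is os.setsid(), which puts the process in a new session. A common way to combine it with fork() is the "double fork" sketched below. The helper name spawn_detached is hypothetical, and a real daemon would usually also redirect its standard streams and change its working directory, which this sketch omits.

```python
import os

def spawn_detached(target):
    """Run target() in a process that outlives the caller (Unix only)."""
    pid = os.fork()
    if pid > 0:
        os.waitpid(pid, 0)   # reap the short-lived intermediate child
        return
    # Intermediate child: become leader of a new session, detached from
    # the producer's terminal and process group.
    os.setsid()
    if os.fork() > 0:
        os._exit(0)          # intermediate child exits; grandchild is orphaned
    try:
        target()             # grandchild: the long-running consumer
    finally:
        os._exit(0)          # never fall back into the caller's code
```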

(5) This one is tricky. Multiple producers can communicate with the consumers using the same named pipe, but if you want to reliably identify which producer sent the data, or to prevent data from different producers from being interleaved, you cannot transfer more than PIPE_BUF bytes per write from the producer to the consumer.
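The PIPE_BUF limit in (5) can be respected with a small guard like the one below. This is only a sketch: the "pid:payload" line framing is an assumption chosen for illustration, and it requires that the payload itself contain no newline.

```python
import os
import select

def send_message(fd, payload):
    """Write one message in a single write(), tagged with the sender's pid."""
    message = b"%d:" % os.getpid() + payload + b"\n"
    if len(message) > select.PIPE_BUF:
        # Larger writes to a pipe are not guaranteed to be atomic.
        raise ValueError("message exceeds PIPE_BUF; the write may interleave")
    os.write(fd, message)
```

Because each message goes out in a single write of at most PIPE_BUF bytes, POSIX guarantees it will not be interleaved with writes from other producers on the same pipe.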

A better way to do (5) is to have each consumer open a "control" named pipe (/tmp/control.3456, 3456 being the consumer pid) when it starts. Producers first set up a communication channel using the "control" pipe. When a producer connects, it sends its pid, say "1234", to the consumer on the "control" pipe, which tells the consumer to create a named pipe for data exchange with that producer, say "/tmp/data.1234". Then the producer closes the "control" pipe and opens "/tmp/data.1234" to communicate with the consumer. Each consumer can have its own "control" pipe (use the consumer pids to distinguish between pipes of different consumers), and each producer gets its own "data" pipe. When a producer finishes, it should clean up its data pipe or tell the consumer to do so. Similarly, when the consumer finishes, it should clean up its control pipe.

A difficulty here is to prevent multiple producers from connecting to the control pipe of a single consumer at the same time. The "control" pipe here is a shared resource, and you need to synchronize between different producers to access it. Use semaphores or file locking for this. See the posix_ipc Python module.
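The file-locking option can be sketched with the standard library's fcntl module (Unix only), used here instead of the posix_ipc semaphores mentioned above. The helper name control_lock and the idea of a separate lock file next to the control pipe are assumptions for the example.

```python
import fcntl
from contextlib import contextmanager

@contextmanager
def control_lock(lock_path):
    """Hold an exclusive advisory lock for the duration of the with-block."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)   # blocks until available
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

A producer would wrap its handshake on the control pipe in "with control_lock(path):". The lock is advisory, so it only helps if every producer uses it.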

Note: I have described most of the above in terms of general UNIX semantics, but all you really need is the ability to create daemon processes, the ability to create "named" pipes/queues/whatever so that they can be found by an unrelated process, and the ability to synchronize between unrelated processes. You can use any Python module which provides such semantics.