
Python multiprocessing communication with SocketServer instances

I have a set of processes, let's call them A, B, and C, that need to communicate with one another. A needs to communicate with B and C; B needs to communicate with A and C; and C needs to communicate with A and B. A, B, and C could be located on different machines or on the same machine.

My thought was to communicate via sockets and use "localhost" if they're all on the same machine (e.g., A at port 11111, B at port 22222, etc.). This way a non-local process would be treated like a local process. To do that, I thought I would set up a SocketServer instance for each of A, B, and C, and each of those would know the addresses of the other two. Whenever communication was needed, for example from A to B, A would open a socket to B and write the data. Then B's constantly-running server would read the data and store it in a list for later use.
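As a rough sketch of the protocol described above (one newline-terminated message per connection, stored in a bounded deque on the receiving side), here is a Python 3 version that uses socketserver.StreamRequestHandler instead of overriding finish_request. The names MigrantHandler, MigrantServer and send_migrant are hypothetical, port 0 lets the OS pick a free port, and everything runs in one thread: the sender's connection simply waits in the listen backlog until handle_request() accepts it.

```python
import collections
import socket
import socketserver


class MigrantHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One newline-terminated message per connection.
        line = self.rfile.readline().strip().decode("utf-8")
        self.server.migrants.append(line)


class MigrantServer(socketserver.TCPServer):
    allow_reuse_address = True  # class attribute: it must be set before bind()

    def __init__(self, address, max_migrants=10):
        super().__init__(address, MigrantHandler)
        self.migrants = collections.deque(maxlen=max_migrants)


def send_migrant(address, data):
    # Open a connection, write one line, close (what "A talks to B" means here).
    with socket.create_connection(address) as sock:
        sock.sendall((data + "\n").encode("utf-8"))


def demo():
    b = MigrantServer(("localhost", 0))  # port 0: the OS picks a free port
    try:
        send_migrant(b.server_address, "e00e0891")  # "A" sends to "B"
        b.handle_request()  # "B" accepts the queued connection and stores the line
        return list(b.migrants)
    finally:
        b.server_close()


if __name__ == "__main__":
    print(demo())
```

Here the object that stores the line and the object that is read afterwards are one and the same, which is the assumption that stops holding once the reader lives in a different process.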

The problem I'm running into is that the stored information isn't being shared between the finish_request method (which is handling the listening) and the __call__ method (which is handling the talking). (The server class is callable because I need that for something else. I don't believe that is relevant to the issue.)

My question is: will this work as I have imagined? Will multiprocessing, threading, and socketserver play well together all on the same machine? I am not interested in using other mechanisms to communicate between processes (like Queue or Pipe). I have a working solution with those. I want to know whether this approach is possible, even if less efficient. And, if it is, what am I doing wrong that is preventing it from working?

A minimal example that illustrates the issue is below:

import uuid
import sys
import socket
import time
import threading
import collections
import SocketServer
import multiprocessing

class NetworkMigrator(SocketServer.ThreadingMixIn, SocketServer.TCPServer):
    # Must be a class attribute: TCPServer.__init__ binds the socket, so an
    # instance attribute set afterwards has no effect.
    allow_reuse_address = True

    def __init__(self, server_address, client_addresses, max_migrants=1):
        SocketServer.TCPServer.__init__(self, server_address, None)
        self.client_addresses = client_addresses
        self.migrants = collections.deque(maxlen=max_migrants)
        t = threading.Thread(target=self.serve_forever)
        t.daemon = True
        t.start()

    def finish_request(self, request, client_address):
        try:
            rbufsize = -1
            wbufsize = 0
            rfile = request.makefile('rb', rbufsize)
            wfile = request.makefile('wb', wbufsize)

            data = rfile.readline().strip()
            self.migrants.append(data)
            print("finish_request::  From: %d  To: %d  MID: %d  Size: %d -- %s" % (client_address[1], 
                                                                                   self.server_address[1], 
                                                                                   id(self.migrants), 
                                                                                   len(self.migrants), 
                                                                                   data))

            if not wfile.closed:
                wfile.flush()
            wfile.close()
            rfile.close()        
        finally:
            sys.exc_traceback = None

    def __call__(self, random, population, args):
        client_address = random.choice(self.client_addresses)
        migrant_index = random.randint(0, len(population) - 1)
        data = population[migrant_index]
        data = uuid.uuid4().hex
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.connect(client_address)
            sock.sendall(data + '\n')  # send() may write only part of the data
        finally:
            sock.close()
        print("      __call__::  From: %d  To: %d  MID: %d  Size: %d -- %s" % (self.server_address[1], 
                                                                               client_address[1], 
                                                                               id(self.migrants), 
                                                                               len(self.migrants), 
                                                                               data))
        if len(self.migrants) > 0:
            migrant = self.migrants.popleft()
            population[migrant_index] = migrant
        return population


def run_it(migrator, rand, pop):
    for i in range(10):
        pop = migrator(rand, pop, {})  # use the rand argument, not the global r
        print("        run_it::  Port: %d  MID: %d  Size: %d" % (migrator.server_address[1], 
                                                                 id(migrator.migrants), 
                                                                 len(migrator.migrants)))
        time.sleep(1)


if __name__ == '__main__':
    import random
    r = random.Random()
    a = ('localhost', 11111)
    b = ('localhost', 22222)
    c = ('localhost', 33333)
    am = NetworkMigrator(a, [b, c], max_migrants=11)
    bm = NetworkMigrator(b, [a, c], max_migrants=22)
    cm = NetworkMigrator(c, [a, b], max_migrants=33)

    fun = [am, bm, cm]
    pop = [["larry", "moe", "curly"], ["red", "green", "blue"], ["small", "medium", "large"]]
    jobs = []
    for f, p in zip(fun, pop):
        pro = multiprocessing.Process(target=run_it, args=(f, r, p))
        jobs.append(pro)
        pro.start()
    for j in jobs:
        j.join()
    am.shutdown()
    bm.shutdown()
    cm.shutdown()

Looking at the output from this example, there are three kinds of printed lines:

        run_it::  Port: 11111  MID: 3071227860  Size: 0
      __call__::  From: 11111  To: 22222  MID: 3071227860  Size: 0 -- e00e0891e0714f99b86e9ad743731a00
finish_request::  From: 60782  To: 22222  MID: 3071227972  Size: 10 -- e00e0891e0714f99b86e9ad743731a00

"MID" is the id of the migrants deque in that instance. "From" and "To" are the ports sending/receiving the transmission. For now I'm just setting the data to a random hex string so that I can track individual transmissions.

I don't understand why, even with the same MID, the size is nonzero at one point and then 0 at a later time. I suspect it has to do with the calls being multithreaded. If I replace the final two for loops with these lines, the system works the way I expect:

for _ in range(10):
    for f, p in zip(fun, pop):
        f(r, p, {})
        time.sleep(1)

So what's happening with the multiprocessing version that breaks it?

When we create the 3 NetworkMigrator objects, 3 new threads are started, each listening for new TCP connections. Later on, we start 3 new processes for the run_it function. In total we have 4 processes, and only the first one contains the 4 threads (1 main + 3 server); each child process gets a copy of the NetworkMigrator objects, but none of the server threads. The problem is that the other 3 processes never see the changes the listening server threads make to the objects, because processes do not share memory by default: finish_request appends to the deques in the main process, while __call__ in each child reads its own copy, which stays empty.
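A small Python 3 sketch of why the children never receive anything, assuming the fork start method (POSIX-only, and what the question's code relies on): only the thread that calls fork() is copied into the child, so a background thread standing in for serve_forever exists in the parent but not in the forked process. The function names are hypothetical.

```python
import multiprocessing
import threading


def report_threads(q):
    # Runs in the forked child: count the threads that exist there.
    q.put(threading.active_count())


def demo():
    stop = threading.Event()
    # Stands in for one of the serve_forever threads started in __init__.
    listener = threading.Thread(target=stop.wait, daemon=True)
    listener.start()
    try:
        ctx = multiprocessing.get_context("fork")  # POSIX-only
        q = ctx.SimpleQueue()
        p = ctx.Process(target=report_threads, args=(q,))
        p.start()
        child_threads = q.get()
        p.join()
        parent_threads = threading.active_count()  # main + listener (at least)
    finally:
        stop.set()
        listener.join()
    return parent_threads, child_threads


if __name__ == "__main__":
    print(demo())
```

The child reports a single thread, so its copy of the listening socket has nobody serving it; every incoming connection is accepted by the parent's threads and stored in the parent's deques.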

So, if you start 3 new threads instead of processes, you will notice the difference:

pro = threading.Thread(target=run_it,args=(f,r,p))
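For comparison, a Python 3 sketch of the thread-based version with a single receiver: the listener thread started in __init__ appends to the deque, and a different thread reads it in the same process, so the data is visible. ThreadedMigrator, Handler, send and wait_for are hypothetical names, and the lock guards the shared deque.

```python
import collections
import socket
import socketserver
import threading
import time


class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        data = self.rfile.readline().strip().decode("utf-8")
        with self.server.lock:
            self.server.migrants.append(data)


class ThreadedMigrator(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address):
        super().__init__(address, Handler)
        self.migrants = collections.deque(maxlen=10)
        self.lock = threading.Lock()
        # Background listener, like serve_forever in NetworkMigrator.__init__.
        threading.Thread(target=self.serve_forever, daemon=True).start()


def send(address, data):
    with socket.create_connection(address) as sock:
        sock.sendall((data + "\n").encode("utf-8"))


def wait_for(server, timeout=5.0):
    # Poll from a different thread than the one that appends.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with server.lock:
            if server.migrants:
                return list(server.migrants)
        time.sleep(0.01)
    return []


def demo():
    b = ThreadedMigrator(("localhost", 0))  # port 0: the OS picks a free port
    try:
        # A thread plays the role of run_it, instead of a separate process.
        caller = threading.Thread(target=send, args=(b.server_address, "hello"))
        caller.start()
        caller.join()
        return wait_for(b)
    finally:
        b.shutdown()
        b.server_close()


if __name__ == "__main__":
    print(demo())
```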

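There's another minor problem: sharing state between threads is not completely safe either. Single deque appends and pops are thread-safe, but a compound step such as checking len(self.migrants) and then calling popleft() is not atomic, so it's best to hold a lock whenever the objects' state changes. Do something like the following in both finish_request and __call__: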

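from threading import Lock
...
self.lock = Lock()          # create once, e.g. in __init__
...
with self.lock:             # acquired here, released even if an exception occurs
    self.migrants.append(data)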
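A slightly fuller Python 3 sketch of the same idea, with the length check and popleft() done under one lock so they cannot interleave with another thread. MigrantBuffer is a hypothetical helper, not part of socketserver; demo() drains it from several threads and checks that each item comes out exactly once.

```python
import collections
import threading


class MigrantBuffer:
    """Hypothetical helper: a bounded deque guarded by a single lock."""

    def __init__(self, maxlen):
        self._items = collections.deque(maxlen=maxlen)
        self._lock = threading.Lock()

    def put(self, item):
        with self._lock:
            self._items.append(item)

    def take(self):
        # The emptiness check and popleft() happen under the same lock,
        # so no other thread can empty the deque between them.
        with self._lock:
            if self._items:
                return self._items.popleft()
            return None

    def __len__(self):
        with self._lock:
            return len(self._items)


def demo(n_items=1000, n_threads=8):
    buf = MigrantBuffer(maxlen=n_items)
    for i in range(n_items):
        buf.put(i)
    taken = []
    taken_lock = threading.Lock()

    def worker():
        while True:
            item = buf.take()
            if item is None:  # buffer drained
                return
            with taken_lock:
                taken.append(item)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Every item taken exactly once, and nothing left behind.
    return sorted(taken) == list(range(n_items)), len(buf)


if __name__ == "__main__":
    print(demo())
```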

If you are unhappy with multithreading and you do want multiprocessing, then you could use proxy objects as explained here: http://docs.python.org/library/multiprocessing.html#proxy-objects
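A minimal sketch of the proxy approach, assuming Python 3 and the fork start method: a list owned by a Manager process is created before the child starts, the parent appends to it (as finish_request would in a server thread), and the child sees the append. A manager list is used as a stand-in for the bounded deque, so trimming to max_migrants would have to be done by hand; consumer and demo are hypothetical names.

```python
import multiprocessing


def consumer(shared, ready, results):
    # Runs in the child, like run_it/__call__ in the question.
    ready.wait(timeout=5)
    results.put(shared[:])  # slicing the proxy fetches a plain list copy


def demo():
    ctx = multiprocessing.get_context("fork")  # POSIX-only
    with ctx.Manager() as manager:
        shared = manager.list()  # proxy to a list owned by the manager process
        ready = ctx.Event()
        results = ctx.SimpleQueue()
        p = ctx.Process(target=consumer, args=(shared, ready, results))
        p.start()
        shared.append("e00e0891")  # parent-side write, after the fork
        ready.set()
        seen = results.get()
        p.join()
    return seen


if __name__ == "__main__":
    print(demo())
```

Unlike a plain deque copied by fork, the proxy forwards every operation to the manager process, so a write made after the child was started is still visible to it.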

As for the object IDs being the same, that is not unexpected. Each new process receives a copy of the objects' state as it was when the process was created, and in CPython id() is the object's memory address, which the forked copy keeps. But the parent and the children are different processes with completely separate memory spaces, so any changes made in the main process (including those made by its server threads) are not reflected in the subprocesses.
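To see the identical IDs directly, here is a Python 3 sketch, again assuming the fork start method and relying on the CPython detail that id() returns the object's memory address: the child reports the same id() as the parent, yet a change the parent makes after the fork is not visible in the child.

```python
import collections
import multiprocessing


def child(q, migrants):
    # The child's copy of the deque sits at the same virtual address.
    q.put((id(migrants), len(migrants)))


def demo():
    ctx = multiprocessing.get_context("fork")  # POSIX-only
    migrants = collections.deque(["larry"], maxlen=11)
    q = ctx.SimpleQueue()
    p = ctx.Process(target=child, args=(q, migrants))
    p.start()
    migrants.append("moe")  # changes only the parent's copy
    child_id, child_len = q.get()
    p.join()
    return child_id == id(migrants), child_len, len(migrants)


if __name__ == "__main__":
    print(demo())
```

The result pairs an equal ID with different lengths (1 in the child, 2 in the parent), which is the same pattern as the MID/Size lines in the question's output.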
