Python pickle error when using multiprocessing inside class method

In the class foo in foomodule.py below, I am getting an error in the run_with_multiprocessing method. The method breaks up the records in self._data into chunks and calls some_func() on a subset of the data, for example some_func(data[0:800], 800) in the first iteration if limit = 800.

I have done this because running 10 * 1k records vs. 1 * 10k records shows a great performance improvement in a variation of the run_with_multiprocessing function that does the same thing, just without multiprocessing. Now I want to use multiprocessing to see if I can improve performance even more.

I am running Python 3.8.2 on Windows 8.1. I am fairly new to Python and multiprocessing. Thank you so much for your help.

# foomodule.py
import multiprocessing

class foo:
    def __init__(self, data, record_count):
        self._data = data
        self._record_count = record_count

    def some_func(self, data, record_count):
        # looping through self._data and doing some work
        pass

    def run_with_multiprocessing(self, limit):
        step = 0
        while step < self._record_count:
            if self._record_count - step < limit:
                # last chunk: fewer than `limit` records remain
                proc = multiprocessing.Process(target=self.some_func, args=(self._data[step:self._record_count], self._record_count-step))
                proc.start()
                proc.join()
                step = self._record_count
                break

            # full chunk of `limit` records, e.g. data[0:800] when limit = 800
            proc = multiprocessing.Process(target=self.some_func, args=(self._data[step:step+limit], limit))
            proc.start()
            proc.join()
            step += limit
        return

When using the class in script.py, I get the following error:

import time

import foomodule

# data is a mysql result set with, say, 10'000 rows
start = time.time()
bar = foomodule.foo(data, 10000)
limit = 800
bar.run_with_multiprocessing(limit)
end = time.time()
print("finished after " + str(round(end-start, 2)) + "s")

Traceback (most recent call last):
  File "C:/coding/python/project/script.py", line 29, in <module>
    bar.run_with_multiprocessing(limit)
  File "C:\coding\python\project\foomodule.py", line 303, in run_with_multiprocessing
    proc.start()
  File "C:\...\Python\Python38-32\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\...\Python\Python38-32\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\...\Python\Python38-32\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
  File "C:\...\Python\Python38-32\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\...\Python\Python38-32\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\...\Python\Python38-32\lib\socket.py", line 272, in __getstate__
    raise TypeError(f"cannot pickle {self.__class__.__name__!r} object")
TypeError: cannot pickle 'SSLSocket' object

Divide and conquer.

Problem

If you pass an SSLSocket object as an argument to multiprocessing.Process(), it fails because an SSLSocket cannot be serialized (pickled).
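
To see how the question's code can run into this, note that target=self.some_func is a bound method, so the whole instance is pickled along with the arguments when the child process is started on Windows. The following is a minimal sketch under assumptions: threading.Lock is only a stand-in for the SSLSocket (both refuse to be pickled), and Foo and can_pickle are hypothetical names used for illustration.

```python
import pickle
import threading


class Foo:
    def __init__(self, rows):
        self.rows = rows                  # plain data, picklable
        self.resource = threading.Lock()  # stand-in for an SSLSocket (unpicklable)

    def some_func(self, rows, record_count):
        return record_count


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


bar = Foo(list(range(10)))
bound_ok = can_pickle(bar.some_func)  # pickling a bound method pickles `bar` as well
chunk_ok = can_pickle(bar.rows[0:5])  # a plain slice of the data pickles on its own
print(bound_ok, chunk_ok)
```

The same check can be run against the real object and its data to find out which one carries the socket.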

Solution

Since you can't serialize an SSLSocket, create it in the subprocess instead, that is, inside the function passed as target to multiprocessing.Process().
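
Applied to the question's chunking loop, the same idea might look like the sketch below: the worker is a module-level function, only plain row chunks are sent to it, and anything unpicklable would be created inside the worker. The names chunk_bounds, process_chunk and run_chunks are hypothetical, and the row data is made up for illustration.

```python
import multiprocessing


def chunk_bounds(record_count, limit):
    """Yield (start, stop) index pairs covering record_count rows in steps of limit."""
    for start in range(0, record_count, limit):
        yield start, min(start + limit, record_count)


def process_chunk(rows):
    # Assumption: any connection or SSL socket needed for the work
    # would be opened here, inside the child process, not passed in.
    return len(rows)


def run_chunks(rows, limit):
    """Start one process per chunk, then wait for all of them to finish."""
    procs = []
    for start, stop in chunk_bounds(len(rows), limit):
        # rows[start:stop] must be plain, picklable data (lists, tuples, numbers, strings)
        proc = multiprocessing.Process(target=process_chunk, args=(rows[start:stop],))
        proc.start()
        procs.append(proc)
    for proc in procs:
        proc.join()
    return [proc.exitcode for proc in procs]


if __name__ == "__main__":  # needed on Windows, where child processes re-import this module
    rows = [(i, "row %d" % i) for i in range(2000)]
    print(run_chunks(rows, 800))
```

Starting every process before joining lets the chunks run in parallel; the loop in the question calls join() right after start(), which runs the chunks one at a time.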

Server

#!/usr/bin/python3
import multiprocessing
import socket
import ssl
import sys
import threading
import time

from sources.ClientListener import ClientListener

# Note: self.PORT, self.CLIENTS and logFile are not defined in this excerpt;
# presumably they come from the rest of the project.

class SocketServer:
    def __init__(self,**kwargs):
        self.args = kwargs["args"]
        self.__createSocket()

    def __handlerClients(self):
        try:
            while self.sock:
                # sock.accept() returns once a connection is established; a subprocess is then created for it
                # IMPORTANT: SSL is not added to the socket here, because an SSLSocket object can't be pickled when passed as an argument to multiprocessing.Process()
                conn,addr = self.sock.accept()
                eventChildStop = multiprocessing.Event()
                subprocess = multiprocessing.Process(target=ClientListener, name="client", args=(conn,addr,eventChildStop))
                # This thread is responsible for closing the client's child process
                threading.Thread(target=ClientListener.exitSubprocess,name="closeChildProcess",args=(eventChildStop,subprocess,)).start()
                subprocess.start()
                time.sleep(1)
        except:
            pass

    def __createSocket(self):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # This would allow the address/port to be reused immediately instead of waiting out the TIME_WAIT state
        # https://stackoverflow.com/questions/12362542/python-server-only-one-usage-of-each-socket-address-is-normally-permitted
        # sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.sock.bind(("",self.PORT))
        self.sock.listen(self.CLIENTS)
        print(logFile().message(f"Good days. I am running ClassAdmin server, listenning {self.CLIENTS} clients by port {self.PORT}...",True,"INFO"))
        #self.sockSSL = self.context.wrap_socket(sock,server_side=True)
        self.__handlerClients()

if __name__=="__main__":
    SocketServer(args=sys.argv)

As you can see in the __handlerClients(self) method, I loop while the socket object exists. On each iteration, I know a connection has been established thanks to:

conn,addr = self.sock.accept()

So I pass the conn variable to multiprocessing.Process(), because conn is a socket object. The difference between conn and self.sock is that conn has the raddr parameter, while self.sock does not and its laddr is 0.0.0.0.

self.sock

<socket.socket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 7788)>

conn

<socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('192.168.0.3', 7788), raddr=('192.168.0.20', 53078)>

multiprocessing

subprocess = multiprocessing.Process(target=ClientListener, name="client", args=(conn,addr,eventChildStop))

It is the same object.

Now let's look at ClientListener.

ClientListener

import ssl

# Note: Environment is not defined in this excerpt; presumably it is a helper from the rest of the project.

class ClientListener:
    def __init__(self,conn,addr,event):
        # Take the connection's plain socket object and wrap it with SSL here, in the child process, so the traffic is encrypted
        self.addr = addr
        self.conn = self.__SSLTunnel(conn)
        self.nick = ""
        self.__listenData()

    # This creates an SSL tunnel with ClassAdmin's certificate and private key
    def __SSLTunnel(self,sock):
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        context.load_cert_chain(Environment.SSL("crt"),Environment.SSL("key"))
        return context.wrap_socket(sock,server_side=True)

    def __listenData(self):
        # [...]

As you can see, __init__(self,conn,addr,event) receives the conn variable from the previous code, and self.conn stores the same connection wrapped in an SSLSocket:

self.conn = self.__SSLTunnel(conn)

def __SSLTunnel(self,sock):
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(Environment.SSL("crt"),Environment.SSL("key"))
    return context.wrap_socket(sock,server_side=True)

Important

The SSLSocket is stored in self.conn because it can then be used with the send() and recv() methods.

data = self.conn.recv(1024)

self.conn.send("sig.SystemExit(-5000,'The nick exists and is connected',True)".encode("utf-8"))

The self.sock variable does not allow the accept() method.

This throws an error:

[Errno 22] Invalid argument in /etc/ClassAdmin/sources/ClientListener.py:14

Have a good day. I hope I've helped.
