Python pickle error when using multiprocessing inside class method
In the class foo in foomodule.py below, I am getting an error in the run_with_multiprocessing method. The method breaks the records in self._data into chunks and calls some_func() with a subset of the data, for example some_func(data[0:800], 800) in the first iteration if limit = 800.
I have done this because running 10 * 1k records vs. 1 * 10k records shows a great performance improvement in a variation of the run_with_multiprocessing function that does the same thing, just without multiprocessing. Now I want to use multiprocessing to see if I can improve performance even more.

I am running Python 3.8.2 on Windows 8.1. I am fairly new to Python and multiprocessing. Thank you so much for your help.
# foomodule.py
import multiprocessing


class foo:
    def __init__(self, data, record_count):
        self._data = data
        self._record_count = record_count

    def some_func(self, data, record_count):
        # looping through self._data and doing some work
        pass

    def run_with_multiprocessing(self, limit):
        step = 0
        while step < self._record_count:
            if self._record_count - step < limit:
                # last, partial chunk
                proc = multiprocessing.Process(
                    target=self.some_func,
                    args=(self._data[step:self._record_count], self._record_count - step))
                proc.start()
                proc.join()
                step = self._record_count
                break
            # full chunk of `limit` records
            proc = multiprocessing.Process(
                target=self.some_func,
                args=(self._data[step:step + limit], limit))
            proc.start()
            proc.join()
            step += limit
        return
When using the class in script.py, I get the following error:
import time

import foomodule

# data is a mysql result set with, say, 10'000 rows
start = time.time()
bar = foomodule.foo(data, 10000)
limit = 800
bar.run_with_multiprocessing(limit)
end = time.time()
print("finished after " + str(round(end - start, 2)) + "s")
Traceback (most recent call last):
  File "C:/coding/python/project/script.py", line 29, in <module>
    bar.run_with_multiprocessing(limit)
  File "C:\coding\python\project\foomodule.py", line 303, in run_with_multiprocessing
    proc.start()
  File "C:\...\Python\Python38-32\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\...\Python\Python38-32\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\...\Python\Python38-32\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
  File "C:\...\Python\Python38-32\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\...\Python\Python38-32\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\...\Python\Python38-32\lib\socket.py", line 272, in __getstate__
    raise TypeError(f"cannot pickle {self.__class__.__name__!r} object")
TypeError: cannot pickle 'SSLSocket' object
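For context on why this traceback appears: on Windows, multiprocessing uses the spawn start method, so the Process target and its arguments are pickled and sent to the child. Because the target here is the bound method self.some_func, pickling it pickles the whole foo instance, including self._data; a live MySQL result set holds the connection's SSLSocket, which cannot be pickled. A minimal sketch of the same failure mode, using a plain socket as a stand-in for the database connection:

```python
import pickle
import socket

class Holder:
    """Stand-in for the question's class: an attribute holds an unpicklable socket."""
    def __init__(self, sock):
        self.sock = sock

    def work(self):
        pass

holder = Holder(socket.socket())

# Pickling the *bound method* holder.work pickles the whole Holder instance,
# so pickle reaches the socket attribute and raises TypeError.
try:
    pickle.dumps(holder.work)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)  # cannot pickle 'socket' object
holder.sock.close()
```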
If you pass an SSLSocket object as an argument to multiprocessing.Process(), it cannot be serialized. Since you can't serialize an SSLSocket, do the SSL wrapping inside the subprocess instead (in the function passed as the target of multiprocessing.Process()):
#!/usr/bin/python3
import multiprocessing
import socket
import ssl
import sys
import threading
import time

from sources.ClientListener import ClientListener


class SocketServer:
    def __init__(self, **kwargs):
        self.args = kwargs["args"]
        self.__createSocket()

    def __handlerClients(self):
        try:
            while self.sock:
                # sock.accept() lets us create a subprocess when a connection is established.
                # IMPORTANT: I don't wrap the socket with SSL here, because otherwise the
                # SSLSocket object can't be pickled when passed as an argument to
                # multiprocessing.Process()
                conn, addr = self.sock.accept()
                eventChildStop = multiprocessing.Event()
                subprocess = multiprocessing.Process(target=ClientListener, name="client", args=(conn, addr, eventChildStop))
                # This thread is responsible for closing the client's child process
                threading.Thread(target=ClientListener.exitSubprocess, name="closeChildProcess", args=(eventChildStop, subprocess,)).start()
                subprocess.start()
                time.sleep(1)
        except:
            None

    def __createSocket(self):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # this allows the address/port to be reused immediately instead of waiting out the TIME_WAIT state
        # https://stackoverflow.com/questions/12362542/python-server-only-one-usage-of-each-socket-address-is-normally-permitted
        # sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.sock.bind(("", self.PORT))
        self.sock.listen(self.CLIENTS)
        print(logFile().message(f"Good day. I am running the ClassAdmin server, listening for {self.CLIENTS} clients on port {self.PORT}...", True, "INFO"))
        # self.sockSSL = self.context.wrap_socket(sock, server_side=True)
        self.__handlerClients()


if __name__ == "__main__":
    SocketServer(args=sys.argv)
As you can see, in the __handlerClients(self) method I run a while loop over the socket object. On each iteration, I know whether a connection has been established thanks to:
conn,addr = self.sock.accept()
So I pass the conn variable to multiprocessing.Process(), because conn is a plain socket object. The difference between conn and self.sock is that conn has the raddr parameter, while self.sock doesn't, and its laddr is 0.0.0.0:
self.sock

<socket.socket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 7788)>

conn

<socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('192.168.0.3', 7788), raddr=('192.168.0.20', 53078)>

So in

subprocess = multiprocessing.Process(target=ClientListener, name="client", args=(conn,addr,eventChildStop))

it is this same object.
Now look at ClientListener:
class ClientListener:
    def __init__(self, conn, addr, event):
        # Receive the connection's socket object; on this connection I add encrypted
        # traffic with SSL thanks to the SSLSocket object of the ssl module
        self.addr = addr
        self.conn = self.__SSLTunnel(conn)
        self.nick = ""
        self.__listenData()

    # This creates an SSL tunnel with the ClassAdmin's certificate and private key
    def __SSLTunnel(self, sock):
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        context.load_cert_chain(Environment.SSL("crt"), Environment.SSL("key"))
        return context.wrap_socket(sock, server_side=True)

    def __listenData(self):
        # [...]
As you can see in __init__(self, conn, addr, event), I receive the conn variable from the previous code. Then self.conn stores the same object, but wrapped as an SSLSocket:
self.conn = self.__SSLTunnel(conn)

def __SSLTunnel(self, sock):
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(Environment.SSL("crt"), Environment.SSL("key"))
    return context.wrap_socket(sock, server_side=True)
The SSLSocket is stored in self.conn because it works with the send() and recv() methods:
data = self.conn.recv(1024)
self.conn.send("sig.SystemExit(-5000,'The nick exists and is connected',True)".encode("utf-8"))
The self.sock variable, however, doesn't allow the accept() method here; it throws an error:
[Errno 22] Invalid argument in /etc/ClassAdmin/sources/ClientListener.py:14
Have a good day. I hope I've helped.
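For the original question (a MySQL result set rather than a server socket), the same principle applies: keep the unpicklable connection out of what gets sent to the child process. One common pattern, sketched below with placeholder names (some_func's body and the row contents are stand-ins, not the asker's actual code), is to materialize the rows into a plain list first and hand chunks of it to a module-level function via multiprocessing.Pool:

```python
import multiprocessing

def some_func(chunk, record_count):
    # Module-level function: only the plain chunk is pickled, never a class
    # instance or a live DB connection.
    return record_count  # placeholder for the real per-chunk work

def run_with_pool(rows, limit):
    # Split the materialized rows into chunks of at most `limit` records.
    chunks = [rows[i:i + limit] for i in range(0, len(rows), limit)]
    with multiprocessing.Pool() as pool:
        return pool.starmap(some_func, [(c, len(c)) for c in chunks])

if __name__ == "__main__":
    rows = list(range(10_000))       # stand-in for cursor.fetchall()
    print(run_with_pool(rows, 800))  # 13 chunks: twelve of 800 and one of 400
```

This also avoids the question's pattern of start()/join() on one Process at a time, which serializes the work; a Pool actually runs the chunks in parallel.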