Python 3: can't pickle _thread.RLock objects when using multiprocessing on a list
I'm trying to parse websites that list a car's properties (154 kinds of properties). I have a huge list (named liste_test) that consists of 280,000 used-car listing URLs.
import requests

def araba_cekici(liste_test, headers, engine):
    for link in liste_test:
        try:
            page = requests.get(link, headers=headers)
            .....
            .....
When I run my code like this:
araba_cekici(liste_test,headers,engine)
It works and returns results, but in about an hour I could only get the properties of 1,500 URLs. That is very slow, so I need to use multiprocessing.
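A quick estimate from the numbers above shows why the sequential loop cannot realistically handle the full list at the observed rate:

$$
\frac{280\,000\ \text{URLs}}{1\,500\ \text{URLs/hour}} \approx 186.7\ \text{hours} \approx 7.8\ \text{days}
$$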
I found an answer here that uses multiprocessing and applied it to my code, but unfortunately it doesn't work.
import sys
import multiprocessing as multi

import numpy as np

def chunks(n, page_list):
    """Splits the list into n chunks"""
    return np.array_split(page_list, n)

cpus = multi.cpu_count()
workers = []
page_bins = chunks(cpus, liste_test)

for cpu in range(cpus):
    sys.stdout.write("CPU " + str(cpu) + "\n")
    # Process that will send the corresponding list of pages
    # to the function araba_cekici
    worker = multi.Process(name=str(cpu),
                           target=araba_cekici,
                           args=(page_bins[cpu], headers, engine))
    worker.start()
    workers.append(worker)

for worker in workers:
    worker.join()
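The chunks helper relies on numpy.array_split, which returns NumPy arrays rather than lists. As a dependency-free illustration of the same sizing rule (the first len(items) % n bins get one extra element), here is a sketch with a hypothetical helper split_into_bins and placeholder URLs, neither of which comes from the original code:

```python
def split_into_bins(items, n):
    """Split items into n contiguous bins whose sizes differ by at most one.

    Same sizing rule as numpy.array_split: the first len(items) % n bins
    receive one extra element. Bins are plain Python lists.
    """
    base, extra = divmod(len(items), n)
    bins = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        bins.append(items[start:start + size])
        start += size
    return bins


# Hypothetical URLs standing in for liste_test.
urls = [f"https://example.com/ilan/{i}" for i in range(10)]
bins = split_into_bins(urls, 4)
print([len(b) for b in bins])  # [3, 3, 2, 2]
```

Either form is fine to pass to a worker, since lists and NumPy arrays of strings are both picklable.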
And it gives:
TypeError: can't pickle _thread.RLock objects
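The error can be reproduced without any multiprocessing at all: a re-entrant lock from the threading module simply cannot be pickled. A minimal sketch:

```python
import pickle
import threading

lock = threading.RLock()
try:
    pickle.dumps(lock)
except TypeError as exc:
    # Python 3.8+ wording: "cannot pickle '_thread.RLock' object"
    # Older Python 3 wording: "can't pickle _thread.RLock objects"
    message = str(exc)
else:
    message = None

print(message)
```

So any object that stores such a lock as an attribute will fail the same way when multiprocessing tries to pickle it.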
I found some answers about this error, but none of them worked (or at least I couldn't apply them to my code). I also tried Python's multiprocessing Pool, but unfortunately it gets stuck in Jupyter Notebook and the code seems to run forever.
Late answer, but since this question turns up when searching on Google: multiprocessing transfers data to the worker processes by pickling it (for example, items put on a multiprocessing.Queue, or the target and args of a Process under the spawn start method), which requires all data/objects sent to be picklable.
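The pickling step can be observed directly. The sketch below forces the spawn start method (the default on Windows and, since Python 3.8, on macOS) so it behaves the same on every platform. FakeEngine is a hypothetical stand-in for the real engine, assumed to hold an RLock, and araba_cekici here is only a placeholder. Process.start() raises the TypeError in the parent while pickling the arguments, so araba_cekici never runs:

```python
import multiprocessing as mp
import threading


class FakeEngine:
    """Hypothetical engine stand-in that holds an RLock (not picklable)."""

    def __init__(self):
        self._lock = threading.RLock()


def araba_cekici(links, headers, engine):
    """Placeholder worker; never reached when the arguments fail to pickle."""
    pass


def start_worker(engine):
    """Try to start a spawn-based worker; return the pickling error, if any."""
    ctx = mp.get_context("spawn")
    worker = ctx.Process(target=araba_cekici,
                         args=(["https://example.com/ilan/1"], {}, engine))
    try:
        worker.start()  # the target and args are pickled here
    except TypeError as exc:
        return str(exc)
    worker.join()
    return None


if __name__ == "__main__":
    # e.g. "cannot pickle '_thread.RLock' object"
    print(start_worker(FakeEngine()))
```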
In your code, you try to pass headers and engine, whose implementations you don't show. (Since headers holds the HTTP request headers, I suspect that engine is the issue here.) To solve your issue, you either have to make engine picklable, or only instantiate engine within the worker process.
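A minimal sketch of the second option, under stated assumptions: FakeEngine is a hypothetical stand-in for the real engine (which the question does not show), araba_cekici is adapted to take a picklable connection string db_url instead of engine, and the URLs and db_url are placeholders. Each worker builds its own engine, so only lists, dicts and strings cross the process boundary. If engine is, for instance, a SQLAlchemy engine, the equivalent line inside the worker would be sqlalchemy.create_engine(db_url).

```python
import multiprocessing as multi
import threading


class FakeEngine:
    """Hypothetical stand-in for the unpicklable engine (it holds an RLock)."""

    def __init__(self, url):
        self.url = url
        self._lock = threading.RLock()
        self.saved = []

    def save(self, row):
        with self._lock:
            self.saved.append(row)


def araba_cekici(links, headers, db_url):
    """Worker: build the engine inside this process from a picklable URL.

    The real version would call requests.get(link, headers=headers) and parse
    each page; this sketch only records the link so it runs offline.
    """
    engine = FakeEngine(db_url)  # created here, never pickled
    for link in links:
        engine.save({"url": link, "user_agent": headers.get("User-Agent")})
    return len(engine.saved)


if __name__ == "__main__":
    liste_test = [f"https://example.com/ilan/{i}" for i in range(20)]  # hypothetical
    headers = {"User-Agent": "Mozilla/5.0"}
    db_url = "sqlite:///cars.db"  # hypothetical connection string
    cpus = multi.cpu_count()
    workers = []
    for cpu in range(cpus):
        # Every argument is a list, dict or str, so all of them pickle.
        worker = multi.Process(name=str(cpu), target=araba_cekici,
                               args=(liste_test[cpu::cpus], headers, db_url))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()
    print([w.exitcode for w in workers])  # 0 for each worker that finished cleanly
```

The strided split (liste_test[cpu::cpus]) only keeps the sketch short; the chunks helper from the question works just as well, since NumPy arrays of strings are picklable.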
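Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.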