Python3 can't pickle _thread.RLock objects on list with multiprocessing

I'm trying to parse websites that list car properties (154 kinds of properties). I have a huge list (named liste_test) of 280,000 used-car listing URLs.

def araba_cekici(liste_test,headers,engine):
    for link in liste_test:
        try:
            page = requests.get(link, headers=headers)
        .....
        .....

When I call the function directly like this:

araba_cekici(liste_test,headers,engine)

It works and returns results. But in roughly 1 hour I could only obtain the properties of 1,500 URLs. That is very slow, so I need to use multiprocessing.

I found a multiprocessing solution on here and applied it to my code, but unfortunately it does not work.

import sys

import numpy as np
import multiprocessing as multi

def chunks(n, page_list):
    """Splits the list into n chunks"""
    return np.array_split(page_list,n)

cpus = multi.cpu_count()

workers = []   
page_bins = chunks(cpus, liste_test)


for cpu in range(cpus):
    sys.stdout.write("CPU " + str(cpu) + "\n")
    # Process that will send the corresponding list of pages
    # to the function araba_cekici
    worker = multi.Process(name=str(cpu), 
                           target=araba_cekici, 
                           args=(page_bins[cpu],headers,engine))
    worker.start()
    workers.append(worker)

for worker in workers:
    worker.join()

And it gives:

TypeError: can't pickle _thread.RLock objects

I found several answers about this error, but none of them work (at least, I can't apply them to my code). I also tried a Python multiprocessing Pool, but it gets stuck in Jupyter Notebook and the code seems to run forever.

Late answer, but since this question turns up when searching on Google: multiprocessing sends data to the worker processes by pickling it (for example through a multiprocessing.Queue), so every object you pass to a worker must be picklable.
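A quick way to find out which argument is the culprit is to try pickling each one yourself. The sketch below is only an illustration: FakeEngine is a hypothetical stand-in for an object that keeps a threading.RLock internally (the real engine is not shown in the question), and is_picklable is a helper name made up for this example.

```python
import pickle
import threading


class FakeEngine:
    """Hypothetical stand-in for an object that holds a lock internally."""

    def __init__(self):
        self._lock = threading.RLock()


def is_picklable(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


# Placeholder header dict; plain dicts of strings pickle without trouble.
headers = {"User-Agent": "example"}

print(is_picklable(headers))       # True
print(is_picklable(FakeEngine()))  # False: the RLock inside cannot be pickled
```

Running the same check on your own headers and engine should show which of the two multiprocessing is failing on.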

In your code, you pass headers and engine, whose implementations you don't show. (Since headers holds the HTTP request headers, I suspect that engine is the issue here.) To solve this, you either have to make engine picklable, or instantiate engine only inside the worker process.
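The second option could look roughly like the sketch below. It rests on assumptions, since the question does not show how engine is built: FakeEngine and make_engine are hypothetical placeholders, the connection string and URLs are invented, and the loop body only counts links where the real code would call requests.get and write results through the engine. The point is that only picklable values (a list of strings, a dict, a string) cross the process boundary, and each worker builds its own engine.

```python
import multiprocessing as multi
import threading


class FakeEngine:
    """Hypothetical stand-in for the real engine; holds an unpicklable lock."""

    def __init__(self, url):
        self.url = url
        self._lock = threading.RLock()


def make_engine(engine_url):
    """Hypothetical factory; replace with however your engine is created."""
    return FakeEngine(engine_url)


def araba_cekici(links, headers, engine_url):
    # The engine is created here, inside the worker process,
    # so it never has to be pickled.
    engine = make_engine(engine_url)
    processed = 0
    for link in links:
        # The real code would fetch `link` with `headers` and store
        # the parsed properties through `engine`.
        processed += 1
    return processed


def split_into_chunks(items, n):
    """Split items into n plain Python lists (round-robin)."""
    return [items[i::n] for i in range(n)]


if __name__ == "__main__":
    # Placeholder data; all of it is picklable.
    liste_test = ["https://example.com/ad/%d" % i for i in range(10)]
    headers = {"User-Agent": "example"}
    engine_url = "sqlite://"  # assumed connection string, not from the question

    workers = []
    for chunk in split_into_chunks(liste_test, 2):
        worker = multi.Process(target=araba_cekici,
                               args=(chunk, headers, engine_url))
        worker.start()
        workers.append(worker)

    for worker in workers:
        worker.join()
```

The `if __name__ == "__main__":` guard matters on platforms where multiprocessing starts workers with the spawn method, because each child re-imports the main module.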
