Python Multiprocessing - TypeError: Pickling an AuthenticationString object is disallowed for security reasons

I have the following problem. I want to implement a web crawler. It worked so far, but it was so slow that I tried to use multiprocessing to fetch the URLs. Unfortunately I'm not very experienced in this field. After some reading, the easiest way seemed to be the map method from multiprocessing.pool.

But I constantly get the following error:

TypeError: Pickling an AuthenticationString object is disallowed for security reasons

I found very few cases with the same error, and unfortunately they did not help me.

I created a stripped-down version of my code that reproduces the error:

import multiprocessing

class TestCrawler:
    def __init__(self):
        self.m = multiprocessing.Manager()
        self.queue = self.m.Queue()
        for i in range(50):
            self.queue.put(str(i))
        self.pool = multiprocessing.Pool(6)



    def mainloop(self):
        self.process_next_url(self.queue)

        while True:
            self.pool.map(self.process_next_url, (self.queue,))                

    def process_next_url(self, queue):
        url = queue.get()
        print(url)


c = TestCrawler()
c.mainloop()

I would be very thankful for any help or suggestions!

Question: But I constantly get the following error:

The error you're getting is misleading. The cause is:

self.queue = self.m.Queue()

Move the Queue instantiation outside the class TestCrawler.
This leads to another error:

NotImplementedError: pool objects cannot be passed between processes or pickled

The cause is:

self.pool = multiprocessing.Pool(6)

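Both errors come from pickling: pool.map has to pickle the bound method self.process_next_url, which means pickling the whole TestCrawler instance, including its Manager (self.m) and Pool (self.pool) members, and neither of those can be pickled.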
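To see the pickling problem in isolation, here is a minimal sketch that needs no pool at all. Holder and try_pickle are hypothetical names made up for illustration. The sketch assumes that the process authkey is an AuthenticationString, the same kind of object a Manager keeps internally as far as I know.

```python
import multiprocessing
import pickle


class Holder:
    """Hypothetical stand-in for TestCrawler with an unpicklable member."""

    def __init__(self):
        # The authkey of the current process is an AuthenticationString.
        self.key = multiprocessing.current_process().authkey

    def work(self, item):
        return item


def try_pickle(obj):
    """Return 'ok' if obj can be pickled, otherwise the error text."""
    try:
        pickle.dumps(obj)
        return "ok"
    except Exception as exc:
        return "{}: {}".format(type(exc).__name__, exc)


if __name__ == "__main__":
    holder = Holder()
    # Pickling the bound method pickles holder itself, including holder.key.
    print(try_pickle(holder.work))
    # A plain string argument pickles without trouble.
    print(try_pickle("http://example.com"))
```

The first call should report the same TypeError as in the question, because the bound method drags the whole instance into the pickle.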

Note: Endless loop!
Your while loop below never terminates. This will overload your system!
Furthermore, each pool.map(...) call submits only one task, so only one worker process is busy at a time.

while True:
    self.pool.map(self.process_next_url, (self.queue,))

I suggest reading the examples in the multiprocessing documentation that demonstrate the use of a pool.


Change to the following:

import multiprocessing as mp

class TestCrawler:
    def __init__(self, tasks):
        # Keep only the queue proxy on the instance; the Manager stays outside
        self.queue = tasks
        for i in range(50):
            self.queue.put(str(i))

    def mainloop(self):
        # Create the pool locally so it is never stored on self
        pool = mp.Pool(6)
        for n in range(50):
            # .map requires an iterable of arguments, so pass a dummy None
            pool.map(self.process_next_url, (None,))
        # Shut the workers down once all tasks are done
        pool.close()
        pool.join()

    # The dummy argument is ignored; the URL comes from the shared queue
    def process_next_url(self, dummy):
        url = self.queue.get()
        print(url)

if __name__ == "__main__":
    # Create the Manager queue at module level, outside the class
    tasks = mp.Manager().Queue()
    # Pass the queue to TestCrawler
    c = TestCrawler(tasks)
    c.mainloop()

This example starts a pool of 5 processes that together handle 50 tasks (URLs):

import multiprocessing as mp

class TestCrawler2:
    def __init__(self, tasks):
        self.tasks = tasks

    def start(self):
        pool = mp.Pool(5)
        pool.map(self.process_url, self.tasks)

    def process_url(self, url):
        print('self.process_url({})'.format(url))

if __name__ == "__main__":
    tasks = ['url{}'.format(n) for n in range(50)]
    TestCrawler2(tasks).start()

Tested with Python 3.4.2.
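A different approach, not the one used above, is to hand pool.map a module-level function instead of a bound method, so no instance has to be pickled at all. The sketch below assumes that; fetch and crawl are hypothetical names, and fetch only formats a string in place of the real download.

```python
import multiprocessing as mp


def fetch(url):
    """Hypothetical stand-in for the real download step."""
    return "fetched {}".format(url)


def crawl(urls, workers=5):
    """Run fetch over all URLs in a pool; results keep the input order."""
    # Pool works as a context manager since Python 3.3.
    with mp.Pool(workers) as pool:
        return pool.map(fetch, urls)


if __name__ == "__main__":
    for line in crawl(["url{}".format(n) for n in range(10)]):
        print(line)
```

Because fetch lives at module level, pickle only stores a reference to it by name, and any state the crawler needs can be passed in through the arguments.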
