Python Web Scraping with Selenium and Multiprocessing

Question

I have the following function. It takes a link as an argument and returns the data I need (base_list).

import bs4 as bs
from selenium import webdriver

def extract_BS(link):

    base_list = []

    try:

        # Open the page in a new Firefox window and grab the rendered HTML
        browser = webdriver.Firefox()
        browser.get(link)
        print('Processing..' + link)

        respData = browser.page_source
        soup = bs.BeautifulSoup(respData, 'html.parser')
        browser.quit()
        shipping_options = soup.find("div", class_="modal_shipping_con").find_all("input", class_="inputChangePrice")
        base_list.append(shipping_options)

    except Exception as e:
        # Report the link that failed
        print(link)
        print('Exception:', e)

    return base_list

However, my list of URLs has around 1000 entries, so processing them one at a time takes too long. I found this blog on multiprocessing, which I am trying to adapt.

My additional code to use multiprocessing is as follows:

from multiprocessing import Pool

with Pool(4) as p:
    records = p.map(extract_BS, urls)

p.terminate()
p.join()

However, when I run it, I get the following error:

MaybeEncodingError: Error sending result:
Reason: 'RecursionError('maximum recursion depth exceeded in comparison',)'

When I run the script, four browser windows open and the scraper does its work, but the problem seems to lie with 'records = p.map(extract_BS,urls)'. My function returns a list, which I want to store in 'records', but for some reason this is not working.

Any advice, help, or observations would be much appreciated. Thanks.

Answer 1

You need to place

if __name__ == "__main__":

before "with Pool(4) as p:" so that the worker processes do not each create a pool of their own, which causes the recursion error.
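
A minimal sketch of that layout, not a drop-in fix. The names to_plain, extract_plain and scrape_all are hypothetical, the URLs are placeholders, and the "name" and "value" attributes are guesses about what the page's inputs carry. The worker here is a stand-in that skips Selenium so the structure is easy to see. It also returns plain dictionaries rather than BeautifulSoup objects: "Error sending result" means the worker's return value could not be pickled on its way back to the parent, and deeply nested parsed tags may be what triggers the RecursionError, so this may be needed in addition to the guard.

```python
from multiprocessing import Pool


def to_plain(tag):
    # Works for any object with a dict-style .get(), such as a bs4 Tag.
    # The attribute names "name" and "value" are assumptions about the inputs.
    return {"name": tag.get("name"), "value": tag.get("value")}


def extract_plain(link):
    # Hypothetical stand-in for extract_BS: the real function would load the
    # page with Selenium, parse it, and pass each matching tag to to_plain().
    fake_tags = [{"name": "shipping", "value": link}]
    return [to_plain(tag) for tag in fake_tags]


def scrape_all(urls, workers=4):
    # Leaving the with-block calls terminate() on the pool, so no explicit
    # terminate()/join() is needed afterwards.
    with Pool(workers) as p:
        return p.map(extract_plain, urls)


if __name__ == "__main__":
    # Only the parent process runs this; workers that import the module skip it.
    urls = ["https://example.com/a", "https://example.com/b"]  # placeholder URLs
    records = scrape_all(urls)
    print(records)
```

In the real script, extract_BS would keep its Selenium and BeautifulSoup steps and apply something like to_plain to each element of shipping_options before appending to base_list.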

Solution 1
0 2018-02-18 12:31:47
