
parallelize beautiful soup scraper in python

I would like to parallelize my scraping script, which is written in Python using Beautiful Soup. Despite reading up on it, I am confused about how to get it to work in my code. What I want to do for now is take a list of links as input and open several browsers/tabs, using these URLs as input. Later, obviously, I want to include my entire code and scrape each of the sites. But I cannot get this first step to work.

Here is my attempt:

Test_links = ['https://www.google.com/maps', 'https://www.google.co.uk/?gfe_rd=cr&dcr=0&ei=3vPNWpTWOu7t8weBlbXACA', 'https://scholar.google.de/']

def get_URL(Link):
    browser = webdriver.Chrome(chrome_options = options)
    browser.get(Link)

if __name__ == '__main__':
    pool = Pool(processes=5)
    pool.map(get_URL, Link)

I'm not sure if this will work for you, but I think there's an issue with your naming. Try to stay away from capitalized variable names, because they can be confused with class names. You could try something like this to see if that theory is right.

test_links = ['https://www.google.com/maps', 'https://www.google.co.uk/?gfe_rd=cr&dcr=0&ei=3vPNWpTWOu7t8weBlbXACA', 'https://scholar.google.de/']

from multiprocessing import Pool
from selenium import webdriver

# define the Chrome options the original snippet referenced but never created
options = webdriver.ChromeOptions()

def get_URL(link):
    # pool.map calls this function once per element, so each call
    # receives a single URL string, not the whole list
    browser = webdriver.Chrome(chrome_options=options)
    browser.get(link)

if __name__ == '__main__':
    pool = Pool(processes=5)
    pool.map(get_URL, test_links)

I'm not sure whether browser.get() will accept a list; you might have to iterate over the list, calling browser.get() on each URL.
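To see how the distribution works, here is a minimal sketch of Pool.map semantics. It swaps the Selenium call for a plain function (so it runs without a browser driver); the function name and return value are illustrative only. The point is that Pool.map hands each worker one element of the list at a time, so the worker function should expect a single URL, not a list:

```python
from multiprocessing import Pool

def get_url(link):
    # Pool.map calls this once per element, so `link` is a
    # single URL string, not the whole list
    return "fetched " + link

if __name__ == "__main__":
    test_links = ["https://www.google.com/maps", "https://scholar.google.de/"]
    with Pool(processes=2) as pool:
        # results come back in the same order as the input list
        results = pool.map(get_url, test_links)
    print(results)
```

The same pattern applies to the Selenium version: replace the body of `get_url` with the `webdriver.Chrome(...)` / `browser.get(link)` calls, and each worker process will open its own browser for its assigned URL.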
