
parallelize beautiful soup scraper in python

I would like to parallelize my scraping script, which is written in Python using Beautiful Soup. Despite reading up on it, I am confused about how to get it to work in my code. What I want to do for now is take a list of links as input and open several browsers/tabs that take these URLs as input. Later, obviously, I want to include my entire code and scrape each of the sites. But I cannot get this first step to work.

Here is my attempt:

Test_links = ['https://www.google.com/maps', 'https://www.google.co.uk/?gfe_rd=cr&dcr=0&ei=3vPNWpTWOu7t8weBlbXACA', 'https://scholar.google.de/']

def get_URL(Link):
    browser = webdriver.Chrome(chrome_options = options)
    browser.get(Link)

if __name__ == '__main__':
    pool = Pool(processes=5)
    pool.map(get_URL, Link)

I'm not sure if this will work for you, but I think there's an issue with your naming. Try to stay away from capitalizing variable names, since capitalized names are conventionally reserved for classes and the two can get confused. Also, you pass `Link` to `pool.map`, but the list is actually named `Test_links`. You could try something like this to see if that theory is right.

from multiprocessing import Pool
from selenium import webdriver

options = webdriver.ChromeOptions()  # configure as needed

test_links = ['https://www.google.com/maps', 'https://www.google.co.uk/?gfe_rd=cr&dcr=0&ei=3vPNWpTWOu7t8weBlbXACA', 'https://scholar.google.de/']

def get_URL(test_links_list):
    browser = webdriver.Chrome(chrome_options=options)
    browser.get(test_links_list)

if __name__ == '__main__':
    pool = Pool(processes=5)
    pool.map(get_URL, test_links)

I'm not sure if browser.get() will take a list; you might have to iterate over the list, calling browser.get() on each URL.
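For what it's worth, `pool.map` already splits the iterable, so the worker function receives one element per call rather than the whole list — `browser.get()` would never see a list here. A minimal sketch of that mechanic, with a hypothetical stand-in function in place of the Selenium call so it runs without a browser:

```python
from multiprocessing import Pool

test_links = ['https://www.google.com/maps',
              'https://scholar.google.de/']

def get_url(link):
    # pool.map calls this once per item, so `link` is a single URL string.
    # In the real script this is where browser.get(link) would go; the
    # stand-in below just returns a value so the example is runnable.
    return 'fetched ' + link

if __name__ == '__main__':
    # Each of the two worker processes handles one URL from the list.
    with Pool(processes=2) as pool:
        results = pool.map(get_url, test_links)
    print(results)
```

Each call to `get_url` in the real script would still open its own Chrome instance, which is expensive; reusing one browser per worker process is a common follow-up optimization.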
