
Web scraping a large number of links?

I am very new to web scraping. I have started using BeautifulSoup in Python. I wrote code that loops through a list of URLs and gets me the data I need. The code works fine for 10-12 links, but I am not sure whether the same code will hold up if the list has over 100 links. Is there an alternative way, or another library, to get the data from a large list of URLs without harming the website in any way? Here is my code so far.

from requests import get          # assumed import, since the loop calls get() and reads res.text
from bs4 import BeautifulSoup

url_list = [url1, url2, url3, url4, url5]
mylist = []
for url in url_list:
    res = get(url)
    soup = BeautifulSoup(res.text, 'html.parser')
    data = soup.find('pre').text
    mylist.append(data)
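
The same pattern scales to 100+ links, but with that many requests it is worth reusing one HTTP connection, pausing between requests so the site is not hit too quickly, and skipping links that fail instead of letting one bad URL crash the loop. Below is a minimal sketch along those lines using requests and BeautifulSoup; the one-second delay, the 10-second timeout, and the placeholder URLs are assumptions you can adjust.

import time
import requests
from bs4 import BeautifulSoup

url_list = [url1, url2, url3, url4, url5]  # placeholders for your real URLs
mylist = []

session = requests.Session()  # reuse one connection instead of opening a new one per request
for url in url_list:
    try:
        res = session.get(url, timeout=10)
        res.raise_for_status()                # treat 4xx/5xx responses as failures
    except requests.RequestException as exc:
        print(f'Skipping {url}: {exc}')       # log and move on rather than aborting the whole run
        continue
    soup = BeautifulSoup(res.text, 'html.parser')
    pre = soup.find('pre')
    if pre is not None:                       # some pages may not contain a <pre> tag
        mylist.append(pre.text)
    time.sleep(1)                             # assumed polite delay between requests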

Here's an example that might work for you.

from simplified_scrapy import Spider, SimplifiedDoc, SimplifiedMain, utils

class MySpider(Spider):
    name = 'my_spider'
    start_urls = ['url1']
    # refresh_urls = True # Remove the leading '#' if you want to re-download links that have already been downloaded
    def __init__(self):
        # If your links are stored elsewhere, read them in here.
        self.start_urls = utils.getFileLines('you url file name.txt')
        Spider.__init__(self,self.name) # Necessary

    def extract(self, url, html, models, modelNames):
        doc = SimplifiedDoc(html)
        data = doc.select('pre>text()') # Extract the data you want.
        return {'Urls': None, 'Data':{'data':data} } # Return the data to the framework, which will save it for you.

SimplifiedMain.startThread(MySpider())  # Start download

You can see more examples, as well as the source code of the simplified_scrapy library, here: https://github.com/yiyedata/simplified-scrapy-demo
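
A note on the input file: judging from the call to utils.getFileLines, the spider presumably reads its start URLs from a plain text file with one URL per line (this is an assumption; the demo repository linked above shows the exact expected format). A hypothetical 'you url file name.txt' would then look something like:

https://example.com/page1
https://example.com/page2
https://example.com/page3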
