
Why is my code stuck in an infinite loop when scraping?

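I am learning how to do basic web scraping with Python 3, and in this example I was trying to scrape all the author names from the website http://quotes.toscrape.com . I tried to write the code without knowing the total number of pages on the website. However, when I try to run it, the editor stops responding. Is there something wrong with the code, or should I just let it run longer?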

import requests
import bs4
i = 0
authors = set()
while True:
    try:
        if i == 0:
            url = "http://quotes.toscrape.com"
        else: 
            url = "http://quotes.toscrape.com/page/{}/".format(i+1)
        
        res = requests.get(url)
        soup = bs4.BeautifulSoup(res.text, 'lxml')
        
        for name in soup.select('.author'):
            authors.add(name.text)
            
        
        i += 1
        
    except:
        break

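I believe the problem is that the website returns a valid response even when the requested page number has no quotes on it (try http://quotes.toscrape.com/page/23400/ for example). So you will most likely never (or at least not for a very long time) hit an error that triggers your break statement. Instead, you should break when you encounter text such as "No quotes found!". For example: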

import requests
import bs4
i = 0
authors = set()
while True:
    try:
        if i == 0:
            url = "http://quotes.toscrape.com"
        else: 
            url = "http://quotes.toscrape.com/page/{}/".format(i+1)
    
        res = requests.get(url)
        soup = bs4.BeautifulSoup(res.text, 'lxml')

        if "No quotes found!" in str(soup):
            break
    
        for name in soup.select('.author'):
            authors.add(name.text)
        
    
        i += 1
    
    except:
        break
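The same idea can also be expressed without depending on the exact wording of the "No quotes found!" message: stop as soon as a page yields no authors, and cap the number of pages as a safety net. The following is only a minimal sketch. The names collect_authors, fetch_authors_from_site and max_pages are made up for illustration, and the timeout value is an arbitrary assumption.

```python
def collect_authors(fetch_authors, max_pages=1000):
    """Collect author names page by page until a page yields none.

    fetch_authors is a hypothetical callable that takes a 1-based page
    number and returns the author names found on that page.
    max_pages is an arbitrary safety cap so the loop always terminates.
    """
    authors = set()
    for page in range(1, max_pages + 1):
        names = fetch_authors(page)
        if not names:  # empty page: we are past the last page, so stop
            break
        authors.update(names)
    return authors


def fetch_authors_from_site(page):
    """Fetch one page of quotes.toscrape.com and return its author names."""
    # Imported here so collect_authors can be tried out without network
    # access or third-party packages.
    import bs4
    import requests

    url = "http://quotes.toscrape.com/page/{}/".format(page)
    res = requests.get(url, timeout=10)  # timeout value is an assumption
    res.raise_for_status()  # HTTP error statuses become exceptions
    soup = bs4.BeautifulSoup(res.text, "lxml")
    return [tag.text for tag in soup.select(".author")]
```

Calling collect_authors(fetch_authors_from_site) should then return the set of author names. Because the loop ends on the first empty page rather than on an exception, it no longer matters that the site answers out-of-range page numbers with a normal response.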

Try:

import requests
import bs4

i = 0
authors = set()

while True:

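    # Page 1 is the site root; later pages live at /page/<n>/.
    # Using i + 1 avoids fetching page 1 a second time when i == 1.
    url = "http://quotes.toscrape.com" if i == 0 else \
         f"http://quotes.toscrape.com/page/{i + 1}/"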

    res = requests.get(url)

    if res.text.find('No quotes found!') < 0:
        soup = bs4.BeautifulSoup(res.text, 'lxml')
        for name in soup.select('.author'):
            authors.add(name.text)
        i += 1
    else:
        break
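Another way to end the loop is to follow the pager's "Next" link and stop when there is none. This is only a sketch and not part of the answers above. It assumes the pager markup is an <li class="next"> element containing an <a href="...">, which you should verify against the page source. NextLinkParser and next_page_url are hypothetical names, and the standard library's html.parser is used here instead of BeautifulSoup so the snippet has no third-party dependencies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Record the href of the first <a> found inside <li class="next">."""

    def __init__(self):
        super().__init__()
        self._in_next_item = False
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and "next" in (attrs.get("class") or "").split():
            self._in_next_item = True
        elif tag == "a" and self._in_next_item and self.next_href is None:
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_next_item = False


def next_page_url(current_url, html):
    """Return the absolute URL of the next page, or None on the last page."""
    parser = NextLinkParser()
    parser.feed(html)
    if parser.next_href is None:
        return None
    # The href is assumed to be site-relative, so resolve it against the
    # URL of the page it came from.
    return urljoin(current_url, parser.next_href)
```

In the scraping loop you would set url = next_page_url(url, res.text) after processing each page and break once it returns None. If you already have a soup object, soup.select_one('li.next > a') is the BeautifulSoup way to look for the same link.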

