Why is my code stuck in an infinite loop when scraping?
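I am learning how to do basic web scraping with Python 3, and in this example I was trying to scrape all the author names from the website http://quotes.toscrape.com . I tried to write the code below, but I don't know the total number of pages on the site. When I try to build (run) it, the editor stops responding. Is there something wrong with the code, or should I just let it run longer?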
    import requests
    import bs4

    i = 0
    authors = set()

    while True:
        try:
            if i == 0:
                url = "http://quotes.toscrape.com"
            else:
                url = "http://quotes.toscrape.com/page/{}/".format(i+1)
            res = requests.get(url)
            soup = bs4.BeautifulSoup(res.text, 'lxml')
            for name in soup.select('.author'):
                authors.add(name.text)
            i += 1
        except:
            break
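I believe the problem is that the site returns a valid response even when a page number has no quotes on it (try http://quotes.toscrape.com/page/23400/ for example). Because of that, you will most likely never (or at least not for a very long time) hit an error that triggers your break statement, so the loop keeps requesting pages. Instead, break when the page contains text such as "No quotes found!". For example: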
    import requests
    import bs4

    i = 0
    authors = set()

    while True:
        try:
            if i == 0:
                url = "http://quotes.toscrape.com"
            else:
                url = "http://quotes.toscrape.com/page/{}/".format(i+1)
            res = requests.get(url)
            soup = bs4.BeautifulSoup(res.text, 'lxml')
            if "No quotes found!" in str(soup):
                break
            for name in soup.select('.author'):
                authors.add(name.text)
            i += 1
        except:
            break
Try this:
    import requests
    import bs4

    i = 0
    authors = set()

    while True:
        # Note: i == 0 and i == 1 both fetch the first page; the repeat is
        # harmless because authors is a set.
        url = "http://quotes.toscrape.com" if i == 0 else \
            f"http://quotes.toscrape.com/page/{i}/"
        res = requests.get(url)
        if res.text.find('No quotes found!') < 0:
            soup = bs4.BeautifulSoup(res.text, 'lxml')
            for name in soup.select('.author'):
                authors.add(name.text)
            i += 1
        else:
            break
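Both snippets above depend on the exact wording "No quotes found!". As a rough sketch of an alternative, the loop below stops as soon as a page yields no `.author` elements and also caps the number of pages, so it cannot spin forever even if the site changes. The names `collect_authors`, `fetch_names`, `fetch_names_from_site` and `max_pages` are made up for illustration, and the sketch assumes that `/page/1/` serves the first page, as the URL pattern in the last snippet suggests.

```python
def collect_authors(fetch_names, max_pages=100):
    """Collect author names page by page until a page yields none.

    fetch_names is a hypothetical callable: given a 1-based page number,
    it returns the list of author names found on that page.
    """
    authors = set()
    for page in range(1, max_pages + 1):  # hard cap instead of `while True`
        names = fetch_names(page)
        if not names:  # empty page: we have run past the last one
            break
        authors.update(names)
    return authors


def fetch_names_from_site(page):
    """Download one page of quotes.toscrape.com and return its author names."""
    # Imported here so collect_authors can be tried without these packages.
    import requests
    import bs4

    res = requests.get(f"http://quotes.toscrape.com/page/{page}/", timeout=10)
    res.raise_for_status()  # requests.get alone does not raise on 4xx/5xx
    soup = bs4.BeautifulSoup(res.text, "lxml")
    return [tag.text for tag in soup.select(".author")]
```

Calling `collect_authors(fetch_names_from_site)` would run it against the live site. Passing the fetcher in as an argument also makes the stop condition easy to check with fake pages, without making any network requests.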