How to detect the bottom of the page using BeautifulSoup and get to the next page?
I'm trying to scrape a web page and get the URL of each article. The code is as follows:
import requests
from bs4 import BeautifulSoup

main_url = "https://www.rfa.org/vietnamese/news/programs/story_archive?year=2006&month=1"
response = requests.get(main_url)  # avoid naming this `re`, which shadows the regex module
soup = BeautifulSoup(response.text, "html.parser")
article_links = soup.find_all("div", {"class": "sectionteaser archive"})
for div in article_links:
    links = div.find_all('a')
    for a in links:
        print(a['href'])
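The code above only does the job for the first page, but there are more pages to go through. How can I detect how many more articles there are and get all of them?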
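You can loop for as long as the pagination has a next page. This can be tested by checking for the presence of an element with class "next". Each time through the loop, you need to increase the offset in the request by 15.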
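import requests
from bs4 import BeautifulSoup as bs

n = 0
with requests.Session() as s:
    while True:
        # b_start:int is the offset of the first item on the page (15 items per page)
        url = f'https://www.rfa.org/vietnamese/news/programs/story_archive?year=2006&month=1&b_start:int={n*15}'
        r = s.get(url)
        soup = bs(r.text, 'lxml')  # the 'lxml' parser requires the lxml package
        # print the article titles found on this page
        print([i.text.strip() for i in soup.select('.sectionteaser a > span')])
        # stop once the page has no "next" pagination link
        if soup.select_one('.next') is None:
            break
        n += 1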
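The loop above prints article titles, while the question asks for article URLs. The following is a minimal sketch that combines the two, assuming the archive still shows 15 items per page, accepts the b_start:int offset, and renders a .next element on every page except the last. The names PAGE_SIZE, page_url, parse_page, iter_article_links and the max_pages cap are hypothetical helpers introduced for illustration, not part of the original answer.

```python
from typing import Callable, Iterator, List, Tuple
from urllib.parse import urljoin

from bs4 import BeautifulSoup

ARCHIVE_URL = "https://www.rfa.org/vietnamese/news/programs/story_archive?year=2006&month=1"
PAGE_SIZE = 15  # assumption: 15 teasers per archive page, as in the answer


def page_url(n: int) -> str:
    """URL of page n (0-based), using the b_start:int offset from the answer."""
    return f"{ARCHIVE_URL}&b_start:int={n * PAGE_SIZE}"


def parse_page(html: str, base_url: str) -> Tuple[List[str], bool]:
    """Return (absolute article URLs on this page, whether a .next element exists)."""
    soup = BeautifulSoup(html, "html.parser")
    links = [
        urljoin(base_url, a["href"])  # resolve relative hrefs against the page URL
        for a in soup.select("div.sectionteaser.archive a[href]")
    ]
    has_next = soup.select_one(".next") is not None
    return links, has_next


def iter_article_links(fetch: Callable[[str], str], max_pages: int = 1000) -> Iterator[str]:
    """Yield article URLs page by page until a page has no .next element.

    fetch maps a URL to its HTML text; max_pages is a hypothetical safety cap.
    """
    seen = set()
    for n in range(max_pages):
        url = page_url(n)
        links, has_next = parse_page(fetch(url), url)
        for link in links:
            # one teaser may contain several <a> tags pointing at the same article
            if link not in seen:
                seen.add(link)
                yield link
        if not has_next:
            break
```

For example, with requests: with requests.Session() as s: urls = list(iter_article_links(lambda u: s.get(u, timeout=30).text)). Passing fetch in as a function keeps the parsing testable without network access and lets one session be reused across pages; max_pages only guards against a site that always renders a next link.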