
How to add a loop to scrape the next page of a website

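My code below works, but I want it to do the same thing for the next pages of the URL variable, which would be done by appending the numbers 1, 2, 3, and so on to the URL, depending on the page.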

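The code essentially scrapes a website that shows a variety of video thumbnails and returns the link to each video. I want it to do this for every available page.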

import requests
from bs4 import BeautifulSoup

# requests needs a scheme in the URL; "domain.com" is a placeholder.
URL = "https://domain.com/"

page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

# Collect every link on the page, then keep only the entries
# between the first 8 and the last 8.
links = [a["href"] for a in soup.find_all("a", href=True)]
endoflinks = links[8:-8]

# Request each video page and append its final URL to output.txt.
# Iterating over the hrefs directly avoids the off-by-one index
# that skipped the first link and ran past the end of the list.
with open("output.txt", "a") as f:
    for href in endoflinks:
        dwnlink = "https://domain.com" + href
        r = requests.get(dwnlink)
        print(r.url, file=f)

This should help you get started:

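import requests

# requests needs a scheme in the URL; "domain.com" is a placeholder.
URL = "https://domain.com/"

# Open the output file once and append one line per page.
with open("output.txt", "a") as f:
    for i in range(10):
        page_url = URL + str(i)
        print(page_url)
        r = requests.get(page_url)
        print(r.url, file=f)

This prints:

https://domain.com/0
https://domain.com/1
https://domain.com/2
https://domain.com/3
https://domain.com/4
https://domain.com/5
https://domain.com/6
https://domain.com/7
https://domain.com/8
https://domain.com/9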
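
To combine the page loop with the link extraction from the question, something along the following lines might work. It is only a sketch under several assumptions: pages are numbered from 1 (as the question describes), a missing page answers with an HTTP error, and every page has the same layout, so the question's [8:-8] slice still applies. The names LinkParser, fetch_html and scrape_pages are made up for this example, and it uses the standard library (html.parser, urllib) in place of requests and BeautifulSoup so that it runs without extra packages.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the non-empty href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch_html(url):
    """Return the page body as text, or None if the server answers with an HTTP error."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError:
        return None


def scrape_pages(base_url, max_pages=10, skip=8, fetch=fetch_html):
    """Visit base_url + "1", base_url + "2", ... and return absolute links.

    skip mirrors the question's links[8:-8] slice (an assumption about the
    page layout); fetch can be replaced, for example to test offline.
    """
    found = []
    for page_number in range(1, max_pages + 1):
        page_url = base_url + str(page_number)
        html = fetch(page_url)
        if html is None:
            # Assumption: a page that cannot be fetched marks the end of the listing.
            break
        parser = LinkParser()
        parser.feed(html)
        # links[0:-0] would be empty, so only slice when skip is non-zero.
        page_links = parser.links[skip:-skip] if skip else parser.links
        # Resolve relative hrefs against the page they were found on.
        found.extend(urljoin(page_url, href) for href in page_links)
    return found
```

Calling scrape_pages("https://domain.com/") and printing each returned link to output.txt would then give one line per video across all pages that could be fetched. If the site keeps returning a normal page past the last one, the loop only stops at max_pages, so that limit may need adjusting.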

