Get last page number from html using python
I am trying to get the last page number from this HTML:
<a aria-label="Page 47" class="pg _act">47</a>
url = "https://www.jumia.com.ng/womens-dresses/"
def getLastPageNumber(soup):
    number = []
    for item in soup.find_all("a", class_="pg _act"):
        x = item.text
        number.append(x)
    return max(number)
getLastPageNumber(soup)
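Whenever I run this code, it only returns "1". If I change the url to url = "https://www.jumia.com.ng/womens-dresses/?page=48", it outputs 48. What I want is for it to collect all the page numbers and return the largest one.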
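A side note on the `max(number)` call: the list holds strings (the values of `item.text`), so `max` compares them character by character rather than numerically. A minimal sketch of the difference, using a made-up list `labels` as sample data (it is not scraped output):

```python
# Hypothetical page labels, as strings like those returned by item.text.
labels = ["1", "2", "9", "10", "47"]

# Strings are compared character by character, so "9" ranks above "47".
max_as_text = max(labels)

# Converting to int first yields the numeric maximum.
max_as_number = max(int(label) for label in labels)

print(max_as_text, max_as_number)
```

So even once every page link is collected, the values would likely need converting to `int` before taking the maximum.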
You can select this element by its aria-label="Last Page" attribute and extract the last page number from its href with a regular expression, like this:
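import re

import requests
from bs4 import BeautifulSoup

url = "https://www.jumia.com.ng/womens-dresses/"
# Capture the digits that follow "page=" in the link's href.
regex = r"page=(\d+)"
resp = requests.get(url)
# Name the parser explicitly to avoid bs4's "no parser specified" warning.
soup = BeautifulSoup(resp.text, "html.parser")
# The pagination link pointing to the last page is labelled "Last Page".
target_tag = soup.find("a", {"aria-label": "Last Page"})
print(re.search(regex, target_tag.get("href")).group(1))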
This gave me:
50
This matches the last page number shown on the site.
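The same value can probably be pulled out without a regular expression by parsing the link's query string. Here is a minimal sketch, assuming the href looks something like `/womens-dresses/?page=50#catalog-listing`. That sample value and the helper name `last_page_from_href` are my own assumptions for illustration; the actual href is not shown above.

```python
from urllib.parse import urlparse, parse_qs


def last_page_from_href(href):
    """Return the `page` query parameter of href as an int, or None if absent."""
    query = parse_qs(urlparse(href).query)
    values = query.get("page")
    return int(values[0]) if values else None


# Assumed sample href; the real one would come from target_tag.get("href").
sample_href = "/womens-dresses/?page=50#catalog-listing"
print(last_page_from_href(sample_href))
```

This returns an integer rather than a string, and it does not depend on the `#` fragment being present in the link.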