
How to get an image tag from a dynamic web page using BeautifulSoup?

Hi, I am trying to get the images on a web page using requests and BeautifulSoup.

import requests
from bs4 import BeautifulSoup as BS

data = requests.get(url, headers=headers).content
soup = BS(data, "html.parser")
for imgtag in soup.find_all("img", class_="slider-img"):
    print(imgtag["src"])

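The problem is that when I fetch the page into data, it does not contain the image tags. However, when I open the page in a web browser, the div tag is populated with multiple <img class="slider-img"> tags.

I'm a beginner, so I don't understand what is happening on the page. Thanks in advance for your help.

PS: The page uses the Fotorama slider, and the src attributes contain CDN links, in case that matters.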

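The image tags are created dynamically by JavaScript, so they are not in the HTML that requests downloads. You only need the uuids to build the image URLs, and those are stored in the page source: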

import re
import requests
from ast import literal_eval


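url = "https://fotorama.io/"
# URL template for the CDN; each uuid is substituted into it.
img_url = "https://ucarecdn.com/{uuid}/-/stretch/off/-/resize/760x/"

html_doc = requests.get(url).text
# The uuids are embedded in the page source as a JavaScript array.
uuids = re.search(r"uuids: (\[.*?\])", html_doc, flags=re.S).group(1)
# literal_eval turns the array text into a Python list; this works
# because the array is also valid Python list syntax.
uuids = literal_eval(uuids)

for uuid in uuids:
    print(img_url.format(uuid=uuid))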

Prints:

https://ucarecdn.com/05e7ff61-c1d5-4d96-ae79-c381956cca2e/-/stretch/off/-/resize/760x/
https://ucarecdn.com/cd8dfa25-2bc5-4546-995a-f3fd23809e1d/-/stretch/off/-/resize/760x/
https://ucarecdn.com/382a5139-6712-4418-b25e-cc8ba69ab07f/-/stretch/off/-/resize/760x/
https://ucarecdn.com/3ed25902-4a51-4628-a057-1e55fbca7856/-/stretch/off/-/resize/760x/
https://ucarecdn.com/5b0b329d-050e-4143-bc92-7f40cdde46f5/-/stretch/off/-/resize/760x/
https://ucarecdn.com/464f96db-6ae3-4875-ac6a-cbede40c4a51/-/stretch/off/-/resize/760x/
https://ucarecdn.com/4facbe78-b4e8-4b7d-8fb0-d3659f46f1b4/-/stretch/off/-/resize/760x/
https://ucarecdn.com/379c6c28-f726-48a3-b59e-1248e1e30443/-/stretch/off/-/resize/760x/
https://ucarecdn.com/631479df-27a8-4047-ae59-63f9167001f2/-/stretch/off/-/resize/760x/
https://ucarecdn.com/8e1e4402-84f0-4d78-b7d8-c48ec437b5af/-/stretch/off/-/resize/760x/
https://ucarecdn.com/f55e6755-198a-408d-8e82-a50370527aed/-/stretch/off/-/resize/760x/
https://ucarecdn.com/5264c896-cf01-4ad9-9216-114c20a388cc/-/stretch/off/-/resize/760x/
https://ucarecdn.com/c6284eae-9be4-4811-b45b-17a5b6e99ad2/-/stretch/off/-/resize/760x/
https://ucarecdn.com/40ff508f-01e5-4417-bee0-20633efc6147/-/stretch/off/-/resize/760x/
https://ucarecdn.com/eaaee377-f1b5-49d7-a7db-d7a1f86b2805/-/stretch/off/-/resize/760x/
https://ucarecdn.com/584c29c8-b521-48ee-8104-6656d4faac97/-/stretch/off/-/resize/760x/
https://ucarecdn.com/798aa641-01fe-4ed2-886b-bac818c5fdfc/-/stretch/off/-/resize/760x/
https://ucarecdn.com/f82be8f5-d517-4642-8fe1-8987b4e530d0/-/stretch/off/-/resize/760x/
https://ucarecdn.com/23b818d0-07c3-40de-a070-c999c1323ff3/-/stretch/off/-/resize/760x/
https://ucarecdn.com/7ca0e7f6-90eb-4254-82ea-58c77e74f6a0/-/stretch/off/-/resize/760x/
https://ucarecdn.com/42dc8c54-2315-453f-9b40-07e332b8ee39/-/stretch/off/-/resize/760x/
https://ucarecdn.com/8e62227c-5acb-4603-abb9-ac0643b7b478/-/stretch/off/-/resize/760x/
https://ucarecdn.com/80713821-5d54-4819-810a-19991502ca56/-/stretch/off/-/resize/760x/
https://ucarecdn.com/35ce83fa-eac1-4326-83e9-e445450b35ce/-/stretch/off/-/resize/760x/
https://ucarecdn.com/3df9ac37-4e86-49e5-9095-28679ab37718/-/stretch/off/-/resize/760x/
https://ucarecdn.com/9e7211c0-b73b-4b1d-8b47-4b1700f9a80f/-/stretch/off/-/resize/760x/
https://ucarecdn.com/1cc3c44b-e4a9-4e37-96cf-afafeb3eb748/-/stretch/off/-/resize/760x/
https://ucarecdn.com/ab52465c-b3d8-4bf6-986a-a4bf815dfaed/-/stretch/off/-/resize/760x/
https://ucarecdn.com/69e43c1d-9fac-4278-bec5-52291c1b1c2b/-/stretch/off/-/resize/760x/
https://ucarecdn.com/0627c11f-522d-48b9-9f17-9ea05b769aaa/-/stretch/off/-/resize/760x/
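
If you want to try the extraction step without hitting the network, here is a minimal sketch that runs the same regex and literal_eval approach on an inline string. SAMPLE_HTML and the function name extract_image_urls are made up for illustration, and the real page's markup may differ. It also returns an empty list instead of raising when the uuids array is not found, which is an addition on my part.

```python
import re
from ast import literal_eval

# Hypothetical stand-in for the downloaded page source (assumption:
# the uuids appear as a quoted JavaScript array after "uuids:").
SAMPLE_HTML = """
<div class="fotorama"></div>
<script>
  var config = {
    uuids: [
      '05e7ff61-c1d5-4d96-ae79-c381956cca2e',
      'cd8dfa25-2bc5-4546-995a-f3fd23809e1d'
    ]
  };
</script>
"""

IMG_URL = "https://ucarecdn.com/{uuid}/-/stretch/off/-/resize/760x/"


def extract_image_urls(html_doc):
    """Build image URLs from the `uuids: [...]` array, or return [] if absent."""
    match = re.search(r"uuids:\s*(\[.*?\])", html_doc, flags=re.S)
    if match is None:
        # re.search returns None when the pattern is missing, so guard
        # before calling .group() to avoid an AttributeError.
        return []
    uuids = literal_eval(match.group(1))
    return [IMG_URL.format(uuid=u) for u in uuids]


for image_url in extract_image_urls(SAMPLE_HTML):
    print(image_url)
```

The guard matters if the site changes its markup: the original one-liner would fail with an AttributeError on .group(1), while this version just yields nothing.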
