
How to get an image tag from a dynamic web page using BeautifulSoup?

Hi, I am trying to get the images on a web page using requests and BeautifulSoup.

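import requests
from bs4 import BeautifulSoup as BS

# url and headers are defined elsewhere (not shown in the question)
data = requests.get(url, headers=headers).content
soup = BS(data, "html.parser")
# Look for the slider images in the downloaded HTML
for imgtag in soup.find_all("img", class_="slider-img"):
    print(imgtag["src"])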

The problem is that the page I get back in data does not contain the image tags. Yet when I open the page in my web browser, the div tag is populated with multiple <img class="slider-img"> tags.

I am new to this, so I do not understand what is going on with that web page. Thanks in advance for any help.

PS: the web page uses the Fotorama slider, and the src attributes contain CDN links, if this matters.

The image tags are created dynamically by JavaScript, so they are not in the HTML that requests downloads. You only need the UUIDs to construct the image URLs, and they are stored within the page:

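import re
import requests
from ast import literal_eval


url = "https://fotorama.io/"
# URL template for the CDN; {uuid} is filled in for each image
img_url = "https://ucarecdn.com/{uuid}/-/stretch/off/-/resize/760x/"

html_doc = requests.get(url).text
# Grab the JavaScript array that follows "uuids:" in the page source
uuids = re.search(r"uuids: (\[.*?\])", html_doc, flags=re.S).group(1)
# The array of quoted strings is also a valid Python list literal
uuids = literal_eval(uuids)

for uuid in uuids:
    print(img_url.format(uuid=uuid))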

Prints:

https://ucarecdn.com/05e7ff61-c1d5-4d96-ae79-c381956cca2e/-/stretch/off/-/resize/760x/
https://ucarecdn.com/cd8dfa25-2bc5-4546-995a-f3fd23809e1d/-/stretch/off/-/resize/760x/
https://ucarecdn.com/382a5139-6712-4418-b25e-cc8ba69ab07f/-/stretch/off/-/resize/760x/
https://ucarecdn.com/3ed25902-4a51-4628-a057-1e55fbca7856/-/stretch/off/-/resize/760x/
https://ucarecdn.com/5b0b329d-050e-4143-bc92-7f40cdde46f5/-/stretch/off/-/resize/760x/
https://ucarecdn.com/464f96db-6ae3-4875-ac6a-cbede40c4a51/-/stretch/off/-/resize/760x/
https://ucarecdn.com/4facbe78-b4e8-4b7d-8fb0-d3659f46f1b4/-/stretch/off/-/resize/760x/
https://ucarecdn.com/379c6c28-f726-48a3-b59e-1248e1e30443/-/stretch/off/-/resize/760x/
https://ucarecdn.com/631479df-27a8-4047-ae59-63f9167001f2/-/stretch/off/-/resize/760x/
https://ucarecdn.com/8e1e4402-84f0-4d78-b7d8-c48ec437b5af/-/stretch/off/-/resize/760x/
https://ucarecdn.com/f55e6755-198a-408d-8e82-a50370527aed/-/stretch/off/-/resize/760x/
https://ucarecdn.com/5264c896-cf01-4ad9-9216-114c20a388cc/-/stretch/off/-/resize/760x/
https://ucarecdn.com/c6284eae-9be4-4811-b45b-17a5b6e99ad2/-/stretch/off/-/resize/760x/
https://ucarecdn.com/40ff508f-01e5-4417-bee0-20633efc6147/-/stretch/off/-/resize/760x/
https://ucarecdn.com/eaaee377-f1b5-49d7-a7db-d7a1f86b2805/-/stretch/off/-/resize/760x/
https://ucarecdn.com/584c29c8-b521-48ee-8104-6656d4faac97/-/stretch/off/-/resize/760x/
https://ucarecdn.com/798aa641-01fe-4ed2-886b-bac818c5fdfc/-/stretch/off/-/resize/760x/
https://ucarecdn.com/f82be8f5-d517-4642-8fe1-8987b4e530d0/-/stretch/off/-/resize/760x/
https://ucarecdn.com/23b818d0-07c3-40de-a070-c999c1323ff3/-/stretch/off/-/resize/760x/
https://ucarecdn.com/7ca0e7f6-90eb-4254-82ea-58c77e74f6a0/-/stretch/off/-/resize/760x/
https://ucarecdn.com/42dc8c54-2315-453f-9b40-07e332b8ee39/-/stretch/off/-/resize/760x/
https://ucarecdn.com/8e62227c-5acb-4603-abb9-ac0643b7b478/-/stretch/off/-/resize/760x/
https://ucarecdn.com/80713821-5d54-4819-810a-19991502ca56/-/stretch/off/-/resize/760x/
https://ucarecdn.com/35ce83fa-eac1-4326-83e9-e445450b35ce/-/stretch/off/-/resize/760x/
https://ucarecdn.com/3df9ac37-4e86-49e5-9095-28679ab37718/-/stretch/off/-/resize/760x/
https://ucarecdn.com/9e7211c0-b73b-4b1d-8b47-4b1700f9a80f/-/stretch/off/-/resize/760x/
https://ucarecdn.com/1cc3c44b-e4a9-4e37-96cf-afafeb3eb748/-/stretch/off/-/resize/760x/
https://ucarecdn.com/ab52465c-b3d8-4bf6-986a-a4bf815dfaed/-/stretch/off/-/resize/760x/
https://ucarecdn.com/69e43c1d-9fac-4278-bec5-52291c1b1c2b/-/stretch/off/-/resize/760x/
https://ucarecdn.com/0627c11f-522d-48b9-9f17-9ea05b769aaa/-/stretch/off/-/resize/760x/
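
To see why the original find_all call comes back empty, here is a small offline sketch of the same idea. SAMPLE_HTML is a made-up stand-in for the page source (it is not the real fotorama.io markup, and the UUIDs in it are invented), and the helper names static_img_sources and extract_image_urls are hypothetical. It uses the standard library's HTMLParser instead of BeautifulSoup only so that it runs without extra packages; the point is the same for either parser, since neither executes JavaScript.

```python
import re
from ast import literal_eval
from html.parser import HTMLParser

# Hypothetical page source: an empty slider container plus an inline script
# holding the UUIDs. Invented for illustration, not the real fotorama.io markup.
SAMPLE_HTML = """
<html><body>
<div class="fotorama"></div>
<script>
  var config = {
    uuids: [
      '11111111-aaaa-4bbb-8ccc-000000000001',
      '11111111-aaaa-4bbb-8ccc-000000000002'
    ]
  };
</script>
</body></html>
"""

IMG_URL = "https://ucarecdn.com/{uuid}/-/stretch/off/-/resize/760x/"


class ImgCollector(HTMLParser):
    """Collects the src of every <img> tag present in the static HTML."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.sources.append(dict(attrs).get("src"))


def static_img_sources(html_doc):
    """Return the src values of <img> tags found without running any script."""
    collector = ImgCollector()
    collector.feed(html_doc)
    return collector.sources


def extract_image_urls(html_doc):
    """Build image URLs from an inline `uuids: [...]` array, or return []."""
    match = re.search(r"uuids:\s*(\[.*?\])", html_doc, flags=re.S)
    if match is None:
        # No inline array found: report nothing instead of raising AttributeError
        return []
    uuids = literal_eval(match.group(1))
    return [IMG_URL.format(uuid=u) for u in uuids]


# The raw HTML has no <img> tags; a browser would add them by running the script
print(static_img_sources(SAMPLE_HTML))
for image_url in extract_image_urls(SAMPLE_HTML):
    print(image_url)
```

Returning an empty list when the pattern is missing is just one possible choice; it avoids the AttributeError that calling .group(1) on a failed re.search would raise if the page layout changes.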
