How can I add "http" to the "src" attributes?
I am trying to scrape content from some websites. This is the website's HTML:
<div class="answer-given-body ugc-base">
<p><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F61d%2F61d6042d-e4dd-41d9-9a5c-0ceb481ddbc9%2FphpKFGb9B.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2Fd72%2Fd72dfa6c-8e50-475a-86cf-678a04ae4606%2FphpQZYPYo.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F4c7%2F4c775a01-8590-4b93-bc20-03d282586f95%2FphpE7XFWI.png"/></p>
</div>
In the HTML above, the src attribute of each img tag does not start with "http", so the images do not show when I save the HTML file. How can I edit the src attributes and add "http" to the front of them?
To add "https" to each tag's src, access the src attribute with [] and prepend "https:" as follows:
from bs4 import BeautifulSoup
html = """
<div class="answer-given-body ugc-base">
<p><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F61d%2F61d6042d-e4dd-41d9-9a5c-0ceb481ddbc9%2FphpKFGb9B.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2Fd72%2Fd72dfa6c-8e50-475a-86cf-678a04ae4606%2FphpQZYPYo.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F4c7%2F4c775a01-8590-4b93-bc20-03d282586f95%2FphpE7XFWI.png"/></p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
# Select all the `img` tags
for tag in soup.select(".answer-given-body.ugc-base img"):
    tag["src"] = "https:" + tag["src"]
print(soup.prettify())
Output:
<div class="answer-given-body ugc-base">
<p>
<img alt="" src="https://d2vlcm61l7u1fs.cloudfront.net/media%2F61d%2F61d6042d-e4dd-41d9-9a5c-0ceb481ddbc9%2FphpKFGb9B.png"/>
<img alt="" src="https://d2vlcm61l7u1fs.cloudfront.net/media%2Fd72%2Fd72dfa6c-8e50-475a-86cf-678a04ae4606%2FphpQZYPYo.png"/>
<img alt="" src="https://d2vlcm61l7u1fs.cloudfront.net/media%2F4c7%2F4c775a01-8590-4b93-bc20-03d282586f95%2FphpE7XFWI.png"/>
</p>
</div>
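Prepending "https:" unconditionally works for the protocol-relative URLs in this page, but it would corrupt any src that already carries a scheme. A minimal sketch of a more defensive variant, using the standard library's urllib.parse.urljoin; the helper name normalize_src and the base URL https://example.com/ are hypothetical and only for illustration:

```python
from urllib.parse import urljoin

# Hypothetical URL of the page being scraped (an assumption for this sketch).
BASE_URL = "https://example.com/"


def normalize_src(src, base_url=BASE_URL):
    """Return an absolute URL for an img src value.

    Protocol-relative values ("//host/path") take the scheme of base_url,
    relative paths are resolved against base_url, and values that already
    have a scheme are returned unchanged.
    """
    return urljoin(base_url, src)


print(normalize_src("//cdn.example.net/a.png"))
print(normalize_src("http://other.example.org/a.png"))
print(normalize_src("img/a.png"))
```

In the loop above, the assignment could then become tag["src"] = normalize_src(tag["src"]), with the real page URL passed as base_url.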