How can I add "http" to the "src" attributes?
I'm trying to scrape content from some websites. This is the HTML from one of them:
<div class="answer-given-body ugc-base">
<p><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F61d%2F61d6042d-e4dd-41d9-9a5c-0ceb481ddbc9%2FphpKFGb9B.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2Fd72%2Fd72dfa6c-8e50-475a-86cf-678a04ae4606%2FphpQZYPYo.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F4c7%2F4c775a01-8590-4b93-bc20-03d282586f95%2FphpE7XFWI.png"/></p>
</div>
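In the HTML above, the `src` attribute of each `img` tag does not start with "http", so when I save the HTML file the images are not displayed. How can I edit the `src` attributes and add "http" in front of them?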
To add "https:" to each tag's `src`, you can access the `src` attribute with `[]` and prepend "https:" to its value, as follows:
from bs4 import BeautifulSoup
html = """
<div class="answer-given-body ugc-base">
<p><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F61d%2F61d6042d-e4dd-41d9-9a5c-0ceb481ddbc9%2FphpKFGb9B.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2Fd72%2Fd72dfa6c-8e50-475a-86cf-678a04ae4606%2FphpQZYPYo.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F4c7%2F4c775a01-8590-4b93-bc20-03d282586f95%2FphpE7XFWI.png"/></p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
# Select all the `img` tags
for tag in soup.select(".answer-given-body.ugc-base img"):
    tag["src"] = "https:" + tag["src"]
print(soup.prettify())
Output:
<div class="answer-given-body ugc-base">
 <p>
  <img alt="" src="https://d2vlcm61l7u1fs.cloudfront.net/media%2F61d%2F61d6042d-e4dd-41d9-9a5c-0ceb481ddbc9%2FphpKFGb9B.png"/>
  <img alt="" src="https://d2vlcm61l7u1fs.cloudfront.net/media%2Fd72%2Fd72dfa6c-8e50-475a-86cf-678a04ae4606%2FphpQZYPYo.png"/>
  <img alt="" src="https://d2vlcm61l7u1fs.cloudfront.net/media%2F4c7%2F4c775a01-8590-4b93-bc20-03d282586f95%2FphpE7XFWI.png"/>
 </p>
</div>
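Prepending "https:" works here because every `src` in the sample is protocol-relative (it starts with `//`). If a page mixes protocol-relative, relative, and already-absolute URLs, unconditional prepending would break the last two kinds. One possible alternative is a minimal sketch with the standard library's `urllib.parse.urljoin`. The helper name `to_absolute` and the page URL below are hypothetical, chosen only for illustration:

```python
from urllib.parse import urljoin

# Hypothetical URL of the page the HTML was scraped from (an assumption).
PAGE_URL = "https://www.example.com/questions/1"


def to_absolute(src, page_url=PAGE_URL):
    """Resolve an img src against the URL of the page it came from."""
    # urljoin leaves absolute URLs unchanged, gives "//host/path" URLs the
    # page's scheme, and resolves relative paths against page_url.
    return urljoin(page_url, src)


print(to_absolute("//cdn.example.com/a.png"))        # protocol-relative
print(to_absolute("https://cdn.example.com/b.png"))  # already absolute
print(to_absolute("/img/c.png"))                     # root-relative
```

Inside the loop above, this would be used as `tag["src"] = to_absolute(tag["src"])`, with `PAGE_URL` set to the real address of the scraped page.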