
How can I add "http" to the "src" attributes?

I am trying to scrape content from some websites. This is the HTML from one of them:

<div class="answer-given-body ugc-base">
  <p><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F61d%2F61d6042d-e4dd-41d9-9a5c-0ceb481ddbc9%2FphpKFGb9B.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2Fd72%2Fd72dfa6c-8e50-475a-86cf-678a04ae4606%2FphpQZYPYo.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F4c7%2F4c775a01-8590-4b93-bc20-03d282586f95%2FphpE7XFWI.png"/></p>
  </div>

In the HTML above, the src attributes of the img tags do not start with "http", so the images do not show when I save the HTML file. How can I edit the src attributes and add "http" in front of them?

To add "https" to each tag's src, you can access the src attribute with [] and prepend "https:" as follows:

from bs4 import BeautifulSoup


html = """
<div class="answer-given-body ugc-base">
  <p><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F61d%2F61d6042d-e4dd-41d9-9a5c-0ceb481ddbc9%2FphpKFGb9B.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2Fd72%2Fd72dfa6c-8e50-475a-86cf-678a04ae4606%2FphpQZYPYo.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F4c7%2F4c775a01-8590-4b93-bc20-03d282586f95%2FphpE7XFWI.png"/></p>
  </div>
"""

soup = BeautifulSoup(html, "html.parser")

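# Select all the `img` tags inside the answer body
for tag in soup.select(".answer-given-body.ugc-base img"):
    # Only protocol-relative URLs (starting with "//") need a scheme
    if tag.get("src", "").startswith("//"):
        tag["src"] = "https:" + tag["src"]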

print(soup.prettify())

Output:

<div class="answer-given-body ugc-base">
 <p>
  <img alt="" src="https://d2vlcm61l7u1fs.cloudfront.net/media%2F61d%2F61d6042d-e4dd-41d9-9a5c-0ceb481ddbc9%2FphpKFGb9B.png"/>
  <img alt="" src="https://d2vlcm61l7u1fs.cloudfront.net/media%2Fd72%2Fd72dfa6c-8e50-475a-86cf-678a04ae4606%2FphpQZYPYo.png"/>
  <img alt="" src="https://d2vlcm61l7u1fs.cloudfront.net/media%2F4c7%2F4c775a01-8590-4b93-bc20-03d282586f95%2FphpE7XFWI.png"/>
 </p>
</div>
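
If the page may also contain src values that are already absolute, or relative to the site root, prepending a fixed string could produce broken URLs. A minimal sketch of an alternative, assuming you know the address of the page you scraped: resolve each src against that address with the standard library's urljoin, which fills in the scheme for "//host/path" values and leaves absolute URLs alone. PAGE_URL and absolutize_src are hypothetical names used only for illustration, and the example.com address is a placeholder.

```python
from urllib.parse import urljoin

# Hypothetical address of the scraped page; replace it with the real one.
PAGE_URL = "https://example.com/questions/123"


def absolutize_src(src, page_url=PAGE_URL):
    """Resolve an img src against the page URL, much as a browser would."""
    return urljoin(page_url, src)


examples = [
    # Protocol-relative URL from the question: inherits the page's scheme
    "//d2vlcm61l7u1fs.cloudfront.net/media%2F61d%2F61d6042d-e4dd-41d9-9a5c-0ceb481ddbc9%2FphpKFGb9B.png",
    # Root-relative path: inherits the page's scheme and host
    "/static/logo.png",
    # Already absolute: returned unchanged
    "http://other.example/pic.png",
]

for src in examples:
    print(absolutize_src(src))
```

Inside the loop from the answer, this would be used as tag["src"] = absolutize_src(tag["src"]).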
