Python: Extracting the image source from an HTML tag with Beautiful Soup

Question

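I am trying to print only the src attribute value of an image. I can print the whole img tag successfully, but I cannot get the src value on its own.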

import urllib3
import certifi
from urllib3 import PoolManager
from bs4 import BeautifulSoup

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
# Verify TLS certificates against the certifi CA bundle.
manager = PoolManager(num_pools=3, cert_reqs='CERT_REQUIRED',
                      ca_certs=certifi.where())

base_url = "https://app.tipotapp.com/docs/quickstart/"
page = manager.request('GET', base_url)
soup = BeautifulSoup(page.data, 'html.parser')
idd = 'creating-an-application'

# Walk the siblings that follow the section heading until the next <h2>.
for sibling in soup.find(id=idd).next_siblings:
    if sibling.name is None:
        # Bare text nodes have no tag name; skip them.
        continue
    if sibling.name == 'h2':
        break
    print(sibling.getText())
    if sibling.img is not None:
        print(sibling.img)
        # print(sibling.select_one("img"))

The output I get right now is:

It prints ... some expected strings ... followed by the output below:

<img alt="Student Management System" 
src="https://app.tipotapp.com/docs/images/quickstart/image_004.png"/>

From that, I only want to print the src value.

Answer 1

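To get the value of an attribute, use the tag's __getitem__(self, key) method, that is, subscript access.

tag[key] returns the value of the tag's 'key' attribute, and raises a KeyError if the attribute is not present.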
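As a rough illustration of subscript access on a tag, here is a minimal sketch that runs on a made-up HTML snippet instead of the page from the question. The snippet and the variable names are assumptions for the example only. It also shows Tag.get(), which the answer does not mention but which returns None instead of raising when the attribute is absent.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one image with a src attribute, one without.
html = '<p><img alt="logo" src="/images/logo.png"/><img alt="no source"/></p>'
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("img")

# Subscript access returns the attribute value.
src_first = first["src"]

# Subscripting a missing attribute raises KeyError.
try:
    second["src"]
    missing_raises = False
except KeyError:
    missing_raises = True

# .get() returns None (or a supplied default) instead of raising.
src_second = second.get("src")

print(src_first, missing_raises, src_second)
```

Whether to use tag["src"] or tag.get("src") depends on whether a missing attribute should be treated as an error or silently skipped.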

Just replace the print(sibling.img) line with

print(sibling.img['src'])

Output:

https://app.tipotapp.com/docs/images/quickstart/image_002.png
https://app.tipotapp.com/docs/images/quickstart/image_002_1.png
https://app.tipotapp.com/docs/images/quickstart/image_004.png
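The same idea can be sketched without a network request. The version below is only an illustration under stated assumptions: SAMPLE_HTML is invented markup standing in for the real page, and image_sources is a hypothetical helper name. It differs from the loop in the question in two ways: it uses find_all("img") so that several images inside one sibling are all reported, and it resolves relative src values against the base URL with urljoin.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://app.tipotapp.com/docs/quickstart/"

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<h2 id="creating-an-application">Creating an application</h2>
<p>Step one <img alt="a" src="../images/quickstart/a.png"/></p>
<p>Step two, no image</p>
<p><img alt="b" src="https://example.com/b.png"/><img alt="no src"/></p>
<h2 id="next-section">Next section</h2>
<p><img alt="c" src="c.png"/></p>
"""


def image_sources(html, section_id, base_url):
    """Return absolute image URLs found between a heading and the next <h2>."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find(id=section_id)
    if heading is None:
        return []
    sources = []
    for sibling in heading.next_siblings:
        if sibling.name is None:
            continue  # bare text node
        if sibling.name == "h2":
            break  # reached the next section
        for img in sibling.find_all("img"):
            src = img.get("src")  # None when the attribute is missing
            if src:
                sources.append(urljoin(base_url, src))
    return sources


sources = image_sources(SAMPLE_HTML, "creating-an-application", BASE_URL)
for src in sources:
    print(src)
```

With the sample markup, the image after the second heading is not reported, and the image without a src attribute is skipped rather than raising.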

Answer 1: accepted, 2018-02-25 16:36:15
