
Python: extracting image source from HTML tag using Beautiful Soup

I'm trying to print only the value of the image's src attribute. I can print the img tag itself, but I can't get the value of its src attribute.

import urllib3
import certifi
from urllib3 import PoolManager
from bs4 import BeautifulSoup

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
manager = PoolManager(num_pools=3, cert_reqs='CERT_REQUIRED',
                      ca_certs=certifi.where())

base_url="https://app.tipotapp.com/docs/quickstart/"
page=manager.request('GET',base_url)
soup = BeautifulSoup(page.data, 'html.parser')
idd='creating-an-application'

# Walk the siblings that follow the target element until the next <h2>.
for sibling in soup.find(id=idd).next_siblings:
    if sibling.name is None:
        # Skip text nodes that sit between tags
        continue
    elif sibling.name != 'h2':
        print(sibling.getText())
        if sibling.img is not None:
            print(sibling.img)
            #print(sibling.select_one("img"))
    else:
        break

The output I'm getting now is:

It prints ....some expected strings.... and then the output below:

<img alt="Student Management System" 
src="https://app.tipotapp.com/docs/images/quickstart/image_004.png"/>

From that, I want to print only the src value.

To get the value of an attribute, use the tag's __getitem__(self, key) method, that is, index the tag like a dictionary.

tag[key] returns the value of the 'key' attribute for the tag and raises a KeyError if the attribute is not there.
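As a minimal sketch of that behaviour, the snippet below compares subscript access with the tag's .get() method, which returns None instead of raising when the attribute is absent. The HTML string and the example.com URL are made up for illustration and are not taken from the page in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only (not the page from the question).
html = '<p><img alt="Logo" src="https://example.com/logo.png"/><img alt="No source"/></p>'
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("img")

src = first["src"]           # subscript access: the attribute must exist
missing = second.get("src")  # .get() returns None when the attribute is absent

# Subscript access on a tag without the attribute raises KeyError.
try:
    second["src"]
    raised = False
except KeyError:
    raised = True

print(src)
print(missing)
print(raised)
```

Using .get() may be the safer choice when some img tags on a page could lack a src attribute.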

Just replace the line print(sibling.img) with:

print(sibling.img['src'])

Output:

https://app.tipotapp.com/docs/images/quickstart/image_002.png
https://app.tipotapp.com/docs/images/quickstart/image_002_1.png
https://app.tipotapp.com/docs/images/quickstart/image_004.png
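Note that sibling.img only returns the first img inside each sibling. If a sibling might contain several images, one possible variation is to collect every src with find_all. The sketch below assumes that case; the helper name section_image_sources and the inline HTML are hypothetical and stand in for the real page so the example runs without a network request.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def section_image_sources(soup, section_id):
    """Collect src values of <img> tags between the element with
    the given id and the next <h2> sibling (hypothetical helper)."""
    sources = []
    start = soup.find(id=section_id)
    if start is None:
        return sources
    for sibling in start.next_siblings:
        if not isinstance(sibling, Tag):
            continue  # skip text nodes between tags
        if sibling.name == "h2":
            break  # the next section starts here
        for img in sibling.find_all("img"):
            src = img.get("src")  # None if the attribute is missing
            if src:
                sources.append(src)
    return sources


# Made-up markup that mimics the structure described in the question.
html = """
<h2 id="creating-an-application">Creating an application</h2>
<p>Step one <img src="/images/a.png"/></p>
<p>Step two <img src="/images/b.png"/><img alt="no src"/></p>
<h2 id="next">Next section</h2>
<p><img src="/images/c.png"/></p>
"""
soup = BeautifulSoup(html, "html.parser")
result = section_image_sources(soup, "creating-an-application")
print(result)
```

The image after the second h2 is not collected because the loop stops at the next heading, matching the break in the original loop.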


 