
BeautifulSoup - How to get all the values of a certain attribute

<div class="cont">

    <p style="text-align: center; "><img alt="" src="/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg"></p>
    <p style="text-align: center; "><img alt="" src="/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg"></p>
    <p style="text-align: center; "><img alt="" src="/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg"></p>

</div>

I am trying to get all the src values from this HTML.

My code is:

soup = BeautifulSoup(source, "html.parser")
div = soup.find("div", {"class": "cont"})
imgs = div.find_all("img", {"src":True})
print(imgs)

The list returned by this code contains the whole tags, including other attributes such as "alt". How can I extract only the values of the src attributes (e.g., '/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg')?

Try adding a for loop over the tags you already found and read the src attribute of each one. For example:

for img in imgs:
    print(img['src'])

Or, to make it simpler, select the img tags directly with a CSS selector:

from bs4 import BeautifulSoup

html = """
<div class="cont">
    <p style="text-align: center; "><img alt="" src="/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg"></p>
    <p style="text-align: center; "><img alt="" src="/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg"></p>
    <p style="text-align: center; "><img alt="" src="/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg"></p>

</div>
"""

soup = BeautifulSoup(html, features='html.parser')
elements = soup.select('div.cont > p > img')

for element in elements:
    print(element['src'])

This prints:

/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg
/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg
/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg
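
If some img tags might lack a src attribute, the selector itself can filter them out. The following is only a sketch: the second img (without src) is added here for illustration and is not part of the original HTML.

```python
from bs4 import BeautifulSoup

# Illustrative HTML: the second <img> has no src and is an assumption
# added for this example, not part of the question's markup.
html = """
<div class="cont">
    <p><img alt="" src="/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg"></p>
    <p><img alt="no source here"></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "img[src]" matches only <img> tags that actually carry a src attribute,
# so indexing with img["src"] cannot raise KeyError.
srcs = [img["src"] for img in soup.select("div.cont img[src]")]
print(srcs)
```

Using a descendant selector (`div.cont img`) instead of `div.cont > p > img` also keeps the code working if the images are not always wrapped directly in a p tag.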

If you are trying to download the images, see this example:

https://stackoverflow.com/a/61531668/4539709
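
Note that the src values in this page are site-relative paths, so they usually need to be turned into absolute URLs before downloading. A minimal sketch using the standard library, assuming a hypothetical page address (`BASE_URL` is a placeholder, not taken from the question):

```python
from urllib.parse import urljoin

# Hypothetical address of the page the HTML came from (an assumption).
BASE_URL = "https://example.com/shop/page.html"

# Values as extracted from the src attributes.
srcs = ["/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg"]

# urljoin resolves each path against the page URL, the way a browser would.
absolute_urls = [urljoin(BASE_URL, src) for src in srcs]
print(absolute_urls)
# ['https://example.com/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg']
```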

Using find_all with a list comprehension:

from bs4 import BeautifulSoup

soup = BeautifulSoup(source, "html.parser")
div = soup.find("div", {"class": "cont"})

print([img['src'] for img in div.find_all("img")])

Output:

['/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg', 
 '/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg',
 '/web/upload/NNEditor/20200409/1_1_shop1_143320.jpg']
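
On real pages the div may be missing, in which case soup.find returns None and div.find_all raises AttributeError. One possible defensive variant is sketched below; the helper name `extract_srcs` is made up for this example, and the `src=True` filter is the same one used in the question's code.

```python
from bs4 import BeautifulSoup


def extract_srcs(source, css_class="cont"):
    """Return the src values of all <img> tags inside div.<css_class>.

    Hypothetical helper for illustration; returns an empty list when
    the div is not found.
    """
    soup = BeautifulSoup(source, "html.parser")
    div = soup.find("div", class_=css_class)
    if div is None:
        return []
    # src=True keeps only tags that have a src attribute at all.
    return [img["src"] for img in div.find_all("img", src=True)]


print(extract_srcs('<div class="cont"><img src="a.jpg"><img alt="x"></div>'))
print(extract_srcs("<p>no matching div</p>"))
```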

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 