How to use BeautifulSoup to get the image from a webpage

I'm struggling to get the image from this webpage. I can get the title, price, and other elements just fine, but not the image.

<div class="product-img">
   <a data-test-selector="linkProductURL" href="https://www.scottycameron.com/store/product/3494">
      <div class="image" style="min-height: 350px;">
         <img data-test-selector="imgProductImage" id="img-3494" class="img-responsive b-lazy b-loaded"
         src="https://api.scottycameron.com/Data/Media/Catalog/1/370/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE%20PLATE%20FRAME%20-%20SCOTTY%20CAMERON%20FINE%20MILLED%20PUTTERS.jpg">

The code I'm currently using is:

for ele in array:
    item = [ele.find('h4', {'class': 'title'}).text,  # title
            ele.find('span', {'data-test-selector': 'spanPrice'}).text,  # price
            ele.find('img', {'class': 'img-responsive b-lazy b-loaded'})['src']]  # image URL

But this returns:

TypeError: 'NoneType' object is not subscriptable

Does anyone have any ideas?

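find() returns None when nothing matches, and subscripting None raises that TypeError. You may want to check that an img tag was actually found before reading its attribute: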

from bs4 import BeautifulSoup

element = """
<div class="product-img">
   <a data-test-selector="linkProductURL" href="https://www.scottycameron.com/store/product/3494">
      <div class="image" style="min-height: 350px;">
         <img data-test-selector="imgProductImage" id="img-3494" class="img-responsive b-lazy b-loaded" 
         src="https://api.scottycameron.com/Data/Media/Catalog/1/370/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE%20PLATE%20FRAME%20-%20SCOTTY%20CAMERON%20FINE%20MILLED%20PUTTERS.jpg">
       </div>
</div>"""

image = BeautifulSoup(element, "html.parser").find("img", class_="img-responsive b-lazy b-loaded")
if image is not None:
    print(image["src"])

Output:

https://api.scottycameron.com/Data/Media/Catalog/1/370/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE%20PLATE%20FRAME%20-%20SCOTTY%20CAMERON%20FINE%20MILLED%20PUTTERS.jpg

Edit:

Based on your comment, try this:

item = []
for ele in array:
    title = ele.find('h4', {'class': 'title'}).text
    price = ele.find('span', {'data-test-selector': 'spanPrice'}).text
    img_src = ele.find('img', {'class': 'img-responsive b-lazy b-loaded'})
    # find() returns None when there is no matching <img>, so check before subscripting
    if img_src is not None:
        item.append([title, price, img_src["src"]])
    else:
        item.append([title, price, "No image source"])
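
One possible reason the class-based lookup finds nothing is that classes such as b-loaded are added by a lazy-loading script after the page loads, so they may be missing from the HTML that BeautifulSoup receives. The source does not confirm this, so treat the following as a sketch under that assumption: it matches the img by its data-test-selector attribute instead and falls back to a data-src attribute. The sample markup, the image_url helper, and the data-src attribute are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the product markup as it might look before a
# lazy-loading script has added the "b-loaded" class and the src attribute.
SAMPLE_HTML = """
<div class="product-img">
  <img data-test-selector="imgProductImage" class="img-responsive b-lazy"
       data-src="https://example.com/frame.jpg">
</div>
"""


def image_url(ele):
    """Return the image URL inside one product element, or None if absent."""
    # Match on the data-test-selector attribute rather than the class string.
    img = ele.find("img", attrs={"data-test-selector": "imgProductImage"})
    if img is None:
        return None
    # Assumption: a lazy loader may keep the real URL in data-src.
    return img.get("src") or img.get("data-src")


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
product = soup.find("div", class_="product-img")
print(image_url(product))
```

Because image_url returns None instead of raising, the loop can decide what to store when a product has no image.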

Use this:

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the product page and parse it
html = urlopen('https://www.scottycameron.com/store/product/3494')
bs = BeautifulSoup(html, 'html.parser')
images = bs.find_all('img')
for img in images:
    if img.has_attr('src'):
        print(img['src'])

Output:

/img/icon-header-user.png
/img/icon-header-cart.png
https://www.scottycameron.com/media/18299/puttertarchivenav_jan2021.jpg
https://www.scottycameron.com/media/18302/customizenav_jan2021.jpg
https://www.scottycameron.com/media/18503/showcasenav_2_2021.jpg
https://www.scottycameron.com/media/18454/2021phtmx_new_nws_thmb1.jpg
https://www.scottycameron.com/media/18301/aboutnav_jan2021_b.jpg
https://api.scottycameron.com/Data/Media/Catalog/1/1000/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE PLATE FRAME - SCOTTY CAMERON FINE MILLED PUTTERS.jpg
/store/content/images/loading.svg

This collects the src of every image on the page, which you can then process further.
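
As one example of such further processing, the output above mixes relative paths (/img/icon-header-user.png) with an absolute URL that contains raw spaces. A minimal sketch for normalizing both, using only the standard library, might look like this. The absolute_image_url helper and the PAGE_URL constant are hypothetical names, and the set of characters left unescaped is an assumption that suits simple image URLs.

```python
from urllib.parse import quote, urljoin

# Assumption: the page the <img> tags were scraped from.
PAGE_URL = "https://www.scottycameron.com/store/product/3494"


def absolute_image_url(src, page_url=PAGE_URL):
    """Resolve a possibly relative src against the page URL and escape spaces."""
    absolute = urljoin(page_url, src)
    # Leave URL delimiters and existing percent-escapes alone; encode the rest.
    return quote(absolute, safe=":/?&=#%")


print(absolute_image_url("/img/icon-header-user.png"))
print(absolute_image_url("https://example.com/LICENSE PLATE FRAME.jpg"))
```

Keeping "%" in the safe set means a src that is already percent-encoded is not encoded a second time.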
