How to get images from a web page using BeautifulSoup

Question

I'm struggling to get the image from this web page. I can get the title, price, and other elements just fine, but not the image.

<div class="product-img">
   <a data-test-selector="linkProductURL" href="https://www.scottycameron.com/store/product/3494">
      <div class="image" style="min-height: 350px;">
         <img data-test-selector="imgProductImage" id="img-3494" class="img-responsive b-lazy b-loaded"
         src="https://api.scottycameron.com/Data/Media/Catalog/1/370/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE%20PLATE%20FRAME%20-%20SCOTTY%20CAMERON%20FINE%20MILLED%20PUTTERS.jpg">

The code I'm currently using is:

for ele in array:
            item = [ele.find('h4', {'class': 'title'}).text, #title
                    ele.find('span', {'data-test-selector': 'spanPrice'}).text,
                    ele.find('img', {'class': 'img-responsive b-lazy b-loaded'})['src']]

But this returns:

TypeError: 'NoneType' object is not subscriptable

Does anyone know what's going wrong?

Answer 1

You may want to check that the img tag was actually found before reading its attribute:

from bs4 import BeautifulSoup

element = """
<div class="product-img">
   <a data-test-selector="linkProductURL" href="https://www.scottycameron.com/store/product/3494">
      <div class="image" style="min-height: 350px;">
         <img data-test-selector="imgProductImage" id="img-3494" class="img-responsive b-lazy b-loaded" 
         src="https://api.scottycameron.com/Data/Media/Catalog/1/370/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE%20PLATE%20FRAME%20-%20SCOTTY%20CAMERON%20FINE%20MILLED%20PUTTERS.jpg">
       </div>
</div>"""

image = BeautifulSoup(element, "html.parser").find("img", class_="img-responsive b-lazy b-loaded")
if image is not None:
    print(image["src"])

Output:

https://api.scottycameron.com/Data/Media/Catalog/1/370/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE%20PLATE%20FRAME%20-%20SCOTTY%20CAMERON%20FINE%20MILLED%20PUTTERS.jpg
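One possible reason `find` returns `None` in the original loop is that the HTML the script receives does not carry exactly the same class list as the one seen in the browser. That is an assumption, since the question does not show the fetched HTML. The sketch below uses hypothetical markup (the `example.com` URL and the missing `b-loaded` class are made up for illustration) to show that a multi-word `class_` string only matches the exact class attribute value, while a single class name or the `data-test-selector` attribute is less brittle.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same kind of <img>, but without the "b-loaded"
# class, as it might look before a lazy-loading script has run.
html = """
<div class="product-img">
  <img data-test-selector="imgProductImage" id="img-3494"
       class="img-responsive b-lazy"
       src="https://example.com/placeholder.jpg">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A multi-word class string is compared against the whole class attribute,
# so this lookup fails as soon as one class is missing or reordered.
by_full_class = soup.find("img", class_="img-responsive b-lazy b-loaded")

# A single class name matches any element that carries that class.
by_one_class = soup.find("img", class_="img-responsive")

# Matching on another attribute avoids depending on the class list at all.
by_data_attr = soup.find("img", attrs={"data-test-selector": "imgProductImage"})

print(by_full_class)        # None
print(by_one_class["src"])
print(by_data_attr["src"])
```

If the selector is not the problem, printing the fetched HTML for one product and checking whether the `<img>` tag is present at all would be the next thing to try.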

Edit:

Based on your comment, try this:

items = []
for ele in array:
    title = ele.find('h4', {'class': 'title'}).text
    price = ele.find('span', {'data-test-selector': 'spanPrice'}).text
    img = ele.find('img', {'class': 'img-responsive b-lazy b-loaded'})
    # find() returns None when no matching tag exists, so check before indexing
    if img is not None:
        items.append([title, price, img["src"]])
    else:
        items.append([title, price, "No image source"])
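The same `None` check applies to the title and price lookups, which would also raise if a product lacked one of those tags. A minimal sketch of one way to centralize the check follows. The helper names `text_or_default` and `attr_or_default`, the `product` container class, and the sample titles, prices, and URLs are all hypothetical and only serve the illustration.

```python
from bs4 import BeautifulSoup

def text_or_default(parent, name, attrs, default=""):
    """Return the stripped text of the first matching tag, or default."""
    tag = parent.find(name, attrs=attrs)
    return tag.get_text(strip=True) if tag is not None else default

def attr_or_default(parent, name, attrs, attr, default=""):
    """Return one attribute of the first matching tag, or default."""
    tag = parent.find(name, attrs=attrs)
    return tag.get(attr, default) if tag is not None else default

# Hypothetical sample data: the second product has no <img> tag.
html = """
<div class="product">
  <h4 class="title">Product A</h4>
  <span data-test-selector="spanPrice">$10.00</span>
  <img data-test-selector="imgProductImage" src="https://example.com/a.jpg">
</div>
<div class="product">
  <h4 class="title">Product B</h4>
  <span data-test-selector="spanPrice">$20.00</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for ele in soup.find_all("div", class_="product"):
    rows.append([
        text_or_default(ele, "h4", {"class": "title"}, "No title"),
        text_or_default(ele, "span", {"data-test-selector": "spanPrice"}, "No price"),
        attr_or_default(ele, "img", {"data-test-selector": "imgProductImage"},
                        "src", "No image source"),
    ])

for row in rows:
    print(row)
```

Each product produces exactly one row, and a missing tag yields a placeholder string instead of a `TypeError`.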

Answer 2

Use this:

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the product page and parse it
html = urlopen('https://www.scottycameron.com/store/product/3494')
bs = BeautifulSoup(html, 'html.parser')

# Print the src of every <img> tag that has one
images = bs.find_all('img')
for img in images:
    if img.has_attr('src'):
        print(img['src'])

Output

/img/icon-header-user.png
/img/icon-header-cart.png
https://www.scottycameron.com/media/18299/puttertarchivenav_jan2021.jpg
https://www.scottycameron.com/media/18302/customizenav_jan2021.jpg
https://www.scottycameron.com/media/18503/showcasenav_2_2021.jpg
https://www.scottycameron.com/media/18454/2021phtmx_new_nws_thmb1.jpg
https://www.scottycameron.com/media/18301/aboutnav_jan2021_b.jpg
https://api.scottycameron.com/Data/Media/Catalog/1/1000/c09b7470-42dd-47e5-a244-9ef3d073c742LICENSE PLATE FRAME - SCOTTY CAMERON FINE MILLED PUTTERS.jpg
/store/content/images/loading.svg

This collects the src of every image on the page at that URL, which you can then process further.
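As one example of such further processing, the output above mixes relative paths (such as `/img/icon-header-user.png`) with absolute URLs, and one URL contains raw spaces. A minimal sketch of normalizing them is shown below. The function name `absolute_image_urls` and the `example.com` URL are hypothetical, and the HTML is inlined so the sketch runs without a network request.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

PAGE_URL = "https://www.scottycameron.com/store/product/3494"

# Inline sample standing in for the fetched page (hypothetical content).
html = """
<img src="/img/icon-header-user.png">
<img src="https://example.com/media/LICENSE PLATE FRAME.jpg">
<img alt="this tag has no src">
"""

def absolute_image_urls(html, base_url):
    """Return absolute image URLs with spaces percent-encoded."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # src=True keeps only <img> tags that actually have a src attribute
    for img in soup.find_all("img", src=True):
        full = urljoin(base_url, img["src"])   # resolve relative paths
        urls.append(full.replace(" ", "%20"))  # encode raw spaces only
    return urls

urls = absolute_image_urls(html, PAGE_URL)
for url in urls:
    print(url)
```

Replacing only spaces is a deliberately narrow choice here: it leaves URLs that are already percent-encoded untouched.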
