How can I find <img src> nested in <div> using Beautiful Soup?

Question

New to both Python and Beautiful Soup. I am trying to collect the src of an img inserted into a collapsible section on an e-commerce site. The collapsible sections that contain the images have the class accordion__contents, but the <img> tags inserted into those sections do not have a specific class. Not every page contains an image; some contain multiple.

I am trying to extract the src from img tags that are nested at no fixed position within a <div>. In the HTML example below, my desired output would be: https://example.com/image1.png

<div class="accordion__title">Description</div>
<div class="accordion__contents">
    <p>Enjoy Daiya’s Hon’y Mustard Dressing on your salads</p>
</div>
<div class="accordion__title">Ingredients</div>
<div class="accordion__contents">
    <p>Non-GMO Expeller Pressed Canola Oil, Filtered Water</p>
    <p><strong>CONTAINS: MUSTARD</strong></p>
</div>
<div class="accordion__title">Nutrition</div>
<div class="accordion__contents">
    <p>
        <img alt="" class="alignnone size-medium wp-image-57054" height="300" src="https://example.com/image1.png" width="162"/>
    </p>
</div>
<div class="accordion__title">Warnings</div>
<div class="accordion__contents">
    <p><strong>Contains mustard</strong></p>
</div>

I've written the following code, which successfully drills down to the full tag, but I can't figure out how to extract src once I'm there.

  img_href = container.find_all(class_='accordion__contents')  # generates the output above, in list form
  img_href = [img.find_all('img') for img in img_href]
  for x in img_href:
    if len(x) == 0:  # skip over empty items in the list that don't have images
      continue
    else:
      print(x)  # print to make sure the image is there
      x.find('img')['src']  # generates error - see below

The error I am getting is: ResultSet object has no attribute 'find'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()? My intent is not to treat a list like a single item, hence the loop. I've tried find_all() combined with .attrs('src'), but that also didn't work. What am I doing wrong?

I've simplified my example, but the URL for the page I'm scraping is https://gtfoitsvegan.com/product/hony-mustard-dressing-by-daiya/?v=7516fd43adaa

Answer 1

You can use the CSS selector ".accordion__contents img":

import requests
from bs4 import BeautifulSoup


url = "https://gtfoitsvegan.com/product/hony-mustard-dressing-by-daiya/?v=7516fd43adaa"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

all_imgs = [img["src"] for img in soup.select(".accordion__contents img")]
print(all_imgs)

Prints:

['https://gtfoitsvegan.com/wp-content/uploads/2021/04/Daiya-Honey-Mustard-Nutrition-Facts-162x300.png']
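
As for why the original loop failed: after the list comprehension, each x is itself a ResultSet (the list returned by find_all('img')), so calling x.find(...) on it raises the error shown. Note also that .attrs on a Tag is a dictionary, not a method, so it is indexed rather than called. Below is a minimal offline sketch, assuming the sample HTML from the question (trimmed to two sections) instead of the live page. The variable names srcs_loop and srcs_select are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, trimmed to two sections.
html = """
<div class="accordion__title">Ingredients</div>
<div class="accordion__contents">
    <p>Non-GMO Expeller Pressed Canola Oil, Filtered Water</p>
</div>
<div class="accordion__title">Nutrition</div>
<div class="accordion__contents">
    <p>
        <img alt="" class="alignnone size-medium wp-image-57054" height="300" src="https://example.com/image1.png" width="162"/>
    </p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Corrected version of the question's loop.
srcs_loop = []
for section in soup.find_all(class_="accordion__contents"):
    # find_all() returns a ResultSet (a list of Tag objects), so handle
    # each Tag individually instead of calling .find() on the list.
    for img in section.find_all("img"):
        src = img.get("src")  # .get() returns None if the attribute is missing
        if src:
            srcs_loop.append(src)

# Same result with the CSS selector from the answer above;
# "img[src]" only matches <img> tags that carry a src attribute.
srcs_select = [img["src"] for img in soup.select(".accordion__contents img[src]")]

print(srcs_loop)
print(srcs_select)
```

Sections without an image simply contribute nothing, so no explicit len(x) == 0 check is needed, and pages with several images yield several entries.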

1 answer

Answer 1: 1, accepted, 2021-04-27 22:50:08
