
AttributeError: 'ResultSet' object has no attribute 'find_all' Beautifulsoup

I don't understand why I get this error.

I have a fairly simple function:

def scrape_a(url):
  r = requests.get(url)
  soup = BeautifulSoup(r.content)
  news =  soup.find_all("div", attrs={"class": "news"})
  for links in news:
    link = news.find_all("href")
    return link

Here is the structure of the webpage I am trying to scrape:

<div class="news">
<a href="www.link.com">
<h2 class="heading">
heading
</h2>
<div class="teaserImg">
<img alt="" border="0" height="124" src="/image">
</div>
<p> text </p>
</a>
</div>

You are doing two things wrong:

  • You are calling find_all on the news result set; presumably you meant to call it on the links object, one element in that result set.

  • There are no <href ...> tags in your document, so searching with find_all('href') is not going to find anything. You only have tags with an href attribute.
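Both points can be checked without a network request. The following is a minimal sketch that parses a trimmed copy of the markup from the question; the inline `html` string and the explicit "html.parser" argument are assumptions made for illustration, not part of the original code.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (illustrative only)
html = """
<div class="news">
<a href="www.link.com">
<h2 class="heading">heading</h2>
<p> text </p>
</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
news = soup.find_all("div", attrs={"class": "news"})

# Point 1: news is a ResultSet (a list of tags), so calling find_all on it fails
try:
    news.find_all("a")
    error = None
except AttributeError as exc:
    error = exc

# A single element of the result set is a Tag, which does support find_all
first = news[0]

# Point 2: "href" as a positional argument is a tag name, and no <href> tag exists
by_tag_name = first.find_all("href")

# href=True matches any tag that carries an href attribute
by_attribute = first.find_all(href=True)
tag_names = [tag.name for tag in by_attribute]
```

Here `error` holds the AttributeError from the question, `by_tag_name` is empty, and `by_attribute` contains the single `<a>` element.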

You could correct your code to:

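import requests
from bs4 import BeautifulSoup

def scrape_a(url):
    r = requests.get(url)
    # Name the parser explicitly so bs4 does not have to guess one
    soup = BeautifulSoup(r.content, "html.parser")
    news = soup.find_all("div", attrs={"class": "news"})
    for links in news:
        # links is a single <div> tag; match any descendant with an href attribute
        link = links.find_all(href=True)
        # Returns on the first iteration, so only the first news div is examined
        return link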

to do what I think you tried to do.

I'd use a CSS selector instead:

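def scrape_a(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    # Any element with an href attribute nested inside <div class="news">
    news_links = soup.select("div.news [href]")
    if news_links:
        # First matching tag; the function returns None when nothing matches
        return news_links[0]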

If you wanted to return the value of the href attribute (the link itself), you need to extract that too, of course:

return news_links[0]['href']

If you need all of the links rather than just the first, return news_links itself to get the tag objects, or use a list comprehension to extract the URLs:

return [link['href'] for link in news_links]
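Put together, the selector plus the list comprehension could look like the sketch below. It works on an HTML string rather than a live URL so it can be run as-is; the helper name `extract_news_links`, the sample markup, and the second link `www.other.example` are hypothetical and only serve the illustration.

```python
from bs4 import BeautifulSoup

def extract_news_links(html):
    """Return the href value of every link inside a <div class="news">."""
    soup = BeautifulSoup(html, "html.parser")
    # select() returns a list of tags, so an empty page simply yields []
    return [link["href"] for link in soup.select("div.news [href]")]

# Hypothetical sample markup with two news blocks
sample = """
<div class="news"><a href="www.link.com"><h2 class="heading">heading</h2></a></div>
<div class="news"><a href="www.other.example"><h2 class="heading">other</h2></a></div>
"""

urls = extract_news_links(sample)
print(urls)
```

In a real scraper you would pass r.content from requests.get(url) to the same helper.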

