
BeautifulSoup scraper can't find text? AttributeError: ResultSet object has no attribute 'find_all'

I'm super new to programming, so sorry for any bad practices.

I was trying to make a web scraper that scrapes indeed.com for job listings in my field. I followed some articles on it online and thought I understood it, but now I think I've misunderstood something.

I'm attempting to scrape the location of the job, which I found in the HTML as follows:
[screenshot of the HTML code, not preserved]

To scrape that location, I was told to do the following:

# grabbing location name
c = div.find_all(name="span", attrs={"class": "location"})
for span in c:
    print(span.text)
    job_post.append(span.text)
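The snippet above filters on a single class name, while the later attempts pass the full class string. A minimal sketch on hypothetical markup (the HTML below is invented, not copied from indeed.com) shows how BeautifulSoup compares `class` values, and that passing a list of tag names matches a location whether it sits in a `span` or a `div`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup (not taken from indeed.com): one card puts the location
# in a <span>, the other in a <div>; both carry the same two CSS classes.
html = """
<div class="sjcl"><span class="location accessible-contrast-color-location">Austin, TX</span></div>
<div class="sjcl"><div class="location accessible-contrast-color-location">Denver, CO</div></div>
"""
soup = BeautifulSoup(html, "html.parser")

# A single class name matches any tag whose class list contains it.
by_one_class = [t.text for t in soup.find_all("span", class_="location")]

# The full space-separated string matches only that exact attribute value...
by_exact_string = [t.text for t in soup.find_all(
    "span", class_="location accessible-contrast-color-location")]

# ...so the same classes in a different order find nothing.
by_swapped_string = [t.text for t in soup.find_all(
    "span", class_="accessible-contrast-color-location location")]

# Passing a list of tag names matches either a <span> or a <div>.
either_tag = [t.text for t in soup.find_all(["span", "div"], class_="location")]

print(by_one_class)       # ['Austin, TX']
print(by_exact_string)    # ['Austin, TX']
print(by_swapped_string)  # []
print(either_tag)         # ['Austin, TX', 'Denver, CO']
```

Because the full-string form depends on the exact order of the classes, matching on one distinctive class such as `location` is usually the less brittle choice.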

However, I noticed that the page sometimes puts the location in a div rather than a span, so I edited the code as follows:

def find_location_for_job(self,div,job_post,city):
    div2 = div.find_all(name="div",attrs={"class":"sjcl"})
    print(div2)
    try:
        div3 = div2.find_all(name="div",attrs={"class":"location accessible-contrast-color-location"})
        job_post.append(div3.text)
    except:
        span = div2.find_all(name="span",attrs={"class":"location accessible-contrast-color-location"})
        job_post.append(span.text)

    print(job_post)

However, about half of the time it still says it can't find the text in the div/span, even when I look up the posting and see the location labeled as one or the other. The error is:

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
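The hint at the end of that message can be reproduced in isolation. A minimal sketch, assuming a single hypothetical job card as input: calling `find_all()` on the `ResultSet` returned by an earlier `find_all()` raises the same `AttributeError`, while `find()` (which returns one `Tag` or `None`) or a loop over the `ResultSet` both work:

```python
from bs4 import BeautifulSoup

# Hypothetical single job card (invented markup for illustration).
html = '<div class="sjcl"><span class="location">Austin, TX</span></div>'
card = BeautifulSoup(html, "html.parser")

# find_all() always returns a ResultSet (a list of Tags), even for one match.
results = card.find_all("div", class_="sjcl")

error_message = ""
try:
    results.find_all("span")  # a ResultSet has no find_all(), so this raises
except AttributeError as exc:
    error_message = str(exc)

# Option 1: find() returns the first matching Tag (or None), and a Tag has find_all().
first = card.find("div", class_="sjcl")
from_find = [s.text for s in first.find_all("span", class_="location")]

# Option 2: keep find_all() and search inside each Tag of the ResultSet.
from_loop = [s.text for tag in results for s in tag.find_all("span", class_="location")]

print(error_message)
print(from_find, from_loop)  # ['Austin, TX'] ['Austin, TX']
```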

Note that I moved away from the code I had found because it doesn't capture the location when it is in a div instead of a span. So my next troubleshooting step was to combine my approach with theirs:

def find_location_for_job(self,div,job_post,city):
    div2 = div.find_all(name="div",attrs={"class":"sjcl"})
    try:
        div3 = div2.find_all(name="div",attrs={"class":"location accessible-contrast-color-location"})
        for span in div3:
            job_post.append(span.text)
    except:
        div4 = div.findAll("span",attrs={"class":"location accessible-contrast-color-location"})
        for span in div4:
            job_post.append(span.text)

However, this version puts the entire list of locations into every entry it scrapes: it scrapes 10 postings per city, so each of the 10 posting entries ends up with all 10 locations.
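One common way to end up with every location in every entry is to run the location search on an element that contains all the job cards (for example the whole page) instead of on the current card. Whether that is what happens in the full code is an assumption; the sketch below only contrasts the two scopes on invented markup, where the `card` class and `results` id are hypothetical names:

```python
from bs4 import BeautifulSoup

# Hypothetical results page with three job cards (class and id names invented).
html = """
<div id="results">
  <div class="card"><span class="location">Austin, TX</span></div>
  <div class="card"><div class="location">Denver, CO</div></div>
  <div class="card"><span class="location">Boston, MA</span></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
cards = soup.find_all("div", class_="card")

# Page-wide search inside the per-card loop (note that `card` is never used,
# which is the bug): every posting gets all three locations.
page_scoped = []
for card in cards:
    job_post = []
    for tag in soup.find_all(["div", "span"], class_="location"):
        job_post.append(tag.text)
    page_scoped.append(job_post)

# Card-scoped search: each posting gets only the location inside its own card.
card_scoped = []
for card in cards:
    job_post = []
    location = card.find(["div", "span"], class_="location")
    if location is not None:  # find() returns None when nothing matches
        job_post.append(location.text)
    card_scoped.append(job_post)

print(page_scoped[0])  # ['Austin, TX', 'Denver, CO', 'Boston, MA']
print(card_scoped)     # [['Austin, TX'], ['Denver, CO'], ['Boston, MA']]
```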

Can anyone tell me where I'm having the brain fart?

Edit: Full code in pastebin: https://pastebin.com/0LLb9ZcU

div2 is a ResultSet because that is what BeautifulSoup's find_all method returns: a list of every matching tag. A ResultSet has no find_all method of its own, so you need to iterate over it and search for the inner fields inside each element, like so:

def find_location_for_job(self, div, job_post, city):
    # find_all() returns a ResultSet (a list of Tags), so loop over it
    div2 = div.find_all(name="div", attrs={"class": "sjcl"})
    for sjcl_div in div2:
        # Search inside the current Tag, not the whole ResultSet or the outer div
        div3 = sjcl_div.find_all(name="div", attrs={"class": "location accessible-contrast-color-location"})
        div4 = sjcl_div.find_all(name="span", attrs={"class": "location accessible-contrast-color-location"})
        if div3:
            for tag in div3:
                job_post.append(tag.text)
        elif div4:
            for span in div4:
                job_post.append(span.text)
        else:
            print("Uh-oh, couldn't find the tags!")
