How to scrape 'aria-label' with robobrowser

Question

I'm new to web scraping and am currently using robobrowser to scrape a webpage. I'm trying to get the value of the 'aria-label' attribute from an element with a certain class, but I don't know how to do it.

Here is my code.

from robobrowser import RoboBrowser
browser = RoboBrowser(history=True, parser='html.parser')
browser.open('https://www.scrapingwebsite.com')
links = browser.find_all(class_='searchResult__373c0__1yggB')
for link in links:
    print(link.find(class_='big_braket_class').text)
    problem_part = link.find(class_='subsidiary_class')
    print(problem_part.get('aria-label'))

It simply doesn't work. Is there any way to make it work? Thanks.

Answer 1

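robobrowser already parses the page with bs4, so browser.parsed gives you a BeautifulSoup object that you can query directly. With bs4 4.7.1 or later, select() supports the :has and :contains pseudo-classes, which you can use to target the required items. (Newer Soup Sieve releases deprecate :contains in favour of :-soup-contains.)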

#...your code
soup = browser.parsed  # the page as a BeautifulSoup object
# Result items are the <li> siblings that follow the "All Results" heading
items = soup.select('li:has(h3:contains("All Results")) ~ li:has([class*=businessName])')
data = [
    (item.select_one('[class*=businessName]').text.replace('\xa0', ''),
     item.select_one('[class*="i-stars"]')['aria-label'])
    for item in items
]
print(data)
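
If you want to try the selector idea without hitting the site, here is a minimal sketch that runs against a small inline HTML string. The markup, the class suffixes (`__x1`, `__y2`) and the helper name `extract_ratings` are made up for illustration and are not taken from the real page. It also shows one way to avoid an exception when an item has no element carrying 'aria-label'.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real search-results page.
SAMPLE_HTML = """
<ul>
  <li><h3>All Results</h3></li>
  <li>
    <a class="businessName__x1">Cafe One</a>
    <div class="i-stars__y2" aria-label="4.5 star rating"></div>
  </li>
  <li>
    <a class="businessName__x1">Diner Two</a>
    <div class="i-stars__y2" aria-label="3 star rating"></div>
  </li>
  <li>
    <a class="businessName__x1">Bar Three</a>
  </li>
  <li><span class="ad__z3">Sponsored</span></li>
</ul>
"""


def extract_ratings(html):
    """Return (name, aria-label) pairs; the label is None when it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Keep only the <li> elements that contain a business-name element.
    for item in soup.select('li:has([class*="businessName"])'):
        name = item.select_one('[class*="businessName"]')
        # Match on a class substring and require the attribute to be present.
        stars = item.select_one('[class*="i-stars"][aria-label]')
        label = stars.get("aria-label") if stars is not None else None
        results.append((name.get_text(strip=True), label))
    return results


ratings = extract_ratings(SAMPLE_HTML)
print(ratings)
```

Matching with `[class*=...]` rather than the full class name is a deliberate choice here: generated suffixes such as the `__373c0__1yggB` in the question tend to change between site builds, while the readable prefix usually stays the same.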

solution1 0 ACCEPTED 2019-06-05 22:01:27

How to scrape 'aria-label' with robobrowser

Question

1 answers

solution1 0 ACCPTED 2019-06-05 22:01:27

solution1
0 ACCPTED 2019-06-05 22:01:27