How to scrape 'aria-label' with robobrowser

Question

I'm new to web scraping and am currently using robobrowser to scrape a webpage. I'm trying to scrape the value of 'aria-label' from an element with a certain class, but I don't know how to do it.

Here is my code.

from robobrowser import RoboBrowser
browser = RoboBrowser(history=True, parser='html.parser')
browser.open('https://www.scrapingwebsite.com')
links = browser.find_all(class_='searchResult__373c0__1yggB')
for link in links:
    print(link.find(class_='big_braket_class').text)
    problem_part = link.find(class_='subsidiary_class')
    print(problem_part.get('aria-label'))

It simply doesn't work. Is there any way to make it work? Thanks.

Answer 1

You could take the content parsed by robobrowser and work with it in bs4. Then, with bs4 4.7.1, use :has and :contains to target the required items.

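from bs4 import BeautifulSoup
# ...your code
soup = browser.parsed  # the page as a BeautifulSoup object
# Result items are the <li> siblings after the "All Results" heading.
# Newer soupsieve releases prefer :-soup-contains over :contains.
items = soup.select('li:has(h3:contains("All Results")) ~ li:has([class*=businessName])')
data = [(item.select_one('[class*=businessName]').text.replace('\xa0', ''),
         item.select_one('[class*="i-stars"]')['aria-label'])
        for item in items]
print(data)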
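
To see how the attribute selectors behave without hitting the live site, here is a minimal sketch that runs against inline HTML. The markup, the class suffixes (`__x1`, `__y2`), and the function name `extract_labels` are all made up for illustration; only the `businessName` and `i-stars` class fragments come from the answer. It also guards against items that have no 'aria-label' element, which may be why the `.get('aria-label')` call in the question fails (`find` returns None when nothing matches).

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one page of search results.
HTML = """
<ul>
  <li><h3>All Results</h3></li>
  <li>
    <a class="businessName__x1">1.&nbsp;Cafe Uno</a>
    <div class="i-stars__y2" aria-label="4.5 star rating"></div>
  </li>
  <li>
    <a class="businessName__x1">2.&nbsp;Bistro Due</a>
  </li>
</ul>
"""


def extract_labels(html):
    """Return (name, aria-label) pairs; the label is None when absent."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Keep only <li> items that contain a business-name element.
    for item in soup.select("li:has([class*=businessName])"):
        name_tag = item.select_one("[class*=businessName]")
        # &nbsp; is parsed as \xa0; turn it into a regular space.
        name = name_tag.get_text(strip=True).replace("\xa0", " ")
        # Match any descendant carrying an aria-label attribute.
        stars = item.select_one("[aria-label]")
        rows.append((name, stars.get("aria-label") if stars else None))
    return rows


print(extract_labels(HTML))
```

With this sample markup, the first item yields its rating label and the second yields None instead of raising an error. Substring matching with `[class*=...]` is used because the generated class suffixes on such pages tend to change between deployments.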

Sample of results:

Answer 1: score 0, accepted, posted 2019-06-05 22:01:27

如何用robobrowser刮掉&#39;aria-label&#39;

问题描述

1 个解决方案

解决方案1 0 已采纳 2019-06-05 22:01:27

如何用robobrowser刮掉'aria-label'

解决方案1
0 已采纳 2019-06-05 22:01:27