
How to scrape data from a webpage which uses react.js with Selenium in Python?

I am facing some difficulties scraping a website which uses react.js, and I am not sure why this is happening.

This is the HTML of the website: [image]

What I wish to do is click on the button with the class play-pause-button btn btn-naked. However, when I load the page with the Mozilla gecko webdriver, an exception is thrown saying:

Message: Unable to locate element: .play-pause-button btn btn-naked

which makes me think that maybe I should do something else to get this element. This is my code so far:

    driver.get("https://drawittoknowit.com/course/neurological-system/anatomy/peripheral-nervous-system/1332/brachial-plexus---essentials")
    # execute script to scroll down the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
    time.sleep(10)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    print(driver.page_source)
    play_button = driver.find_element_by_class_name("play-pause-button btn btn-naked").click()
    print(play_button)

Does anyone have an idea as to how I could go about solving this? Any help is much appreciated.

Seems you were close. With find_element_by_class_name() you can't pass multiple classes; you are allowed to pass only one class name, i.e. only one of the following:

  • play-pause-button
  • btn
  • btn-naked

On passing multiple classes through find_element_by_class_name() you will face Message: invalid selector: Compound class names not permitted
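To make the difference concrete, here is a minimal sketch of how a space-separated class attribute maps onto a single compound CSS selector, which is the form a CSS locator accepts. The helper name classes_to_css_selector is hypothetical and not part of Selenium; it only builds the selector string.

```python
def classes_to_css_selector(class_attr, tag=""):
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper for illustration; it is not a Selenium function.
    """
    class_names = class_attr.split()
    if not class_names:
        raise ValueError("class_attr must contain at least one class name")
    # Each class becomes ".name"; chaining them with no spaces means
    # "an element that carries all of these classes".
    return tag + "".join("." + name for name in class_names)


selector = classes_to_css_selector("play-pause-button btn btn-naked", tag="button")
print(selector)  # button.play-pause-button.btn.btn-naked
```

The resulting string can be passed to a CSS selector lookup, whereas the original space-separated string cannot be passed to a class-name lookup.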


Solution

As an alternative, since the element is rendered by React and may not be present immediately after the page loads, to click() on the element you have to induce WebDriverWait for element_to_be_clickable(), and you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR:

     WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.play-pause-button.btn.btn-naked"))).click()
  • Using XPATH:

     WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@class='play-pause-button btn btn-naked']"))).click()
  • Note: You have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
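For intuition on why an explicit wait helps here where a fixed time.sleep(10) may not, the following is a rough pure-Python sketch of the polling idea behind an explicit wait: check a condition repeatedly and return as soon as it holds, or give up after a timeout. This is not Selenium's implementation; wait_until, WaitTimeout and button_rendered are hypothetical names, and the "page" is simulated with a counter so the sketch runs without a browser.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page: the button only "exists" from the third check onward,
# standing in for a component that is rendered some time after page load.
attempts = {"count": 0}


def button_rendered():
    attempts["count"] += 1
    return "play-pause-button" if attempts["count"] >= 3 else None


found = wait_until(button_rendered, timeout=5, poll_interval=0.01)
print(found, attempts["count"])  # play-pause-button 3
```

The wait returns as soon as the condition holds, so it neither sleeps longer than needed nor gives up before a slow render finishes, up to the timeout.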
