Web scraping with Python Selenium - can't find button

Question

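I'm trying to access some data from this web page: http://www.b3.com.br/pt_br/produtos-e-servicos/negociacao/renda-variavel/empresas-listadas.htm. I want to use Selenium to click the button named "Setor de Atuação". The problem is that the HTML the requests library returns differs from what I see when I inspect the page. I have already tried sending headers with my request, but that did not solve it. Also, when I print the content of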

browser.page_source

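I still get only an incomplete part of the page I want. While trying to work out the problem, I saw that two requests are posted when the site initializes: print1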

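I don't know what to do next. If anyone can help me or point me to a tutorial and explain what is happening, I would be really glad. Thanks in advance. I have only done simple web scraping so far, so I'm not sure how to proceed. I also checked other questions on the forum, and none of them seems similar to my problem.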

import bs4 as bs
import requests
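from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')  # private browsing
# options.add_argument('--headless')  # run without opening a browser window

# Selenium 4 style: the driver path goes in a Service, and options= replaces chrome_options=
browser = webdriver.Chrome(service=Service('/home/itamar/Desktop/chromedriver'), options=options)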

site = 'http://www.b3.com.br/pt_br/produtos-e-servicos/negociacao/renda-variavel/empresas-listadas.htm'

browser.get(site)

That is my code so far. I can't find and click the "Setor de Atuação" button. I have tried XPath, class, and id, but nothing seems to work.

Answer 1

The button you want is inside an iframe. In that case you have to use the switch_to function of your Selenium driver to switch the driver to the iframe's DOM; only then can you look for the button. I played with the page provided and it worked, using only Selenium, with no need for Beautiful Soup. Here is my code:

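from selenium import webdriver
from selenium.webdriver.common.by import By
import time

class B3:
    def __init__(self):
        self.bot = webdriver.Firefox()

    def start(self):
        bot = self.bot
        bot.get('http://www.b3.com.br/pt_br/produtos-e-servicos/negociacao/renda-variavel/empresas-listadas.htm')
        time.sleep(2)  # crude pause so the outer page can load

        # The tab lives inside this iframe, so switch the driver into it first.
        # find_element(By.XPATH, ...) replaces find_element_by_xpath, which Selenium 4 removed.
        iframe = bot.find_element(By.XPATH, '//iframe[@id="bvmf_iframe"]')
        bot.switch_to.frame(iframe)
        bot.implicitly_wait(30)  # later lookups retry for up to 30 seconds

        tab = bot.find_element(By.XPATH, '//a[@id="ctl00_contentPlaceHolderConteudo_tabMenuEmpresaListada_tabSetor"]')
        time.sleep(3)
        tab.click()
        time.sleep(2)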

if __name__ == "__main__":
    worker = B3()
    worker.start()

Hope it works for you!

Reference: https://www.techbeamers.com/switch-between-iframes-selenium-python/
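
The iframe also explains why the HTML from requests looks incomplete: an iframe's content is a separate document, so the outer page's HTML contains only the iframe tag, not the tab inside it. The sketch below shows one way to check for that with the standard library alone. The sample HTML and its example.invalid URL are invented stand-ins, not the real B3 markup; only the iframe id comes from the code above.

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collect the attributes of every <iframe> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.iframes.append(dict(attrs))


def find_iframes(html):
    """Return one attribute dict per <iframe> found in `html`."""
    finder = IframeFinder()
    finder.feed(html)
    return finder.iframes


# Invented stand-in for the outer page: the tab markup is NOT in this HTML,
# only an <iframe> pointing at the document that contains it.
SAMPLE_OUTER_HTML = """
<html><body>
  <h1>Empresas listadas</h1>
  <iframe id="bvmf_iframe" src="https://example.invalid/inner-page"></iframe>
</body></html>
"""

iframes = find_iframes(SAMPLE_OUTER_HTML)
print([frame.get("id") for frame in iframes])  # ['bvmf_iframe']
print("tabSetor" in SAMPLE_OUTER_HTML)         # False
```

To try it on the real page, pass the text returned by requests or browser.page_source to find_iframes. If an iframe shows up, the element you are after may live in the document at its src rather than in the outer page.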

Answer 2

In this case, I suggest you use only Selenium, because the page depends on JavaScript processing.

You can inspect the element and select it using XPath.

XPath : //*[@id="ctl00_contentPlaceHolderConteudo_tabMenuEmpresaListada_tabSetor"]/span/span

So your code would look like:

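import time
from selenium.webdriver.common.by import By

# `driver` is your running WebDriver, already switched into the iframe that holds the tab (see Answer 1).
elementSelect = driver.find_elements(By.XPATH, '//*[@id="ctl00_contentPlaceHolderConteudo_tabMenuEmpresaListada_tabSetor"]/span/span')
elementSelect[0].click()
time.sleep(5)  # Wait for the page to load.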

PS: I suggest you look for an API service from B3. I found this link, but I haven't read it. They may already provide this data broken down.

About XPath: https://www.guru99.com/xpath-selenium.html

Answer 3

I can't understand the problem, so it would be better if you could show a code snippet. I suggest you use BeautifulSoup for web scraping.

3 answers

Answer 1
3, accepted, 2020-05-07 04:19:33

Answer 2
1, 2020-05-07 02:30:13

Answer 3
0, 2020-05-07 02:03:41
