简体   繁体   中英

Webscraping in Python Selenium - Can't find button

So, Im trying to access some data from this webpage http://www.b3.com.br/pt_br/produtos-e-servicos/negociacao/renda-variavel/empresas-listadas.htm . Im trying to click on the button named as "Setor de atuação" with selenium. The problem is The requests lib is returning me a different HTML from the one I see when I inspect the page. I already tried to sent a header with my request but it wasn't the solution. Although, when I print the content in

browser.page_source

I still get an incomplete part of the page that I want. In order to try solving the problem Ive seen that two requests are posted when the site is initializes: print1

Well, I'm not sure what to do now. If anyone can help me or send me a tutorial, explain what is happening I would be really glad. Thanks in advance. Ive only done simple web-scraping so I'm not sure how to proceed, Ive also checked other questions in the forums and none seem to be similar to my problem.

import bs4 as bs
import requests
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito') #private
#options.add_argument('--headless') # doesnt open page

browser = webdriver.Chrome('/home/itamar/Desktop/chromedriver', chrome_options=options)

site = 'http://www.b3.com.br/pt_br/produtos-e-servicos/negociacao/renda-variavel/empresas-listadas.htm'

browser.get(site)

Thats my code till now. Im having trouble to find the and click in the element buttor "Setor de Atuação". Ive tried through X_path,class,id but nothing seems to work.

The aimed button is inside an iframe, in this case you'll have to use the switch_to function from your selenium driver, this way switching the driver to the iframe DOM, and only then you can look for the button. I've played with the page provided and it worked - only using Selenium though, no need of Beautiful Soup. This is my code:

from selenium import webdriver
import time

class B3:
    def __init__(self):
        self.bot = webdriver.Firefox()

    def start(self):
        bot = self.bot
        bot.get('http://www.b3.com.br/pt_br/produtos-e-servicos/negociacao/renda-variavel/empresas-listadas.htm')
        time.sleep(2)

        iframe = bot.find_element_by_xpath('//iframe[@id="bvmf_iframe"]')
        bot.switch_to.frame(iframe)
        bot.implicitly_wait(30)

        tab = bot.find_element_by_xpath('//a[@id="ctl00_contentPlaceHolderConteudo_tabMenuEmpresaListada_tabSetor"]')
        time.sleep(3)
        tab.click()
        time.sleep(2)

if __name__ == "__main__":
    worker = B3()
    worker.start()

Hope it suits you well!

refs: https://www.techbeamers.com/switch-between-iframes-selenium-python/

In this case I suggest you to work only with the Selenium, because it depends on the Javascripts processing.

You can inspect the elements and use the XPath to choose and select the elements.

XPath : //*[@id="ctl00_contentPlaceHolderConteudo_tabMenuEmpresaListada_tabSetor"]/span/span

So your code will looks like:

elementSelect = driver.find_elements_by_xpath('//*[@id="ctl00_contentPlaceHolderConteudo_tabMenuEmpresaListada_tabSetor"]/span/span')
elementSelect[0].click()
time.sleep(5)  # Wait the page to load.

PS: I recommend you to search an API service for the B3. I found this link , but I didn't read it. Maybe they already have disponibilised these kind of data.

About XPath: https://www.guru99.com/xpath-selenium.html

I can't understand the problem, so if you can show a code snippet it would be better. And I suggest you to use BeautifulSoup for web scraping.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM