Unable to get table element in website using Selenium

The website below has several tables, but my code cannot retrieve a specific one (or any other table).

The code aims to get data from the table "Ações em Circulação no Mercado", one of the last tables on the page.

I have tried the code below and some alternatives, but none worked for me:

import pandas as pd
from selenium import webdriver
from time import sleep

url = "http://bvmf.bmfbovespa.com.br/cias-Listadas/Empresas-Listadas/BuscaEmpresaListada.aspx?idioma=pt-br"
Ticker = 'ITUB4'
browser = webdriver.Chrome()
browser.get(url)
sleep(2)  # Wait for the page to load
browser.find_element_by_xpath('//*[@id="ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_txtNomeEmpresa_txtNomeEmpresa_text"]').send_keys(Ticker)
browser.find_element_by_xpath('//*[@id="ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_btnBuscar"]').click()
sleep(2)  # Wait for the page to load
browser.find_element_by_xpath('//*[@id="ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_grdEmpresa_ctl01"]/tbody/tr/td[1]/a').click()
sleep(5)  # Wait for the page to load

#This is not working
content = browser.find_element_by_css_selector('//div[@id="div1"]')

#This is not working as well
#browser.find_element_by_xpath('//*[@id="div1"]/div/div/div[1]/table/tbody/tr[1]/td[1]').text

The table and full HTML can be found here:

Table

HTML

The HTML is:

<div id="div1">
                <div>
                    <h3>Ações em Circulação no Mercado</h3>
                    <div class="table-wrapper"><div class="scrollable"><table class="responsive">

                        <thead>
                            <tr>
                                <th colspan="3" class="text-center">19/04/2017</th>
                            </tr>
                            <tr>
                                <td>Tipos de Investidores / Ações</td>
                                <td class="text-center">Quantidade</td>
                                <td class="text-center">Percentual</td>
                            </tr>
                        </thead>

                            <tbody><tr>
                                <td>Pessoas Físicas</td>
                                <td class="text-right">108.853</td>
                                <td class="text-right"> - </td>
                            </tr>

                            <tr>
                                <td>Pessoas Jurídicas</td>
                                <td class="text-right">11.591</td>
                                <td class="text-right"> - </td>
                            </tr>

                            <tr>
                                <td>Investidores Institucionais</td>
                                <td class="text-right">1.039</td>
                                <td class="text-right"> - </td>
                            </tr>

                            <tr>
                                <td>Quantidade de Ações Ordinárias</td>
                                <td class="text-right">272.710.309</td>
                                <td class="text-right">8,21</td>
                            </tr>

                            <tr>
                                <td>Quantidade de Ações Preferenciais</td>
                                <td class="text-right">3.141.058.175</td>
                                <td class="text-right">97,23</td>
                            </tr>

                            <tr>
                                <td>Total de Ações</td>
                                <td class="text-right">3.413.768.484</td>
                                <td class="text-right">52,11</td>
                            </tr>

                            </tbody></table></div><div class="pinned"></div></div>
                </div>
                </div>

You passed an XPath expression to a CSS selector method. If you want all tables, locate them with tables = browser.find_elements_by_css_selector('.responsive') and then parse each one. Alternatively, use browser.find_element_by_xpath("//*[@id='div1']//table") to locate the exact table.
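As a rough illustration of the "parse from them" step, the sketch below turns a table's markup into rows of cell text using only the standard library. It assumes the markup has already been obtained as a string (with Selenium, something like table.get_attribute("outerHTML")); the names TableRows, parse_table and br_number are hypothetical helpers, and the sample is a shortened copy of the HTML from the question.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def br_number(text):
    """Convert Brazilian-formatted '3.141.058.175' or '97,23' to a number; '-' becomes None."""
    cleaned = text.strip().replace(".", "").replace(",", ".")
    if cleaned in ("", "-"):
        return None
    return float(cleaned) if "." in cleaned else int(cleaned)


# Shortened copy of the markup from the question (assumption: same structure on the live page).
sample = """
<table class="responsive">
  <thead><tr><th colspan="3">19/04/2017</th></tr></thead>
  <tbody>
    <tr><td>Pessoas Físicas</td><td class="text-right">108.853</td><td class="text-right"> - </td></tr>
    <tr><td>Total de Ações</td><td class="text-right">3.413.768.484</td><td class="text-right">52,11</td></tr>
  </tbody>
</table>
"""

rows = parse_table(sample)
# Skip the date header row and convert the numeric columns.
body = [(name, br_number(qty), br_number(pct)) for name, qty, pct in rows[1:]]
print(body)
```

The numbers on this page use "." as the thousands separator and "," as the decimal mark, which is why the conversion swaps them before calling int or float.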

One quick correction you can make is to change content = browser.find_element_by_css_selector('//div[@id="div1"]') to content = browser.find_element_by_xpath('//div[@id="div1"]'), because the string you are passing is actually an XPath expression.
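One way to guard against mixing the two syntaxes is to pick the locator strategy from the selector string itself. The sketch below is only a heuristic, and locator_for is a hypothetical helper, not part of Selenium. It relies on two assumptions: that XPath expressions start with "/", "./", ".." or "(" while CSS selectors do not, and that "xpath" and "css selector" are the strategy strings accepted by the generic find_element(by, value) method (the values behind By.XPATH and By.CSS_SELECTOR).

```python
def locator_for(selector):
    """Return a (strategy, selector) pair suitable for find_element(by, value).

    Heuristic only: anything that starts like an XPath expression is treated
    as XPath, everything else as a CSS selector.
    """
    xpath_prefixes = ("/", "./", "..", "(")
    looks_like_xpath = selector.lstrip().startswith(xpath_prefixes)
    strategy = "xpath" if looks_like_xpath else "css selector"
    return strategy, selector


xpath_locator = locator_for('//div[@id="div1"]')
css_locator = locator_for("div#div1")
print(xpath_locator)
print(css_locator)

# With a live driver, either pair could then be unpacked into the generic call:
#     content = browser.find_element(*xpath_locator)
```

Both locators above target the same element: '//div[@id="div1"]' is the XPath form and "div#div1" is the equivalent CSS form.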

The second attempt may be failing because the div1 element has not been scrolled into view. Selenium does not interact well with elements that are not visible. So try this:

element = browser.find_element_by_xpath('//*[@id="div1"]')
# Force the element to be scrolled into view, even if you don't need its location.
location = element.location_once_scrolled_into_view
# Now Selenium can get its text.
text = element.text

To locate the WebElement and extract the text Pessoas Físicas, you can use the following line of code:

content = browser.find_element_by_xpath("//h3[.='Ações em Circulação no Mercado']/following::table[@class='responsive'][1]/tbody/tr[1]/td[1]").get_attribute("innerHTML")

Update (no code change)

The XPath expression:

//h3[.='Ações em Circulação no Mercado']/following::table[@class='responsive'][1]/tbody/tr[1]/td[1]

should not be wrapped in single quotes, e.g. 'xpath_here', because it contains single quotes itself. Put the expression in double quotes instead, e.g. "xpath_here".
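The quoting rule comes from Python string syntax rather than from XPath, which accepts either kind of quote around a literal. The short sketch below shows the three spellings that work. It evaluates them with the standard library's xml.etree.ElementTree against a tiny stand-in document (an assumption for illustration, not the real page), and the expressions start with "." because ElementTree only supports relative paths.

```python
import xml.etree.ElementTree as ET

# Double-quoted Python string, single quotes inside the expression.
xpath_double = ".//table[@class='responsive']"
# Single-quoted Python string, double quotes inside the expression.
xpath_single = './/table[@class="responsive"]'
# Single quotes on both levels only work when the inner ones are escaped.
xpath_escaped = './/table[@class=\'responsive\']'

# Tiny stand-in document, just enough to evaluate the expressions.
doc = ET.fromstring('<div id="div1"><table class="responsive"/></div>')

matches_double = doc.findall(xpath_double)
matches_single = doc.findall(xpath_single)

print(xpath_double == xpath_escaped)
print(len(matches_double), len(matches_single))
```

Writing the expression as '//table[@class='responsive']' without escaping ends the Python string at the second quote and raises a SyntaxError before Selenium is ever called.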

