How to locate elements in a table with Python and Selenium?

I am trying to use Selenium to retrieve data from a website that loads its information with JavaScript.

You can see the link here: Animal population

The page shows some selectable fields. For my purposes, I am trying to retrieve the population data for Bees in the United Kingdom for the year 2011.

Once the selectable fields are submitted, the page loads a table with the corresponding data. I only want to get the Population and Density numbers for The Whole Country.

My code so far only selects the year, country and species fields, and after the table is returned it locates the 'Whole Country' field (feel free to advise me on how to improve my existing code too).

I haven't been able to retrieve the population and density fields for the whole country. I have tried XPath with 'following-sibling', but it raises an exception when locating the elements.

I also don't want to rely on the position of the rows/cells, since I will also try to get this information for the following years and the table fields will change position.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get('https://www.oie.int/wahis_2/public/wahid.php/Countryinformation/Animalpopulation')

# Choose the country and the year
select = Select(driver.find_element(By.ID, 'country6'))
select.select_by_value('GBR')
select = Select(driver.find_element(By.ID, 'year'))
select.select_by_value('2011')

try:
    # Wait for the table, then choose the species
    element = WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.CLASS_NAME, "TableContent")))
    print(element)
    select = Select(driver.find_element(By.ID, 'selected_species'))
    select.select_by_value('1')
except Exception:
    print("Not found")

country_td = driver.find_element(By.XPATH, '//td/b[text()="The Whole Country"]')

# This attempt raises an exception
#population_td = driver.find_element(By.XPATH, '//td/b[text()="The Whole Country"]/following-sibling::text()')
print(country_td.text)

Thank you for the help.

You need to go one level up in order to get the data using following-sibling:

population = driver.find_element(By.XPATH, '//td[b[text()="The Whole Country"]]/following-sibling::td[1]')
density = driver.find_element(By.XPATH, '//td[b[text()="The Whole Country"]]/following-sibling::td[2]')

Or, starting from country_td (the path must be relative, so it begins with a dot):

population = country_td.find_element(By.XPATH, './../following-sibling::td[1]')
density = country_td.find_element(By.XPATH, './../following-sibling::td[2]')

In your example, following-sibling looks for the next sibling of the <b> element. What you want is a <td> element, and you can reach it by going through the parent element first.
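The difference between the siblings of the <b> and the siblings of its parent <td> can be checked offline. The sketch below is only an illustration: it uses the standard library's xml.etree.ElementTree instead of Selenium, the table row is a simplified stand-in for the real page, and the cell values are made up. ElementTree has no following-sibling axis, so the sibling relation is shown by walking the parent's children.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for one table row; the numbers are made-up placeholders.
ROW = """
<tr>
  <td><b>The Whole Country</b></td>
  <td><b>100</b></td>
  <td><b>0.5</b></td>
</tr>
"""

row = ET.fromstring(ROW)

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in row.iter() for child in parent}

label_b = row.find("./td/b")      # the <b> holding the label text
label_td = parent_of[label_b]     # one level up: the enclosing <td>

# Siblings of the <b>: other children of the same <td>. There are none.
b_siblings = [e for e in label_td if e is not label_b]

# Siblings that follow the <td>: the remaining cells of the row.
cells = list(row)
td_following = cells[cells.index(label_td) + 1:]
following_values = [td.find("b").text for td in td_following]

print(b_siblings)
print(following_values)
```

This is why the lookup has to step up to the <td> before asking for following siblings.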

The XPath for population:
//b[text()="The Whole Country"]/../../td[4]/b

Or:
//td/b[text()="The Whole Country"]/../following-sibling::td[1]/b

The XPath for density:
//b[text()="The Whole Country"]/../../td[5]/b

Or:
//td/b[text()="The Whole Country"]/../following-sibling::td[2]/b

Both kinds of XPath work. Using .. moves the XPath to the parent element, which you need to do; then you can either move on to the sibling or locate the element with td[X]. In this example you can also omit the trailing /b from each XPath.
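The parent-then-sibling form can be wrapped in a small helper so that the same lookup works for any row label and does not depend on the row's position. This is a minimal sketch that only builds the XPath strings; cell_after_label is a hypothetical name, not part of Selenium, and it assumes the label sits in a <b> inside a <td>, as on this page.

```python
def cell_after_label(label, offset):
    """Build an XPath for the offset-th <td> after the <td> whose <b> holds label.

    Hypothetical helper for illustration. Assumes the label text contains
    no double quotes, since it is placed inside a double-quoted XPath literal.
    """
    if offset < 1:
        raise ValueError("offset must be 1 or greater")
    return '//td/b[text()="{}"]/../following-sibling::td[{}]'.format(label, offset)


population_xpath = cell_after_label("The Whole Country", 1)
density_xpath = cell_after_label("The Whole Country", 2)

print(population_xpath)
print(density_xpath)
```

Each string can then be passed to driver.find_element(By.XPATH, ...), and the .text of the returned element holds the cell value.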

Note: this is really nasty. Best practice is to always use unambiguous attributes to find an element. However, as this example shows, that isn't always possible.

Also, you should select Bees first and then wait for the table to be present, since the table is reloaded between selecting year/country and selecting Bees, which could lead to inconsistent data.

# Select the species first, then wait for the reloaded table
select = Select(driver.find_element(By.ID, 'selected_species'))
select.select_by_value('1')
element = WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.CLASS_NAME, "TableContent")))
print(element)

PS: There is a Chrome extension called XPath Helper that you can use to test your XPaths on the website you are visiting.
