Python script to extract specific data with XPath

I would like to extract all the data in the column named "Nb B" on this page: https://www.coteur.com/cotes-foot.php

Here is my Python script:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.headless = True
driver = webdriver.Chrome(options=options)

driver.get('https://www.coteur.com/cotes-foot.php')

#Store url associated with the soccer games
url_links = []
for i in driver.find_elements_by_xpath('//a[contains(@href, "match/cotes-")]'):
    url_links.append(i.get_attribute('href'))

print(len(url_links), '\n')

nb_bookies = []
for i in driver.find_elements_by_xpath('//td[contains(@class, " odds")][contains(@style, "")]'):
    nb_bookies.append(i.text)
    
print(nb_bookies) 

And here is the output:

25 

['1.80', '3.55', '4.70', '95%', '', '1.40', '4.60', '8.00', '94.33%', '', '2.35', '3.42', '2.63', '90.18%', '', '3.20', '3.60', '2.05', '92.19%', '', '7.00', '4.80', '1.35', '90.81%', '', '5.30', '4.30', '1.70', '99.05%', '', '2.15', '3.55', '3.65', '97.92%', '', '2.90', '3.20', '2.20', '88.81%', '', '3.95', '3.40', '2.10', '97.65%', '', '2.00', '3.80', '3.90', '98.04%', '', '2.40', '3.05', '3.50', '96.98%', '', '3.70', '3.20', '2.00', '91.72%', '', '2.75', '2.52', '3.05', '91.17%', '', '4.20', '3.05', '1.69', '84.23%', '', '1.22', '5.10', '10.00', '88.42%', '', '1.54', '4.60', '5.10', '93.72%', '', '3.00', '3.10', '2.45', '93.59%', '', '2.40', '3.50', '2.55', '90.55%', '', '1.76', '3.50', '4.20', '90.8%', '', '11.50', '5.30', '1.36', '98.91%', '', '3.00', '3.50', '2.20', '92.64%', '', '1.72', '3.42', '5.00', '92.62%', '', '1.08', '9.25', '19.00', '91.33%', '', '9.75', '5.75', '1.36', '98.82%', '', '5.70', '4.50', '1.63', '98.88%', '']

All the data of the table is extracted, and the "Nb B" value of each row comes back as '', whereas that value is the only thing I want.

Your code is fine. The problem is the size of the window that the automated browser opens in headless mode: the default window and display size in headless mode is 800x600 on all platforms.

The developers of the site have set the header to appear only when the window is wider than 1030px; only then is display: none; removed from the element. You can test this yourself by shrinking and expanding the browser window.

Keep in mind that when an element's style attribute contains display: none; the element is hidden, and Selenium will not interact with it: if a user cannot see it, Selenium treats it the same way. In particular, the .text property returns only visible text, which is why the hidden cells came back as ''.

Adding this line to enlarge the window in headless mode will solve your problem:

options.add_argument("window-size=1400,800")
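For readers on a recent Selenium 4 release, the same idea can be sketched as below. The function names headless_chrome_arguments and make_driver are made up for this illustration, and the "--headless" switch is used here as an assumed replacement for options.headless = True, which newer releases no longer accept. Treat it as a rough sketch rather than the only way to configure the driver.

```python
def headless_chrome_arguments(width=1400, height=800):
    """Return Chrome switches for a headless window of the given size.

    The default width of 1400 is wider than the 1030px breakpoint
    described above, so the hidden header should be rendered.
    """
    return ["--headless", f"--window-size={width},{height}"]


def make_driver(width=1400, height=800):
    """Start headless Chrome with an explicit window size.

    Requires Selenium and a local Chrome installation.
    """
    # Imported lazily so the helper above stays usable without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_chrome_arguments(width, height):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

After driver = make_driver(), calling driver.get_window_size() is one way to check which size the browser actually received.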

To get the data from the last column only, fix your XPath accordingly:

nb_bookies = []
for i in driver.find_elements_by_xpath('//tr[@id and @role="row"]/td[last()]'):
    nb_bookies.append(i.text)

Output:

['12', '12', '1', '9', '11', '12', '12', '12', '12', '12', '11', '2', '11', '11', '9', '12', '11', '12', '12', '12', '12', '12', '10', '5', '12']
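To see what td[last()] selects without starting a browser, here is a small sketch using only the standard library. The table markup, the row ids and most header labels are invented for the illustration (only the values are borrowed from the outputs above), so the real page will differ. ElementTree supports only a subset of XPath and has no "and" operator, so the two attribute tests are written as chained predicates. In newer Selenium 4 releases, where the find_elements_by_* helpers were removed, the original expression would be passed as driver.find_elements(By.XPATH, ...) with By imported from selenium.webdriver.common.by.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the odds table on the page.
SAMPLE_TABLE = """
<table>
  <tr role="row"><th>1</th><th>N</th><th>2</th><th>Payout</th><th>Nb B</th></tr>
  <tr id="match-1" role="row"><td>1.80</td><td>3.55</td><td>4.70</td><td>95%</td><td>12</td></tr>
  <tr id="match-2" role="row"><td>1.40</td><td>4.60</td><td>8.00</td><td>94.33%</td><td>12</td></tr>
  <tr id="match-3" role="row"><td>2.35</td><td>3.42</td><td>2.63</td><td>90.18%</td><td>1</td></tr>
</table>
"""


def last_cells(table_xml):
    """Return the text of the last <td> in every row that has an id and role="row"."""
    root = ET.fromstring(table_xml.strip())
    # [@id][@role='row'] keeps data rows only; td[last()] keeps the final cell of each.
    cells = root.findall(".//tr[@id][@role='row']/td[last()]")
    return [cell.text for cell in cells]


print(last_cells(SAMPLE_TABLE))  # ['12', '12', '1']
```

The header row is skipped because it has no id attribute, which mirrors the @id test in the XPath above.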
