
How to filter values extracted from HTML table

I want to scrape stats and odds from betexplorer, in particular this page: https://www.betexplorer.com/soccer/russia/premier-league-2019-2020/cska-moscow-fc-tambov/8Ya3mpOC/

from selenium.webdriver.common.by import By
homeodd = driver.find_element(By.XPATH, "//*[contains(@id,'aodds')]").text

This returns:

CLOSING ODDS
22.07. 17:48 1.28 +0.01
OPENING ODDS
17.07. 00:07 1.27

How can I scrape only the opening odds?

Try a more specific XPath to get only the opening odds:

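from selenium.webdriver.common.by import By
# find_element returns the first row after the row whose th reads "Opening odds"
xpath = '//table[starts-with(@id,"aodds")]//tr[th="Opening odds"]/following-sibling::tr'
homeodd = driver.find_element(By.XPATH, xpath).text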

If you want to ignore the date and the odds-change value:

xpath = '//table[starts-with(@id,"aodds")]//tr[th="Opening odds"]/following-sibling::tr/td[@class="bold"]'

An XPath such as:

//tbody[contains(@id,'aodds')]/tr[4]

seems to highlight the relevant row in the DOM.

From there, you can use find_elements to get each child td of that row and print it out (shout if you need help with this).

Alternatively, if you already have the whole text, you can just manipulate the string. Assuming homeodd contains the text you posted:

# strip() removes any newline left in front of the opening-odds line
print(homeodd.split('OPENING ODDS')[-1].strip())

That splits the string around the marker text, and [-1] takes the last item of the resulting list, i.e. everything after "OPENING ODDS".
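Building on the split approach, here is a minimal sketch that also breaks the opening-odds line into its parts. It assumes the layout posted in the question (an `OPENING ODDS` label followed by `DD.MM. HH:MM <odds>` on one line); `parse_opening_odds` is a hypothetical name. It swaps in `str.rpartition`, which, like `split(...)[-1]`, keeps everything after the last occurrence of the label but also reports whether the label was found, so a missing label raises an error instead of silently returning the whole text.

```python
from __future__ import annotations


def parse_opening_odds(text: str) -> tuple[str, str, float]:
    """Hypothetical helper: return (date, time, odds) from the scraped text.

    Assumes the layout shown in the question: an "OPENING ODDS" label
    followed by a line of the form "DD.MM. HH:MM <odds>".
    """
    # rpartition splits at the last occurrence of the label; the middle
    # element is empty when the label is absent.
    _, label, tail = text.rpartition("OPENING ODDS")
    if not label:
        raise ValueError("no 'OPENING ODDS' label in text")
    lines = tail.strip().splitlines()
    if not lines:
        raise ValueError("nothing after the 'OPENING ODDS' label")
    # First line after the label: date, time, odds.
    date, time, odds = lines[0].split()[:3]
    return date, time, float(odds)
```

Called on the text from the question, `parse_opening_odds(homeodd)` returns `('17.07.', '00:07', 1.27)`.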
