How to extract just the number from HTML?
I am trying to extract only the number from this HTML element:
<td bgcolor="green">
<font color="white">
"49.8 "
<small>dBmV</small>
</font>
</td>
How do I extract only the 49.8 without also getting the dBmV?
I can use an XPath to return the whole "49.8 dBmV" string, but when I use an XPath that targets just the "49.8" text node I get an error.
Error:
invalid selector: The result of the xpath expression "/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font/text()" is: [object Text]. It should be an element.
I have tried:
browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font").text
which returns 49.8 dBmV
And then:
browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font/text()").text
which raises the exception above.
I just want the number 49.8 (which obviously changes). I know I could extract the number afterwards, but I'm hoping there is something I can use to get the value directly from the HTML, something a bit tidier.
To extract the text 49.8 you can use either of the following locator strategies:
Using xpath through execute_script() and textContent:
print(driver.execute_script('return arguments[0].firstChild.textContent;', driver.find_element_by_xpath("//td[@bgcolor='green']/font[@color='white']")).strip())
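The find_element_by_xpath helpers were removed in Selenium 4.3, so on a current install the same idea might look roughly like the sketch below. The function name first_text_node is hypothetical, and driver is assumed to be an already created WebDriver that has loaded the page.

```python
def first_text_node(driver, xpath):
    """Return the stripped text of the first child node of the element at `xpath`."""
    # "xpath" is the value of selenium.webdriver.common.by.By.XPATH;
    # the literal is used here only to keep the sketch import-free.
    element = driver.find_element("xpath", xpath)
    # firstChild is null when the element has no child nodes, so guard for it.
    text = driver.execute_script(
        "return arguments[0].firstChild ? arguments[0].firstChild.textContent : null;",
        element,
    )
    return (text or "").strip()
```

A call such as first_text_node(driver, "//td[@bgcolor='green']/font[@color='white']") would then stand in for the one-liner above.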
Using xpath through get_attribute() and splitlines():
print(driver.find_element_by_xpath("//td[@bgcolor='green']/font[@color='white']").get_attribute("innerHTML").splitlines()[1])
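Note that splitlines()[1] relies on the number sitting on the second line of innerHTML, which depends on how the page's markup is laid out. A sketch that is less sensitive to line breaks takes everything before the first child tag instead. The helper name text_before_first_tag is hypothetical, and the sample string is only an assumption about what get_attribute("innerHTML") returns here.

```python
def text_before_first_tag(inner_html):
    """Return the text that precedes the first '<' in an innerHTML string."""
    leading_text, _, _ = inner_html.partition("<")
    return leading_text.strip()


# Assumed shape of the <font> element's innerHTML.
sample_inner_html = "\n49.8 \n<small>dBmV</small>\n"
print(text_before_first_tag(sample_inner_html))
```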
You can use your first attempt and pull out just the number like this:
text_num = browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font").text
print(float(text_num.split()[0]))
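If the cell might ever contain something other than "number, space, unit", split()[0] followed by float() would raise an exception. A slightly more defensive sketch uses a regular expression; the helper name first_number and the choice to return None on failure are assumptions made for illustration.

```python
import re

# Optional sign, digits, optional fractional part.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def first_number(text):
    """Return the first decimal number found in `text` as a float, or None."""
    match = _NUMBER.search(text)
    return float(match.group()) if match else None


print(first_number("49.8 dBmV"))
```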
Hope this helped!
You can replace the extra text like this:
first_text = browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font").text
second_text = browser.find_element_by_xpath("/html/body/p[1]/table/tbody/tr/td/table[2]/tbody/tr[2]/td[4]/font/small").text
only_first_text = first_text.replace(second_text, '').strip()
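One caveat: str.replace removes every occurrence of the unit text, not only the trailing one. A minimal sketch that strips the unit only when it is the suffix could look like this; strip_unit is a hypothetical helper name, and the inputs stand in for the two .text values read above.

```python
def strip_unit(full_text, unit_text):
    """Remove `unit_text` from `full_text` only when it is the trailing part."""
    full = full_text.strip()
    unit = unit_text.strip()
    if unit and full.endswith(unit):
        full = full[: -len(unit)]
    return full.strip()


print(strip_unit("49.8 dBmV", "dBmV"))
```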
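The find_element_by_xpath API in Selenium only supports returning elements. So although XPath lets you write an expression that returns just the text you are looking for, you cannot do this with XPath alone in Selenium.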
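Outside Selenium, a parser can return that text directly, so one option is to hand it the markup (for example the element's outerHTML). The sketch below uses the standard library's xml.etree.ElementTree only because the fragment from the question happens to be well-formed XML; real pages usually need a proper HTML parser. The helper name font_leading_text is hypothetical, and the quotes around 49.8 in the question are assumed to be a developer-tools display artifact, so they are left out.

```python
import xml.etree.ElementTree as ET

# Fragment from the question, without the quotes shown by the browser tools.
fragment = """
<td bgcolor="green">
  <font color="white">
    49.8
    <small>dBmV</small>
  </font>
</td>
"""


def font_leading_text(markup):
    """Return the text inside <font> that comes before its first child element."""
    root = ET.fromstring(markup.strip())
    font = root.find("./font[@color='white']")
    if font is None or font.text is None:
        return None
    # Element.text holds only the text before the first child (<small> here).
    return font.text.strip()


print(font_leading_text(fragment))
```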