How to extract text from HTML (after a certain string)
I have the following HTML:
<li class="group-ib medium-gap line-120 vertical-offset-10">
  <i class="fa fa-angle-right font-bold font-95 text-primary text-dark"></i>
  <span>
    abc:
    <b class="text-primary text-dark">str1</b>
  </span>
</li>
I want to extract str1, which always appears after abc.
I was able to do it with this XPath:
xpath('.//b[@class = "text-primary text-dark"]')[0].text
But that solution depends on this being the first element with that class, and the class appears more than once and not always in the same order. Is there a way to search the HTML for abc and pull the text that follows it?
One approach: find the element that contains abc, navigate to its child or parent if needed, and get the text.
Example selectors:
Find any element (* matches any tag) that contains the text abc and select any of its children:
//*[contains(text(), 'abc')]/*
Find any element (* matches any tag) that contains the text abc and select its b child:
//*[contains(text(), 'abc')]/b
Find an li element that has a descendant containing the text abc, and select the b element inside that li. Use // because b is not a direct child of li (a single / only selects direct children):
//li[.//*[contains(text(), 'abc')]]//b
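A minimal sketch that runs the three selectors above against the question's snippet, assuming Python with lxml installed (`pip install lxml`). The third selector is written with the `*` node test that `.//` needs, since `.//[...]` on its own is not valid XPath. Variable names are illustrative only.

```python
from lxml import html

# The question's snippet (the DevTools-only "::before" marker is left out).
SNIPPET = """
<li class="group-ib medium-gap line-120 vertical-offset-10">
  <i class="fa fa-angle-right font-bold font-95 text-primary text-dark"></i>
  <span>
    abc:
    <b class="text-primary text-dark">str1</b>
  </span>
</li>
"""

tree = html.fromstring(SNIPPET)

# 1. Any element whose text contains "abc", then any child element.
any_child = tree.xpath("//*[contains(text(), 'abc')]/*")

# 2. Same anchor, but only <b> children.
b_child = tree.xpath("//*[contains(text(), 'abc')]/b")

# 3. An <li> with a descendant containing "abc", then any <b> below that <li>.
b_in_li = tree.xpath("//li[.//*[contains(text(), 'abc')]]//b")

for name, result in [("any_child", any_child), ("b_child", b_child), ("b_in_li", b_in_li)]:
    print(name, [el.text for el in result])
```

In XPath 1.0, `contains(text(), 'abc')` only looks at the element's first text node, which here is the `abc:` text inside the span, so all three queries land on the same b element.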
If you know abc, start from there: see which element is returned and, if needed, navigate to its parent, ancestor, or child.
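A minimal sketch of that workflow, again assuming Python with lxml installed. The markup is hypothetical: it adds an `xyz:` item (invented for illustration) ahead of the `abc:` one, so the question's position-based `[0]` lookup picks the wrong value while anchoring on abc does not.

```python
from lxml import html

# Hypothetical markup: the "xyz:" item is invented to put another matching <b> first.
DOC = """
<ul>
  <li><span>xyz: <b class="text-primary text-dark">other</b></span></li>
  <li><span>abc: <b class="text-primary text-dark">str1</b></span></li>
</ul>
"""

tree = html.fromstring(DOC)

# The question's approach: the first matching <b>, wherever it happens to be.
by_position = tree.xpath('.//b[@class = "text-primary text-dark"]')[0].text

# Start from "abc" and check which element comes back.
anchor = tree.xpath("//*[contains(text(), 'abc')]")[0]
print(anchor.tag)  # the element that holds the "abc:" text

# Navigate as needed: down to the direct <b> child...
via_child = anchor.xpath("./b")[0].text
# ...or up to the nearest enclosing <li> and back down, in case <b> is nested deeper.
via_ancestor = anchor.xpath("ancestor::li[1]//b")[0].text

print(by_position, via_child, via_ancestor)
```

Going through `ancestor::li[1]` costs one extra step but keeps working if the b element later moves deeper inside the list item.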
For more about XPath, see the w3schools XPath selectors tutorial.
The following XPath should return the text you are looking for:
//*[contains(text(),'abc')]/*[@class='text-primary text-dark'][1]/text()
This assumes that the str1 you are looking for is always in an element whose class attribute is exactly text-primary text-dark (the @class comparison is an exact string match).
It also assumes that you want the first such element (ignoring any other text-primary text-dark elements), which is why [1] is used.
Because the expression starts from the element whose text contains abc, only that element's children are checked for those classes.
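A quick check of this expression with lxml (assumed installed), using the question's snippet. In lxml, an XPath ending in `/text()` returns a list of strings rather than elements. The second query is not part of the answer above: it is a common XPath 1.0 idiom that matches the class as a whitespace-separated token, so it still works if the class order changes or extra classes are added.

```python
from lxml import html

SNIPPET = """
<li class="group-ib medium-gap line-120 vertical-offset-10">
  <i class="fa fa-angle-right font-bold font-95 text-primary text-dark"></i>
  <span>
    abc:
    <b class="text-primary text-dark">str1</b>
  </span>
</li>
"""

tree = html.fromstring(SNIPPET)

# The answer's expression: returns the text nodes as strings.
texts = tree.xpath("//*[contains(text(),'abc')]/*[@class='text-primary text-dark'][1]/text()")
print(texts)

# Alternative (not from the answer): token match on the class attribute,
# tolerant of class order and extra classes.
tolerant = tree.xpath(
    "//*[contains(text(),'abc')]"
    "/*[contains(concat(' ', normalize-space(@class), ' '), ' text-primary ')][1]/text()"
)
print(tolerant)
```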
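Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.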