How to extract text from HTML (after a specific string)

Question

I have the following HTML:

<li class="group-ib medium-gap line-120 vertical-offset-10">
    <i class="fa fa-angle-right font-bold font-95 text-primary text-dark">
        ::before
    </i>
    <span>
        abc: 
        <b class="text-primary text-dark">str1</b>
    </span>
</li>

I want to extract str1, which always appears after abc. I was able to do it with the following XPath:

xpath('.//b[@class = "text-primary text-dark"]')[0].text

But that solution depends on the element being the first one with this particular class. The class appears more than once, and not always in the same order. Is there a way to search the HTML for abc and pull the text that follows it?

Answer 1

You could find the element that contains abc, navigate to a child or parent if needed, and get the text.
Example selectors:

Find any element (* matches any tag) that contains the text abc and select any of its children.
//*[contains(text(), 'abc')]/*
Find any element (* matches any tag) that contains the text abc and select its b child.
//*[contains(text(), 'abc')]/b
Find an li element that has a descendant containing the text abc, then select the b element inside that li. Use // because b is not a direct child of li.
//li[.//*[contains(text(), 'abc')]]//b

If you know abc, start from there, see which element is returned, and navigate to a parent, ancestor, or child if needed.
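
As a rough sketch, the three selectors above can be tried out in Python. This assumes the lxml library, which the `.xpath(...)[0].text` call in the question suggests but the post never names. The `PAGE` markup is a made-up, trimmed version of the question's HTML with a second list item added, so that the wanted `b` element is not the first one with that class.

```python
from lxml import html  # third-party package (assumption: lxml is the parser in use)

# Hypothetical page: two <li> items, so the wanted <b> is not the first one.
PAGE = """
<ul>
  <li><span>xyz: <b class="text-primary text-dark">other</b></span></li>
  <li>
    <i class="fa fa-angle-right text-primary text-dark"></i>
    <span>
        abc:
        <b class="text-primary text-dark">str1</b>
    </span>
  </li>
</ul>
"""

tree = html.fromstring(PAGE)

SELECTORS = [
    "//*[contains(text(), 'abc')]/*",
    "//*[contains(text(), 'abc')]/b",
    "//li[.//*[contains(text(), 'abc')]]//b",
]

# Map each selector to the text of every element it matches.
results = {sel: [el.text for el in tree.xpath(sel)] for sel in SELECTORS}

for sel, texts in results.items():
    print(sel, "->", texts)
```

On this sample, each selector should match only the `b` element that follows abc. Note that in XPath 1.0, `contains(text(), 'abc')` checks only the first text node of an element, so it may miss abc if other child elements come before it.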

For more about XPath, see w3schools XPath selectors.

Answer 2

The following XPath should give the text you are searching for:

//*[contains(text(),'abc')]/*[@class='text-primary text-dark'][1]/text()

This assumes the str1 you are looking for is always inside an element with the attribute class=text-primary text-dark.

It also assumes you want only the first such element and can ignore any other text-primary text-dark elements, which is why the [1] is there.

This XPath only looks for that class among the children of a node whose text contains abc.
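
A minimal sketch of how this expression might be wrapped in Python, again assuming lxml. The helper name `text_after_label` and the sample `PAGE` markup are hypothetical, not part of the answer. The label is passed as an XPath variable (`$label`), which lxml accepts as a keyword argument to `xpath()`, rather than being pasted into the query string.

```python
from typing import Optional

from lxml import html  # third-party package (assumption: lxml is the parser in use)

# Hypothetical sample markup, modelled on the question's HTML.
PAGE = """
<ul>
  <li><span>xyz: <b class="text-primary text-dark">other</b></span></li>
  <li><span>abc: <b class="text-primary text-dark">str1</b></span></li>
</ul>
"""

# Answer 2's expression, with the literal 'abc' replaced by an XPath variable.
QUERY = (
    "//*[contains(text(), $label)]"
    "/*[@class='text-primary text-dark'][1]/text()"
)


def text_after_label(markup: str, label: str) -> Optional[str]:
    """Return the text of the first matching child after `label`, or None."""
    tree = html.fromstring(markup)
    hits = tree.xpath(QUERY, label=label)
    return hits[0].strip() if hits else None


print(text_after_label(PAGE, "abc"))
```

Returning None when nothing matches avoids the IndexError that a bare `[0]` would raise on a page where the label is missing.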
