
How to extract text from HTML (after certain string)

I have the following HTML:

<li class="group-ib medium-gap line-120 vertical-offset-10">
    <i class="fa fa-angle-right font-bold font-95 text-primary text-dark">
        ::before
    </i>
    <span>
        abc: 
        <b class="text-primary text-dark">st1</b>
    </span>
</li>

I want to extract st1, the text that always appears after abc. I was able to do it with this XPath:

xpath('.//b[@class = "text-primary text-dark"]')[0].text 

But this only works when the target is the first element with that class. The class appears more than once on the page, and not always in the same order. Is there a way to search the HTML for abc and pull the text that follows it?

Find the element that contains abc, navigate to a child or parent if needed, then get the text.
Example selectors:

  1. Find any element (* matches any tag) whose text contains abc and select all of its children.
    //*[contains(text(), 'abc')]/*

  2. Find any element (* matches any tag) whose text contains abc and select its b child.
    //*[contains(text(), 'abc')]/b

  3. Find the li element that has a descendant whose text contains abc, then select the b element inside that li. Use // because b is not a direct child of li.
    //li[.//*[contains(text(), 'abc')]]//b
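The logic of selector 2 can be sketched with only the Python standard library. This is an illustration under assumptions, not the answer's own code: `xml.etree.ElementTree` does not support `contains()` in its XPath subset, so the text check is done in plain Python, and `b_text_after` is a hypothetical helper name. It only works here because the snippet happens to be well-formed XML. For real-world HTML, a parser with full XPath support would take the selectors above directly.

```python
import xml.etree.ElementTree as ET

# Markup from the question. The "::before" shown in the question is a
# browser devtools pseudo-element, so the <i> element is left empty here.
SNIPPET = """
<li class="group-ib medium-gap line-120 vertical-offset-10">
    <i class="fa fa-angle-right font-bold font-95 text-primary text-dark"></i>
    <span>
        abc:
        <b class="text-primary text-dark">st1</b>
    </span>
</li>
"""


def b_text_after(root, label):
    """Hypothetical helper: return the text of the first <b> child of an
    element whose leading text contains `label`, or None if nothing matches."""
    for element in root.iter():
        # element.text is the text before the element's first child, which
        # is the text that contains(text(), ...) ends up testing here.
        if element.text and label in element.text:
            b = element.find("b")
            if b is not None:
                return b.text
    return None


root = ET.fromstring(SNIPPET)
print(b_text_after(root, "abc"))  # st1
```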

Since you know the abc text, start from there: check which element is returned, then navigate to its parent, ancestor, or child as needed.

For more about XPath, see the w3schools XPath selectors page.

The following XPath should give the text you are searching for:

//*[contains(text(),'abc')]/*[@class='text-primary text-dark'][1]/text()

This assumes the text you are looking for is always inside an element whose class attribute is exactly "text-primary text-dark".

It also assumes you want only the first such element under the matched node and can ignore any other "text-primary text-dark" elements, which is why [1] is there.

The XPath first requires that a node's own text contains abc, and only then looks among that node's children for the class, so other elements with the same class elsewhere on the page are not matched.
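To show why anchoring on abc avoids the ordering problem from the question, here is a minimal standard-library sketch. The two-item page fragment, the "xyz" label, and the `value_for_label` helper are all made up for illustration. The label and class checks are written in plain Python because `xml.etree.ElementTree` has no `contains()` function in its XPath subset.

```python
import xml.etree.ElementTree as ET

TARGET_CLASS = "text-primary text-dark"

# Hypothetical fragment: two items reuse the same class on <b>,
# and the "abc" item is not the first one.
PAGE = """
<ul>
    <li><span>xyz: <b class="text-primary text-dark">other</b></span></li>
    <li><span>abc: <b class="text-primary text-dark">st1</b></span></li>
</ul>
"""


def value_for_label(root, label, css_class=TARGET_CLASS):
    """Hypothetical helper mirroring the XPath above: find an element whose
    leading text contains `label`, then return the text of its first child
    whose class attribute equals `css_class` exactly."""
    for element in root.iter():
        if element.text and label in element.text:
            for child in element:
                if child.get("class") == css_class:
                    return child.text
    return None


root = ET.fromstring(PAGE)

# Looking up by class alone returns whichever item happens to come first.
first_by_class = next(
    el.text for el in root.iter("b") if el.get("class") == TARGET_CLASS
)
print(first_by_class)                # other
print(value_for_label(root, "abc"))  # st1
```

Swapping the two li items changes what the class-only lookup returns, while the label-anchored lookup still returns st1.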
