
Jsoup getElementsByAttributeValueMatching doesn't function

I have an HTML page that contains (among others) the following divs:

<div id="fact">
    <div class="fact">
        AAAAAA
        <div class="fact-label">
            BBBBBB
        </div>
    </div>
</div>

I want to extract only the text of the div that has class="fact", without the text of its child divs.

Code:

Document page = Jsoup.connect(url).get();
Elements element = page.select("div.fact");
for (Element step : element) {
    System.out.println(step.getElementsByAttributeValueMatching("class",
            Pattern.compile("^[a-t]{4}$")));
}

But it doesn't work. What I get is this:

<div class="fact">
    AAAAAA
   <div class="fact-label">
    BBBBBB
    </div> 
</div>

My question is: how can I exclude the inner div that has class="fact-label"?

The following code solved the problem:

elem.select("div").remove().select("div.fact").text();

We can also get the result with the following code. Here the regex "fact$" matches class values that end with "fact" ($ anchors the match to the end of the string), and ownText() is then called on each matched element. ownText() returns only the text that belongs directly to the element; it does not include text from its children.

Elements el = doc.getElementsByAttributeValueMatching("class", "fact$");

for (Element ele : el) {
    System.out.println(ele.ownText());
}

Output: AAAAAA
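
To see which class values a pattern such as "fact$" would select, here is a small sketch that uses only java.util.regex, with no Jsoup involved. It assumes Jsoup applies the pattern with a find()-style search (a match anywhere in the attribute value), which is my understanding of its behavior. The class name ClassPatternDemo, the helper matchesClass, and the class value "artifact" are made up for illustration and do not come from the question.

```java
import java.util.regex.Pattern;

class ClassPatternDemo {
    // Hypothetical helper: true if the regex matches anywhere in the class value.
    // This mirrors a find()-style check, which is assumed to be how Jsoup
    // evaluates getElementsByAttributeValueMatching.
    static boolean matchesClass(String regex, String classValue) {
        return Pattern.compile(regex).matcher(classValue).find();
    }

    public static void main(String[] args) {
        // "fact" and "fact-label" come from the question; "artifact" is invented.
        String[] classValues = {"fact", "fact-label", "artifact"};
        String[] patterns = {"fact$", "^fact$", "^[a-t]{4}$"};
        for (String pattern : patterns) {
            for (String value : classValues) {
                System.out.println(pattern + " vs " + value + " -> "
                        + matchesClass(pattern, value));
            }
        }
    }
}
```

Under that assumption, "fact$" skips "fact-label" but would also pick up a class such as "artifact", so "^fact$" is the stricter choice if only the exact value "fact" should match. The pattern from the question, "^[a-t]{4}$", already matches only "fact" among these values, which suggests the original problem was not the matching itself: printing the matched element outputs its full HTML, children included, and ownText() is what drops the child text.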


 