I have an HTML page that has (among others) the following divs:
<div id="fact">
<div class="fact">
AAAAAA
<div class="fact-label">
BBBBBB
</div>
</div>
</div>
I want to extract only the text of the div that has class="fact".
Code:
Document page = Jsoup.connect(url).get();
Elements element = page.select("div.fact");
for (Element step : element) {
    System.out.println(step.getElementsByAttributeValueMatching("class",
            Pattern.compile("^[a-t]{4}$")));
}
But it doesn't work; this is what I get:
<div class="fact">
AAAAAA
<div class="fact-label">
BBBBBB
</div>
</div>
My question is: how can I exclude the inner div that has class="fact-label"?
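The following code solved the problem. It works because remove() detaches every selected div from its parent, so div.fact-label is no longer a child of div.fact when text() is called (note that this also tears apart the page's DOM):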
elem.select("div").remove().select("div.fact").text();
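A variant of the same removal idea that leaves the parsed page intact is to clone div.fact and remove only its div.fact-label children from the copy. This is a sketch, not the answerer's code; it assumes a jsoup version that provides Element.selectFirst, and the class name FactText and method textWithoutLabel are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class for illustration only.
public class FactText {
    // Returns the text of the first div.fact with any div.fact-label children removed.
    // The removal happens on a deep copy, so the original document is not modified.
    static String textWithoutLabel(Document doc) {
        Element fact = doc.selectFirst("div.fact");
        if (fact == null) {
            return "";
        }
        Element copy = fact.clone();            // deep copy, detached from doc
        copy.select("div.fact-label").remove(); // drop only the label div(s) from the copy
        return copy.text();
    }

    public static void main(String[] args) {
        String html = "<div id=\"fact\"><div class=\"fact\">AAAAAA"
                + "<div class=\"fact-label\">BBBBBB</div></div></div>";
        Document doc = Jsoup.parse(html);
        System.out.println(textWithoutLabel(doc));               // AAAAAA
        System.out.println(doc.select("div.fact-label").size()); // 1: the original is untouched
    }
}
```

Unlike removing every div, this only affects the copy, so other selections on the same page still work afterwards.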
We can also get the result with the following code. Here the regex matches a class value that ends with "fact" ($ denotes the end of the string), and ownText() then returns only the text of the element itself, without the text of its children.
Elements el = doc.getElementsByAttributeValueMatching("class", "fact$");
for (Element ele : el) {
    System.out.println(ele.ownText());
}
Output: AAAAAA
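To see which class values each regex accepts, the plain-Java sketch below tests them the way a substring search does, with Matcher.find(). It assumes (not stated in the answers above) that jsoup's attribute-regex matching behaves like find(); the class ClassRegexDemo and method matching are hypothetical. It suggests that the question's regex already selected only the outer div, and that the extra HTML came from printing the whole Elements collection instead of calling ownText(). It also shows that "fact$" would accept a class such as "artifact", while "^fact$" would not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of jsoup.
public class ClassRegexDemo {
    // Returns the values for which the regex finds a match anywhere in the string
    // (Matcher.find() semantics, i.e. a substring search honoring ^ and $ anchors).
    static List<String> matching(String regex, List<String> classValues) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String value : classValues) {
            if (p.matcher(value).find()) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> classes = List.of("fact", "fact-label", "artifact");
        System.out.println(matching("fact$", classes));      // [fact, artifact]
        System.out.println(matching("^[a-t]{4}$", classes)); // [fact]
        System.out.println(matching("^fact$", classes));     // [fact]
    }
}
```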