Jsoup: select the text after a tag when there are many tags
I want to extract the text that follows each tag using jsoup. Is there any way to select it?
Example markup is below:
<div class="content">
<div name="panel-summary" id="summary">
<p>
<strong>A: </strong>*thank you* **I want to retrieve this text**<br>
<strong>B: </strong>*Bla..bla* *I don't want this text*<br>
<strong>C: </strong>*what ever text* *I dont want this*
<strong>D: </strong>*anythinh text* *I want this*<br>
<strong>E: </strong>*Bla..bla* *I don't want this text*t<br>
<strong>F: </strong>*anythinh text* *I want this*<br>
</p>
<p>I want this</p>
When it finishes, it creates an auto id, for example id=123.
If we can assume that all the <strong> elements you want to find will always contain A:, D: or F:, then with strong:matchesOwn(regex) (where regex is A:|D:|F:) we can select those elements.
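To see which labels the alternation A:|D:|F: keeps, here is a minimal sketch that uses only java.util.regex, without jsoup. It assumes that :matchesOwn applies the pattern as a partial (find-style) match on the element's own text, which is my understanding of jsoup's behaviour but is worth checking against the jsoup selector documentation. The class name LabelRegexSketch and the method wantedLabels are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LabelRegexSketch {
    // Same alternation as in the selector strong:matchesOwn(A:|D:|F:)
    static final Pattern WANTED = Pattern.compile("A:|D:|F:");

    // Keeps the labels in which the pattern is found anywhere (partial match).
    // Assumption: this mirrors how :matchesOwn tests an element's own text.
    static List<String> wantedLabels(List<String> labels) {
        List<String> result = new ArrayList<>();
        for (String label : labels) {
            if (WANTED.matcher(label).find()) {
                result.add(label);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-ins for the own text of the six <strong> elements
        List<String> labels = List.of("A: ", "B: ", "C: ", "D: ", "E: ", "F: ");
        System.out.println(wantedLabels(labels));
    }
}
```

Because the match is partial, a label such as "AA: " would also be kept. If that matters, anchoring the pattern, for example ^(A:|D:|F:), is one way to tighten it.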
After handling the <strong> elements we can move on to the second <p> and get its textual content via text().
String html = "<div class=\"content\">\n" +
"<div name=\"panel-summary\" id=\"summary\">\n" +
" <p>\n" +
" <strong>A: </strong>*thank you* **I want to retrieve this text**<br>\n" +
" <strong>B: </strong>*Bla..bla* *I don't want this text*<br>\n" +
" <strong>C: </strong>*what ever text* *I dont want this* \n" +
" <strong>D: </strong>*anythinh text* *I want this*<br>\n" +
" <strong>E: </strong>*Bla..bla* *I don't want this text*t<br>\n" +
" <strong>F: </strong>*anythinh text* *I want this*<br>\n" +
" </p>\n" +
"\n" +
" <p>I want this</p>";
Document doc = Jsoup.parse(html);
// both <p> elements inside #summary
Elements pElements = doc.select("#summary p");
// <strong> elements of the first <p> whose own text contains A:, D: or F:
Elements strongElements = pElements.first().select("strong:matchesOwn(A:|D:|F:)");
for (Element strong : strongElements) {
    System.out.println(strong.nextSibling()); // next sibling node, which may be a text node
}
System.out.println("---");
System.out.println(pElements.get(1).text()); // textual content of <p>I want this</p>
Output:
*thank you* **I want to retrieve this text**
*anythinh text* *I want this*
*anythinh text* *I want this*
---
I want this
If you don't want to rely on the content of the <strong> elements but simply on their indexes, then select all of them like
Elements allStrElemens = doc.select("#summary p strong");
and simply pick the ones you need via their indexes (remember that indexes start from 0) like
System.out.println(allStrElemens.get(0).nextSibling());
System.out.println(allStrElemens.get(3).nextSibling());
System.out.println(allStrElemens.get(5).nextSibling());
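If the number of <strong> elements can vary, get(5) throws an IndexOutOfBoundsException when fewer than six are present. A small generic helper can guard against that. The sketch below uses plain strings as stand-ins for the elements. Since jsoup's Elements is a java.util.List of Element, the same helper should work on it, though that combination is not tested here. The names PickByIndex and pick are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PickByIndex {
    // Returns the items at the given zero-based positions,
    // silently skipping positions that are out of range.
    static <T> List<T> pick(List<T> items, int... indexes) {
        List<T> picked = new ArrayList<>();
        for (int i : indexes) {
            if (i >= 0 && i < items.size()) {
                picked.add(items.get(i));
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        // Stand-ins for the six <strong> labels in the example markup
        List<String> labels = List.of("A:", "B:", "C:", "D:", "E:", "F:");
        System.out.println(pick(labels, 0, 3, 5));
    }
}
```

Skipping out-of-range positions is a design choice made for this sketch. Throwing an exception or logging a warning may be more appropriate when a missing element indicates that the page layout has changed.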