Jsoup: select text after a tag when there are multiple tags

Question

I want to use jsoup to extract the text that comes after a <strong> element. Is there any way to select it?

Example code:

<div class="content">
<div name="panel-summary" id="summary">
    <p>
    <strong>A: </strong>*thank you* **I want to retrieve this text**<br>
    <strong>B: </strong>*Bla..bla* *I don't want this text*<br>
    <strong>C: </strong>*what ever text* *I dont want this*                         
        <strong>D: </strong>*anythinh text* *I want this*<br>
        <strong>E: </strong>*Bla..bla* *I don't want this text*t<br>
        <strong>F: </strong>*anythinh text* *I want this*<br>
    </p>

    <p>I want this</p>

When it finishes, it creates an automatic id, for example id=123.

Answer 1

If we can assume that every <strong> element you want to find always contains A:, D:, or F:, we can select those elements with strong:matchesOwn(regex), where regex is A:|D:|F:.

After handling the <strong> elements, we can move on to the second <p> and get its text content via text().

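import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String html = "<div class=\"content\">\n" +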
        "<div name=\"panel-summary\" id=\"summary\">\n" +
        "    <p>\n" +
        "    <strong>A: </strong>*thank you* **I want to retrieve this text**<br>\n" +
        "    <strong>B: </strong>*Bla..bla* *I don't want this text*<br>\n" +
        "    <strong>C: </strong>*what ever text* *I dont want this*                         \n" +
        "        <strong>D: </strong>*anythinh text* *I want this*<br>\n" +
        "        <strong>E: </strong>*Bla..bla* *I don't want this text*t<br>\n" +
        "        <strong>F: </strong>*anythinh text* *I want this*<br>\n" +
        "    </p>\n" +
        "\n" +
        "    <p>I want this</p>";

Document doc = Jsoup.parse(html);
Elements pElements = doc.select("#summary p"); // both <p> elements inside #summary
Elements strongElements = pElements.first().select("strong:matchesOwn(A:|D:|F:)");
for (Element strong : strongElements) {
    // nextSibling() returns the next node, which here is the text node after </strong>
    System.out.println(strong.nextSibling());
}
System.out.println("---");
System.out.println(pElements.get(1).text()); // text content of <p>I want this</p>

Output:

*thank you* **I want to retrieve this text**
*anythinh text* *I want this*
*anythinh text* *I want this*
---
I want this
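A note on the A:|D:|F: pattern used above: as far as I know, :matchesOwn checks whether the regex is found anywhere in the element's own text rather than requiring a full match, so a label such as "AA: " would also be selected. The sketch below reproduces that "find" behaviour with plain java.util.regex (no jsoup involved) and compares it with an anchored variant. The class name, the helper labelsFound, and the extra "AA: " label are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LabelPatternSketch {

    // Returns the labels in which the regex is found anywhere (find(), not matches()).
    static List<String> labelsFound(String regex, List<String> labels) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String label : labels) {
            if (pattern.matcher(label).find()) {
                hits.add(label);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "AA: " is a hypothetical extra label, not part of the question's HTML.
        List<String> labels = Arrays.asList("A: ", "B: ", "C: ", "D: ", "E: ", "F: ", "AA: ");

        // Unanchored pattern from the answer: also hits "AA: " because it contains "A:".
        System.out.println(labelsFound("A:|D:|F:", labels));  // [A: , D: , F: , AA: ]

        // Anchored variant: the label must start with A:, D:, or F:.
        System.out.println(labelsFound("^(A|D|F):", labels)); // [A: , D: , F: ]
    }
}
```

If the real page can contain longer labels, the anchored form could be used in the selector instead, for example strong:matchesOwn(^(A|D|F):).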

If you don't want to rely on the content of the <strong> elements but only on their indexes, select all of them:

Elements allStrongElements = doc.select("#summary p strong");

and then pick the ones you need by index (remember that indexes start from 0):

System.out.println(allStrongElements.get(0).nextSibling());
System.out.println(allStrongElements.get(3).nextSibling());
System.out.println(allStrongElements.get(5).nextSibling());
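One caveat with index-based picking: get(i) on a list throws IndexOutOfBoundsException when the page has fewer <strong> elements than expected. The following is a minimal sketch of a bounds-checked picker, written against plain java.util.List so it runs without jsoup. The class IndexPickSketch and the method pick are hypothetical names, and the strings stand in for the six <strong> elements. Since jsoup's Elements is, to my knowledge, an ArrayList of Element, the same helper should accept it, but treat that as an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IndexPickSketch {

    // Returns the items at the given 0-based indexes, skipping any index that is out of range.
    static <T> List<T> pick(List<T> items, int... indexes) {
        List<T> picked = new ArrayList<>();
        for (int i : indexes) {
            if (i >= 0 && i < items.size()) {
                picked.add(items.get(i));
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        // Stand-ins for the six <strong> elements A..F in document order.
        List<String> labels = Arrays.asList("A", "B", "C", "D", "E", "F");

        // 0-based indexes 0, 3, 5 correspond to the 1st, 4th, and 6th elements.
        System.out.println(pick(labels, 0, 3, 5)); // [A, D, F]

        // Index 9 does not exist and is skipped instead of throwing.
        System.out.println(pick(labels, 0, 9));    // [A]
    }
}
```

Skipping missing indexes silently is a design choice for this sketch; throwing or logging may be preferable when a missing element means the page layout has changed.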

Answer 1: accepted, 2018-10-22 09:59:44
