Extract text between <p> tags, jsoup

Question

Given this HTML:

<html> 
   <head></head>
   <body>
      <p>
        "Text"
        <br>
        "Some more Text"
        <br> 
        "Even more text"
        </p>
  </body>
</html>

I'm trying to get the text inside the `<p>` tags with `Element description = document.select(______)`. How can I get this text? I was able to do it with a page that didn't have a body, but I'm not sure how to get past the body tags. Thanks.

Answer 1

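You can use the selector `p` to extract all `<p>` elements, and the element accessor `text()` to read the text from within each `<p>` element. `select()` searches every descendant of the document, so you don't need to step through `<body>` first; `body p` would match the same element.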
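The question asks how to get past the `<body>` tags. A minimal sketch, assuming jsoup is on the classpath: because `Document.select()` searches the whole tree, the selector `p` and the more explicit `body p` both reach the paragraph. The class `ParagraphText` and its helper `paragraphText` are hypothetical names used only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class ParagraphText {

    // Hypothetical helper: parses the HTML and returns the combined text of
    // every element matching cssQuery (Elements.text() joins them with spaces).
    static String paragraphText(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Elements matches = doc.select(cssQuery);
        return matches.text();
    }

    public static void main(String[] args) {
        String html = "<html><head></head><body>"
                + "<p>Text<br>Some more Text<br>Even more text</p>"
                + "</body></html>";

        // Both selectors find the <p> nested inside <body>.
        System.out.println(paragraphText(html, "p"));
        System.out.println(paragraphText(html, "body p"));
    }
}
```

Both calls should print the same string, because `text()` treats each `<br>` as whitespace and joins the pieces with single spaces.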

Here's an example using the HTML provided in your question:

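import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.junit.Test;

public class ParagraphTextTest {

    @Test
    public void canGetTextFromAPElement() {
        // The HTML from the question, including the literal quote characters
        String html = "<html> \n" +
                "   <head></head>\n" +
                "   <body>\n" +
                "      <p>\n" +
                "        \"Text\"\n" +
                "        <br>\n" +
                "        \"Some more Text\"\n" +
                "        <br> \n" +
                "        \"Even more text\"\n" +
                "        </p>\n" +
                "  </body>\n" +
                "</html>";

        Document doc = Jsoup.parse(html);

        // select("p") searches the whole document, including inside <body>
        Elements elements = doc.select("p");

        assertThat(elements.size(), is(1));
        // text() normalises whitespace and treats each <br> as whitespace
        assertThat(elements.get(0).text(), is("\"Text\" \"Some more Text\" \"Even more text\""));
    }
}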
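The test above returns the whole paragraph as one string. If each run of text between the `<br>` tags is needed separately, one option (a sketch, not part of the original answer, assuming jsoup is on the classpath) is to walk the `<p>` element's direct child text nodes with `Element.textNodes()`. `ParagraphLines` and `paragraphLines` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class ParagraphLines {

    // Hypothetical helper: returns the trimmed, non-blank text nodes that are
    // direct children of the first <p>, i.e. the runs of text between <br> tags.
    static List<String> paragraphLines(String html) {
        List<String> lines = new ArrayList<>();
        Element p = Jsoup.parse(html).select("p").first();
        if (p == null) {
            return lines; // no <p> element in the document
        }
        for (TextNode node : p.textNodes()) {
            // isBlank() skips whitespace-only nodes; text() collapses runs of whitespace
            if (!node.isBlank()) {
                lines.add(node.text().trim());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String html = "<html><head></head><body><p>\n"
                + "  \"Text\"\n  <br>\n"
                + "  \"Some more Text\"\n  <br>\n"
                + "  \"Even more text\"\n</p></body></html>";

        for (String line : paragraphLines(html)) {
            System.out.println(line);
        }
    }
}
```

`textNodes()` returns only direct children, so text wrapped in nested tags such as `<b>` inside the paragraph would be skipped; for that case the single-string `text()` approach is simpler.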

Answer 1: 0 votes, accepted, 2017-11-12 09:46:37