Extracting text between <p> tags, jsoup
Given this HTML:
<html>
<head></head>
<body>
<p>
"Text"
<br>
"Some more Text"
<br>
"Even more text"
</p>
</body>
</html>
I'm trying to get the text inside the <p> tags with `Element description = document.select(______)`. How can I get this text? I was able to do it with a page that didn't have a body, but I'm not sure how to get past the body tags.
Thanks.
You can use the selector `p` to extract all <p> elements, and the element accessor `text()` to read the text from within each <p> element. The selector matches <p> elements anywhere in the document, so the enclosing <body> tags need no special handling.
Here's an example using the HTML provided in your question:
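import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.junit.Test;

import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

// Imports and class wrapper assume JUnit 4 and Hamcrest; the class name is illustrative.
public class JsoupParagraphTest {

    @Test
    public void canGetTextFromAPElement() {
        String html = "<html> \n" +
                "  <head></head>\n" +
                "  <body>\n" +
                "    <p>\n" +
                "      \"Text\"\n" +
                "      <br>\n" +
                "      \"Some more Text\"\n" +
                "      <br> \n" +
                "      \"Even more text\"\n" +
                "    </p>\n" +
                "  </body>\n" +
                "</html>";

        // Parse the markup and select every <p> element in the document.
        Document doc = Jsoup.parse(html);
        Elements elements = doc.select("p");

        // There is one <p>; text() joins its text with whitespace normalized.
        assertThat(elements.size(), is(1));
        assertThat(elements.get(0).text(), is("\"Text\" \"Some more Text\" \"Even more text\""));
    }
}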
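If you need each <br>-separated piece on its own rather than as one joined string, one possible approach is to walk the text nodes of the <p> element. The following is only a sketch: the class name `ParagraphLines` and the method `extractLines` are made up for illustration, and it assumes the text sits directly inside the <p> (text nested in child elements such as <b> is not picked up by `textNodes()`).

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

// Hypothetical helper class; the name is not part of jsoup or the answer above.
class ParagraphLines {

    // Returns the non-blank text fragments that are direct children
    // of the first <p> element, in document order.
    static List<String> extractLines(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();

        Element p = doc.select("p").first();
        if (p == null) {
            return lines; // no <p> in the document
        }

        // Each run of text between <br> tags is a separate TextNode.
        for (TextNode node : p.textNodes()) {
            String text = node.text().trim();
            if (!text.isEmpty()) {
                lines.add(text);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String html = "<html><head></head><body><p>\"Text\"<br>"
                + "\"Some more Text\"<br>\"Even more text\"</p></body></html>";
        for (String line : extractLines(html)) {
            System.out.println(line);
        }
    }
}
```

With the HTML from the question, this should give the three quoted strings as separate list entries instead of a single space-joined string.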