How can I fetch only the outer div's text with JSoup?
I have the following HTML code:
<div class="description">
<div class='daterange'>
Hello
<span itemprop='startDate'>
June 3, 2011
</span>
</div>
This is some description <i>that</i> I want to fetch
</div><br/>
and I want to extract only this part:
This is some description <i>that</i> I want to fetch
How can I do it with jsoup?
I tried using
String description = doc.select("div.description").text();
but then I get all of the content inside the div, including the text of the nested daterange div.
What you need is to build a String that holds the text of the HTML document. The following code does this: doc.body().text() returns the text of the body with all HTML tags stripped.
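`import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Fetches the page at url and returns its body text with all HTML tags stripped.
public String getWords(String url) {
    String text = "";
    try {
        Document doc = Jsoup.connect(url).get();
        text = doc.body().text();
    } catch (IOException ioe) {
        // On a network or I/O failure, print the trace and return the empty string.
        ioe.printStackTrace();
    }
    return text;
}
`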
Try this:
String description = doc.select("div").remove().first().html();
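A non-destructive variant of the same idea, as a rough sketch rather than a definitive answer: clone the outer div, remove its direct child div elements from the copy, and read the copy's inner HTML. The class name OuterDivText and the helper htmlWithoutChildDivs are made up for illustration; the jsoup calls used (Jsoup.parse, select, first, clone, children, remove, html) are the library's own. It assumes that only nested div children should be dropped and that inline tags such as <i> should be kept.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OuterDivText {

    // Hypothetical helper: returns the inner HTML of the first element matching
    // cssQuery, with its direct child <div> elements left out.
    static String htmlWithoutChildDivs(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element outer = doc.select(cssQuery).first();
        if (outer == null) {
            return ""; // nothing matched the query
        }
        // Work on a copy so the parsed document itself is not modified.
        Element copy = outer.clone();
        for (Element child : copy.children()) {
            if (child.tagName().equals("div")) {
                child.remove(); // drop the nested div from the copy only
            }
        }
        return copy.html().trim();
    }

    public static void main(String[] args) {
        String html = "<div class=\"description\">"
                + "<div class='daterange'>Hello "
                + "<span itemprop='startDate'>June 3, 2011</span></div>"
                + "This is some description <i>that</i> I want to fetch"
                + "</div>";
        System.out.println(htmlWithoutChildDivs(html, "div.description"));
    }
}
```

If the inline markup is not needed, Element.ownText() is another option, but it returns only the text nodes directly under the element, so for this input it would leave out the word inside <i>.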