
How can I fetch only the outer div's text with jsoup?

I have the following HTML code:

<div class="description">
    <div class='daterange'>
        Hello 
     <span itemprop='startDate'>
        June 3, 2011
     </span>
    </div>
    This is some description <i>that</i> I want to fetch
 </div><br/>

and I want to extract only this part:

This is some description <i>that</i> I want to fetch

How can I do it with jsoup?

I tried using String description = doc.select("div.description").text(), but then I get all of the content inside the div, including the text of the nested daterange div.

What you need is to build a String that holds the words of the HTML document. The following code does this: doc.body().text() returns the text without any of the HTML tags.

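`import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Fetches the page at url and returns its body text with all tags stripped.
public String getWords(String url) {
    String text = "";
    try {
        Document doc = Jsoup.connect(url).get();
        text = doc.body().text();
    } catch (IOException ioe) {
        // Fetch failed: log it and fall through to return the empty string.
        ioe.printStackTrace();
    }
    return text;
}
`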
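
Note that text() combines the text of an element with the text of all its descendants, so on its own it does not separate the outer div from the nested one. As a rough sketch of the difference, the example below compares text() with ownText(), which returns only the text nodes that are direct children of the element. The class name, the helper names, and the inlined HTML constant are made up for illustration; the markup is the question's snippet with whitespace collapsed.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TextVersusOwnText {
    // Sample markup from the question, inlined so no network access is needed.
    static final String HTML =
        "<div class=\"description\">"
      + "<div class='daterange'>Hello <span itemprop='startDate'>June 3, 2011</span></div>"
      + "This is some description <i>that</i> I want to fetch"
      + "</div>";

    // text(): combined text of the body and everything nested inside it.
    static String allText(String html) {
        Document doc = Jsoup.parse(html);
        return doc.body().text();
    }

    // ownText(): only the text nodes directly under the first matching element.
    static String ownTextOf(String html, String cssQuery) {
        Element el = Jsoup.parse(html).select(cssQuery).first();
        return el == null ? "" : el.ownText();
    }

    public static void main(String[] args) {
        System.out.println(allText(HTML));                      // includes the date range text
        System.out.println(ownTextOf(HTML, "div.description")); // skips nested elements
    }
}
```

Because ownText() skips every child element, it also drops the word inside the <i> tag, so it only fits if losing inline markup and its text is acceptable.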

Try this:

String description = doc.select("div").remove().first().html();
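
One thing to be aware of with the one-liner above: doc.select("div").remove() detaches every div from the document, and first() only gives the right element when div.description is the first div on the page. A more conservative variant, sketched below, clones the target div, removes only its direct child divs from the copy, and reads the remaining inner HTML, so the <i> markup survives and the parsed document is left untouched. The class name, the method name, and the inlined HTML constant are hypothetical, chosen for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OuterDivHtml {
    // Sample markup from the question, inlined so the sketch is self-contained.
    static final String HTML =
        "<div class=\"description\">"
      + "<div class='daterange'>Hello <span itemprop='startDate'>June 3, 2011</span></div>"
      + "This is some description <i>that</i> I want to fetch"
      + "</div>";

    // Returns the inner HTML of the first div.description without its child divs.
    static String outerDivHtml(String html) {
        Document doc = Jsoup.parse(html);
        Element description = doc.select("div.description").first();
        if (description == null) {
            return ""; // no matching element
        }
        Element copy = description.clone(); // work on a copy so doc stays intact
        // children() returns a separate list, so removing while iterating is safe.
        for (Element child : copy.children()) {
            if ("div".equals(child.tagName())) {
                child.remove();
            }
        }
        return copy.html().trim();
    }

    public static void main(String[] args) {
        System.out.println(outerDivHtml(HTML));
    }
}
```

Calling copy.text() instead of copy.html() would give the same content as plain text, with the <i> tags stripped but the word inside them kept.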

