Java Jsoup - Element not removed from Elements

Question

I'll start from the beginning. There's HTML with a pattern like this:

<div id="post_message_(some numeric id)">
    <div style="some style things">
        <div class="smallfont" style="some style">useless text</div>
        <table cellpading="6" cellspaceing=.......> a lot of text inside i dont need</table>
    </div>
    Text i need
</div>

The styled divs and the table are optional; sometimes there is just:

<div id="post">
     Text i need
</div>

I want to extract that text into a String. Here's the code I'm using:

Elements divsInside = element.getElementById("post_message_" + id).getElementsByTag("div");
for (Element div : divsInside) {
    if (div != null && div.attr("style").equals("margin:20px; margin-top:5px; ")) {
        System.out.println(div.html());
        div.remove();
        System.out.println("div removed");
    }
}

I added the print lines to check whether it finds the divs, and it does find the correct ones. But later, when I convert the result to a String:

String message = Jsoup.parse(divsInside.html().replaceAll("(?i)<br[^>]*>", "br2n")).text()
            .replaceAll("br2n", "\n");

the String contains all the removed content again, for some reason.

I tried removing them with iterators, and with a plain for loop that removes elements by index, but the result is the same.

Answer 1

So you want to get "Text i need". Use Element's ownText() method, which "Gets the text owned by this element only; does not get the combined text of all children."

import java.io.File;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Entities.EscapeMode;

private static void test(String htmlFile) {
    try {
        // Parse the file as ASCII, with an empty base URI
        File input = new File(htmlFile);
        Document doc = Jsoup.parse(input, "ASCII", "");
        doc.outputSettings().charset("ASCII");
        doc.outputSettings().escapeMode(EscapeMode.base);

        // Get the element with id = post_message_1
        Element specificIdDiv = doc.getElementById("post_message_1");

        if (specificIdDiv != null) {
            // ownText() returns only this element's own text, not its children's
            System.out.println("content: " + specificIdDiv.ownText());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
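
As a rough illustration of why the question's approach brings the removed content back, here is a minimal sketch. It assumes that remove() only detaches an element from the document tree, while the Elements list collected earlier still holds a reference to it, so calling html() on that list prints the detached divs as well. The class name, the method names, and the sample markup are hypothetical and only modeled on the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class RemoveVsOwnText {
    // Hypothetical sample markup modeled on the pattern in the question.
    static final String HTML =
        "<div id=\"post_message_1\">"
      + "<div style=\"margin:20px; margin-top:5px; \">"
      + "<div class=\"smallfont\">useless text</div>"
      + "<table><tr><td>quoted text</td></tr></table>"
      + "</div>"
      + "Text i need"
      + "</div>";

    // Parses the sample and returns the outer post div.
    static Element post() {
        return Jsoup.parse(HTML).getElementById("post_message_1");
    }

    // Detaches the styled wrapper div from the tree and returns the list
    // that was collected before the removal.
    static Elements removeStyledDivs(Element post) {
        Elements divs = post.getElementsByTag("div");
        for (Element div : divs) {
            if (div.attr("style").startsWith("margin:20px")) {
                div.remove(); // detached from the tree; the divs list is unchanged
            }
        }
        return divs;
    }

    // The question's approach: read the HTML back from the old list.
    static String textFromList() {
        Elements divs = removeStyledDivs(post());
        return Jsoup.parse(divs.html()).text();
    }

    // Alternative: read the text from the parent element after the removal.
    static String textFromParent() {
        Element post = post();
        removeStyledDivs(post);
        return post.text();
    }

    // The answer's approach: no removal, only the element's own text.
    static String ownTextOnly() {
        return post().ownText();
    }

    public static void main(String[] args) {
        System.out.println(textFromList());
        System.out.println(textFromParent());
        System.out.println(ownTextOnly());
    }
}
```

If the removal approach is kept, reading from the parent element rather than from the previously collected list should avoid the leftover content. ownText() is shorter when only the element's direct text is wanted.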

Solution 1: accepted, 2015-02-04 11:12:04