
Java Jsoup - Element isn't removed from Elements

I'll start from the beginning. I have HTML with the following pattern:

<div id="post_message_(some numeric id)">
    <div style="some style things">
        <div class="smallfont" style="some style">useless text</div>
        <table cellpading="6" cellspaceing=.......> a lot of text inside i dont need</table>
    </div>
    Text i need
</div>

The styled divs and the table are optional; sometimes the markup is just:

<div id="post">
     Text i need
</div>

I want to parse the text into a String. This is the code I'm using:

Elements divsInside = element.getElementById("post_message_" + id).getElementsByTag("div");
for (Element div : divsInside) {
    if (div != null && div.attr("style").equals("margin:20px; margin-top:5px; ")) {
        System.out.println(div.html());
        div.remove();
        System.out.println("div removed");
    }
}

I added those print lines to check whether the divs are found, and yes, it does find the right ones. But later, when I parse the result into a String:

String message = Jsoup.parse(divsInside.html().replaceAll("(?i)<br[^>]*>", "br2n")).text()
            .replaceAll("br2n", "\n");

for some reason the String contains all of the removed content again.

I tried removing them through an iterator, and also with a plain for loop over the indexes, but the result is the same.

So you want to get "Text i need". Use the Element ownText() method, which gets the text owned by this element only; it does not get the combined text of all children.

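import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Entities.EscapeMode;

private static void test(String htmlFile) {
    try {
        // Parse the file as ASCII with an empty base URI.
        File input = new File(htmlFile);
        Document doc = Jsoup.parse(input, "ASCII", "");
        doc.outputSettings().charset("ASCII");
        doc.outputSettings().escapeMode(EscapeMode.base);

        // Get the element with id = post_message_1.
        Element specificIdDiv = doc.getElementById("post_message_1");

        if (specificIdDiv != null) {
            // ownText() returns only the text directly inside this div,
            // skipping the text of nested divs and tables.
            System.out.println("content: " + specificIdDiv.ownText());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}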
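
A likely reason the removed content came back in the question's code: div.remove() detaches the node from the document, but the Elements list returned by getElementsByTag("div") still holds references to the detached divs, so calling html() or text() on that list reads them again. The sketch below illustrates this on an in-memory string and compares it with ownText(). The class name OwnTextDemo, its method names, and the sample markup are hypothetical, made up for illustration and modeled on the pattern in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class OwnTextDemo {
    // Hypothetical sample markup following the question's pattern.
    static final String STYLE = "margin:20px; margin-top:5px; ";
    static final String HTML =
          "<div id=\"post_message_1\">"
        + "<div style=\"" + STYLE + "\">"
        + "<div class=\"smallfont\">useless text</div>"
        + "<table><tr><td>quoted text</td></tr></table>"
        + "</div>"
        + "Text i need"
        + "</div>";

    // Approach from the answer: read only the text owned by the outer div.
    static String viaOwnText() {
        Document doc = Jsoup.parse(HTML);
        return doc.getElementById("post_message_1").ownText();
    }

    // Remove the styled divs, then read from the parent element.
    // Returns two strings: text of the parent, and text of the stale list.
    static String[] removeThenRead() {
        Document doc = Jsoup.parse(HTML);
        Element post = doc.getElementById("post_message_1");
        // The list includes the outer div itself plus every nested div.
        Elements divs = post.getElementsByTag("div");
        for (Element div : divs) {
            if (div.attr("style").equals(STYLE)) {
                div.remove(); // detaches from the document, not from 'divs'
            }
        }
        // 'post' no longer contains the removed div; 'divs' still lists it.
        return new String[] { post.text(), divs.text() };
    }

    public static void main(String[] args) {
        System.out.println("ownText:      " + viaOwnText());
        String[] result = removeThenRead();
        System.out.println("parent text:  " + result[0]);
        System.out.println("stale list:   " + result[1]);
    }
}
```

If the line-break handling from the question is still needed, the same idea applies: after removing the unwanted divs, build the String from the parent element's html() rather than from divsInside.html().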

