How do I parse an HTML string in Java?

Question

Given the string "<table><tr><td>Hello World!</td></tr></table>", what is the (easiest) way to get a DOM element representing it?

Answer 1

If you have a string that contains HTML, you can use the Jsoup library to get the HTML elements like this:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
//...
String htmlTable = "<table><tr><td>Hello World!</td></tr></table>";
Document doc = Jsoup.parse(htmlTable);

// Then use something like this to get your element:
Elements tds = doc.getElementsByTag("td");

// tds will contain this one element: <td>Hello World!</td>

Good luck!

Answer 2

Here is one way:

import java.io.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class HtmlParseDemo {
   public static void main(String [] args) throws Exception {
       Reader reader = new StringReader("<table><tr><td>Hello</td><td>World!</td></tr></table>");
       HTMLEditorKit.Parser parser = new ParserDelegator();
       parser.parse(reader, new HTMLTableParser(), true);
       reader.close();
   }
}

// Prints every piece of text that the parser reports while it is inside a <tr>.
class HTMLTableParser extends HTMLEditorKit.ParserCallback {

    // True between an opening <tr> and its closing </tr>.
    private boolean encounteredATableRow = false;

    @Override
    public void handleText(char[] data, int pos) {
        if (encounteredATableRow) System.out.println(new String(data));
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TR) encounteredATableRow = true;
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TR) encounteredATableRow = false;
    }
}
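As a variation on the callback above, the following sketch collects the cell text into a list instead of printing it, which is usually easier to reuse. The class name TableCellCollector and the helper collect are made up for illustration; only the javax.swing.text.html classes come from the JDK.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: gathers the text found inside <td> elements.
class TableCellCollector extends HTMLEditorKit.ParserCallback {

    private final List<String> cells = new ArrayList<>();
    private boolean insideCell = false;

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TD) insideCell = true;
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TD) insideCell = false;
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Only keep text that was reported while a <td> is open.
        if (insideCell) cells.add(new String(data));
    }

    // Parses the given HTML and returns the text chunks seen inside <td> tags.
    static List<String> collect(String html) throws IOException {
        TableCellCollector collector = new TableCellCollector();
        try (Reader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, collector, true);
        }
        return collector.cells;
    }

    public static void main(String[] args) throws IOException {
        // prints [Hello, World!]
        System.out.println(collect("<table><tr><td>Hello</td><td>World!</td></tr></table>"));
    }
}
```

Note that this is still an event-style parse: it gives you the text, not a DOM element.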

Answer 3

You can use HTML Parser, a Java library for parsing HTML in either a linear or a nested fashion. It is an open-source tool and can be found on SourceForge.

Answer 4

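You can use Swing:

How do you take advantage of the HTML processing capabilities built into Java? You may not know that Swing contains all the classes necessary to parse HTML. Jeff Heaton shows you how.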

Answer 5

I have used the Jericho HTML Parser. It is OSS, detects (and forgives) badly formatted tags, and is lightweight.

Answer 6

I found this somewhere (I don't remember where):

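import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
//...
public static DocumentFragment parseXml(Document doc, String fragment)
{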
    // Wrap the fragment in an arbitrary element.
    fragment = "<fragment>"+fragment+"</fragment>";
    try
    {
        // Create a DOM builder and parse the fragment.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document d = factory.newDocumentBuilder().parse(
                new InputSource(new StringReader(fragment)));

        // Import the nodes of the new document into doc so that they
        // will be compatible with doc.
        Node node = doc.importNode(d.getDocumentElement(), true);

        // Create the document fragment node to hold the new nodes.
        DocumentFragment docfrag = doc.createDocumentFragment();

        // Move the nodes into the fragment.
        while (node.hasChildNodes())
        {
            docfrag.appendChild(node.removeChild(node.getFirstChild()));
        }
        // Return the fragment.
        return docfrag;
    }
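    catch (SAXException e)
    {
        // A parsing error occurred; the input is not well-formed XML.
    }
    catch (ParserConfigurationException e)
    {
        // The XML parser could not be configured.
    }
    catch (IOException e)
    {
        // Reading the fragment failed.
    }
    // Parsing failed; callers must check for null.
    return null;
}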
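The method above relies on the JDK's XML parser, so it only works when the markup is well-formed XML; an unclosed tag such as a bare <br> will make it fail. The string from the question happens to be well-formed, so the same idea can be sketched directly on it. The class and method names below (XmlTableDemo, parseWellFormed, firstCellText) are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: treats the markup as XML and reads it into a W3C DOM.
class XmlTableDemo {

    // Parses well-formed markup with a single root element into a DOM document.
    static Document parseWellFormed(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(markup)));
    }

    // Returns the text of the first <td> element, or null if there is none.
    static String firstCellText(String markup) throws Exception {
        Document doc = parseWellFormed(markup);
        NodeList cells = doc.getElementsByTagName("td");
        if (cells.getLength() == 0) {
            return null;
        }
        return cells.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // prints Hello World!
        System.out.println(firstCellText("<table><tr><td>Hello World!</td></tr></table>"));
    }
}
```

For HTML that is not guaranteed to be well-formed, one of the lenient HTML parsers mentioned in the other answers is the safer choice.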

Answer 7

You can parse HTML with some of the javax.swing.text.html utility classes.

import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
//...
try {
    String htmlString = "<html><head><title>Example Title</title></head><body>Some text...</body></html>";
    HTMLEditorKit htmlEditKit = new HTMLEditorKit();
    HTMLDocument htmlDocument = (HTMLDocument) htmlEditKit.createDefaultDocument();
    HTMLEditorKit.Parser parser = new ParserDelegator();
    parser.parse(new StringReader(htmlString),
            htmlDocument.getReader(0), true);
    // Use HTMLDocument here
    System.out.println(htmlDocument.getProperty("title")); // Example Title
} catch(IOException e){
    //Handle
    e.printStackTrace();
}
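Once the HTMLDocument is populated, its element tree can be walked to find specific tags. The sketch below is one possible way to do that, assuming the reader is flushed after parsing so that buffered content reaches the document. The class name HtmlDocumentWalkDemo and the helper textOf are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo: loads HTML into an HTMLDocument and walks its elements.
class HtmlDocumentWalkDemo {

    // Returns the trimmed text covered by every element with the given tag.
    static List<String> textOf(String html, HTML.Tag tag) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();

        HTMLEditorKit.ParserCallback reader = doc.getReader(0);
        new ParserDelegator().parse(new StringReader(html), reader, true);
        reader.flush(); // push any buffered content into the document

        List<String> result = new ArrayList<>();
        ElementIterator it = new ElementIterator(doc);
        for (Element e = it.first(); e != null; e = it.next()) {
            // Each element stores its HTML.Tag under the name attribute.
            Object name = e.getAttributes().getAttribute(StyleConstants.NameAttribute);
            if (name == tag) {
                int start = e.getStartOffset();
                int end = Math.min(e.getEndOffset(), doc.getLength());
                result.add(doc.getText(start, end - start).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String html = "<table><tr><td>Hello</td><td>World!</td></tr></table>";
        System.out.println(textOf(html, HTML.Tag.TD));
    }
}
```

The elements here are javax.swing.text.Element instances, not org.w3c.dom nodes, so this suits text extraction more than general DOM manipulation.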

7 answers

Answer 1
12 2015-04-08 19:39:11

Answer 2
10 2009-09-30 13:10:58

Answer 3
5 2009-09-30 13:03:13

Answer 4
3 2009-09-30 13:02:50

Answer 5
3 2009-09-30 13:10:07

Answer 6
3 (accepted) 2009-10-02 12:28:47

Answer 7
0 2021-04-13 20:02:57