
Getting Document Object Model of a tag from HTML code in Java

Does anyone know how to get the Document Object Model (DOM) of a tag, such as a button, from HTML code? This is what I have so far...

JEditorPane p = new JEditorPane();
p.setContentType("text/html");
p.setText(" <!DOCTYPE html>\n" +
"    <html dir=\"ltr\" lang=\"en\">\n" +
"    <head>\n" +
"    <meta http-equiv=\"Content-Type\" content=\"text/html; \" />\n" +
"    <title>Alidoosti</title>\n" +
"    </head>\n" +
"    <body>\n" +
"    <button id=\"miti\" type=\"submit\">Search</button>\n" +
"    </body>\n" +
"    </html>\n"); // Document text is provided below.
HTMLDocument d = (HTMLDocument) p.getDocument();


ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByName("js");
try {

engine.eval("function getDomPath(el) {\n" +
"  var stack = [];\n" +
"  while ( el.parentNode != null ) {\n" +
"    console.log(el.nodeName);\n" +
"    var sibCount = 0;\n" +
"    var sibIndex = 0;\n" +
"    for ( var i = 0; i < el.parentNode.childNodes.length; i++ ) {\n" +
"      var sib = el.parentNode.childNodes[i];\n" +
"      if ( sib.nodeName == el.nodeName ) {\n" +
"        if ( sib === el ) {\n" +
"          sibIndex = sibCount;\n" +
"        }\n" +
"        sibCount++;\n" +
"      }\n" +
"    }\n" +
"    if ( el.hasAttribute('id') && el.id != '' ) {\n" +
"      stack.unshift(el.nodeName.toLowerCase() + '#' + el.id);\n" +
"    } else if ( sibCount > 1 ) {\n" +
"      stack.unshift(el.nodeName.toLowerCase() + ':eq(' + sibIndex + ')');\n" +
"    } else {\n" +
"      stack.unshift(el.nodeName.toLowerCase());\n" +
"    }\n" +
"    el = el.parentNode;\n" +
"  }\n" +
"  return stack.slice(1); // removes the html element\n" +
"}"+
"var path = getDomPath("+d+".getElementById('miti'));\n" +
"console.log(path.join(' > '));");
} catch (ScriptException e) {
    e.printStackTrace();
}

But I get this error:

javax.script.ScriptException: <eval>:26:60 Missing space after numeric literal
}var path = getDomPath(javax.swing.text.html.HTMLDocument@75f32542.getElementById('miti'));
                                                            ^ in <eval> at line number 26 at column number 60

What causes this error?

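The variable d is of type HTMLDocument. When it is concatenated with a string, Java calls its toString() method, which for this type returns a string like javax.swing.text.html.HTMLDocument@75f32542. That text is pasted into the script source as-is, and the script engine cannot parse it as JavaScript.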

I think you would rather use the JavaScript document object.
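To make the toString() point concrete, here is a minimal sketch that builds the last script line the same way the question does. The class name ToStringSplice and the helper buildCall are hypothetical names used only for this illustration. The hash after the @ differs from run to run.

```java
import javax.swing.text.html.HTMLDocument;

public class ToStringSplice {
    // Hypothetical helper: concatenates a Java object into JavaScript source
    // text, exactly as the "+d+" expression in the question does.
    static String buildCall(Object d) {
        // String concatenation calls d.toString() implicitly.
        return "var path = getDomPath(" + d + ".getElementById('miti'));";
    }

    public static void main(String[] args) {
        HTMLDocument d = new HTMLDocument();
        // Prints the class name, an '@' and a hex hash where the script
        // expected a JavaScript expression, e.g. ...HTMLDocument@75f32542...
        System.out.println(buildCall(d));
    }
}
```

If the goal is to hand the Java object itself to the script, javax.script.ScriptEngine has a put(String, Object) method that binds an object to a script variable, so no string concatenation is needed. Whether the rest of the script then works depends on the engine, since the script calls browser DOM methods that HTMLDocument may not provide.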

May I ask what you are trying to do? This seems like a very complicated approach when all you need is to parse some X(HT)ML...

You can use Jsoup for this. Add it to your classpath and use it like this:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
String html = "[YOUR HTML IN HERE]";
Document doc = Jsoup.parse(html);
Elements buttons = doc.select("button");

After that, for each Element returned by the select("button") call, you can use the parent() method to walk up the stack of parents.

Read about Jsoup here: https://jsoup.org/
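The parent-walking idea can be sketched without any third-party dependency. The example below is not Jsoup: it swaps in the JDK's built-in XML parser and the org.w3c.dom API, and assumes the markup is well-formed XHTML, as the document in the question is. The class name DomPathSketch and the helpers domPath and pathOfFirst are hypothetical. Unlike the JavaScript in the question, this simplified version keeps the html element and does not add :eq() sibling indexes.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomPathSketch {
    // Walks from the element up to the root, collecting "tag" or "tag#id".
    static String domPath(Element el) {
        Deque<String> stack = new ArrayDeque<>();
        // The loop stops at the Document node, which is not an Element.
        for (Node n = el; n instanceof Element; n = n.getParentNode()) {
            Element e = (Element) n;
            String name = e.getTagName().toLowerCase();
            String id = e.getAttribute("id"); // empty string when absent
            stack.addFirst(id.isEmpty() ? name : name + "#" + id);
        }
        return String.join(" > ", stack);
    }

    // Parses well-formed XHTML and returns the path of the first <tag> found.
    static String pathOfFirst(String xhtml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            return domPath((Element) doc.getElementsByTagName(tag).item(0));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><head><title>Alidoosti</title></head>"
                + "<body><button id=\"miti\" type=\"submit\">Search</button>"
                + "</body></html>";
        System.out.println(pathOfFirst(xhtml, "button")); // html > body > button#miti
    }
}
```

For real-world HTML that is not well-formed XML, a lenient parser such as Jsoup is the safer choice. The same loop applies there, using parent() in place of getParentNode().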

