Getting the Document Object Model of a tag from HTML code in Java

Question

Does anybody know how I can get the Document Object Model (DOM) of a tag, such as a button, from HTML code? This is what I've got so far:

JEditorPane p = new JEditorPane();
p.setContentType("text/html");
p.setText(" <!DOCTYPE html>\\n\n" +
"    <html dir=\"ltr\" lang=\"en\">\\n\n" +
"    <head>\\n\n" +
"    <meta http-equiv=\"Content-Type\" content=\"text/html; \" />\\n\n" +
"    <title>Alidoosti</title>\\n\n" +
"    </head>\\n\n" +
"    <body>\\n\n" +
"    <button id=\"miti\" type=\"submit\">Search</button>\n" +
"    </body>\\n\n" +
"    </html>\\n"); // Document text is provided below.
HTMLDocument d = (HTMLDocument) p.getDocument();


ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByName("js");
try {

engine.eval("function getDomPath(el) {\n" +
"  var stack = [];\n" +
"  while ( el.parentNode != null ) {\n" +
"    console.log(el.nodeName);\n" +
"    var sibCount = 0;\n" +
"    var sibIndex = 0;\n" +
"    for ( var i = 0; i < el.parentNode.childNodes.length; i++ ) {\n" +
"      var sib = el.parentNode.childNodes[i];\n" +
"      if ( sib.nodeName == el.nodeName ) {\n" +
"        if ( sib === el ) {\n" +
"          sibIndex = sibCount;\n" +
"        }\n" +
"        sibCount++;\n" +
"      }\n" +
"    }\n" +
"    if ( el.hasAttribute('id') && el.id != '' ) {\n" +
"      stack.unshift(el.nodeName.toLowerCase() + '#' + el.id);\n" +
"    } else if ( sibCount > 1 ) {\n" +
"      stack.unshift(el.nodeName.toLowerCase() + ':eq(' + sibIndex + ')');\n" +
"    } else {\n" +
"      stack.unshift(el.nodeName.toLowerCase());\n" +
"    }\n" +
"    el = el.parentNode;\n" +
"  }\n" +
"  return stack.slice(1); // removes the html element\n" +
"}"+
"var path = getDomPath("+d+".getElementById('miti'));\n" +
"console.log(path.join(' > '));");
} catch (ScriptException e) {
e.printStackTrace();
}

But I got this error:

javax.script.ScriptException: <eval>:26:60 Missing space after numeric literal
}var path = getDomPath(javax.swing.text.html.HTMLDocument@75f32542.getElementById('miti'));
                                                            ^ in <eval> at line number 26 at column number 60

What is causing this error?

Answer 1

Your variable d is of type HTMLDocument. When it is concatenated with a string, Java invokes its toString() method, which (for this type) returns a string like javax.swing.text.html.HTMLDocument@75f32542. That text ends up inside the script source, so the engine tries to parse it as JavaScript and fails at the 75f32542 part, which is exactly where the error message points.
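
To see the cause in isolation, here is a minimal sketch. The class name ConcatVsBinding, the helper brokenScript, and the binding name doc are made up for illustration. The second half passes the object with ScriptEngine.put instead of concatenating it; that part assumes a "js" engine is actually installed (newer JDKs no longer bundle one), so it is guarded with a null check. Note that even a correctly bound HTMLDocument is a Swing object, not a browser DOM: it offers getElement(String id), not getElementById.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.swing.text.html.HTMLDocument;

public class ConcatVsBinding {

    // Reproduces the problem: string concatenation calls toString() on the Java object,
    // so the script source contains text like "javax.swing.text.html.HTMLDocument@75f32542".
    public static String brokenScript(Object doc) {
        return "var path = getDomPath(" + doc + ".getElementById('miti'));";
    }

    public static void main(String[] args) throws ScriptException {
        HTMLDocument d = new HTMLDocument();
        System.out.println(brokenScript(d));

        // Alternative: hand the object itself to the engine under a name (here "doc").
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
        if (engine == null) {
            System.out.println("No JavaScript engine available on this JDK");
            return;
        }
        engine.put("doc", d);
        // The script now sees the real Java object and can call its Java methods.
        System.out.println(engine.eval("doc.getLength()"));
    }
}
```

With the binding in place the script refers to doc by name, so no Java object is ever turned into source text.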

I guess you would prefer to use the JavaScript document object.

May I ask what you are trying to do? This seems an incredibly complicated way to do little more than parse some X(HT)ML.
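
If the markup is well-formed XHTML, one possible route is the XML parser that ships with the JDK. The sketch below is only an illustration under that assumption (real-world HTML is often not well-formed and would be rejected). The names DomPathSketch, domPath, and pathOfId are hypothetical, and the path format is a simplified version of the script in the question: it keeps the html element and does not add sibling indexes.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomPathSketch {

    // Walks up the parent chain and builds a path such as "html > body > button#miti".
    public static String domPath(Element el) {
        Deque<String> stack = new ArrayDeque<>();
        for (Node n = el; n instanceof Element; n = n.getParentNode()) {
            Element e = (Element) n;
            String id = e.getAttribute("id"); // empty string when the attribute is absent
            stack.addFirst(id.isEmpty() ? e.getTagName() : e.getTagName() + "#" + id);
        }
        return String.join(" > ", stack);
    }

    // Parses well-formed XHTML and returns the path of the element with the given id,
    // or null if no element has that id. The id is assumed to contain no quotes.
    public static String pathOfId(String xhtml, String id) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        // Document.getElementById only works when a DTD declares "id" as an ID attribute,
        // so an XPath lookup on the attribute value is used instead.
        Element el = (Element) XPathFactory.newInstance().newXPath()
                .evaluate("//*[@id='" + id + "']", doc, XPathConstants.NODE);
        return el == null ? null : domPath(el);
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><head><title>Alidoosti</title></head>"
                + "<body><button id=\"miti\" type=\"submit\">Search</button></body></html>";
        System.out.println(pathOfId(xhtml, "miti")); // prints "html > body > button#miti"
    }
}
```

This stays entirely in Java, so no script engine and no string-built JavaScript are involved.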

Answer 2

You can use Jsoup for this. Add it to your classpath and use it like this:

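import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
String html = "[YOUR HTML IN HERE]";
Document doc = Jsoup.parse(html);        // parse the HTML string into a Jsoup Document
Elements buttons = doc.select("button"); // CSS selector: every <button> element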

After that, for each Element returned by the select("button") call, you can get its stack of parent elements by using the parents() method.

Read up on Jsoup here: https://jsoup.org/

2 answers

Answer 1: 0, 2017-12-28 14:29:44

Answer 2: 0, accepted, 2017-12-28 16:01:19
