
Getting the Document Object Model of a tag from HTML code in Java

Does anybody know how I can get the Document Object Model (DOM) of a tag, such as a button, from HTML code? This is what I've got so far:

JEditorPane p = new JEditorPane();
p.setContentType("text/html");
p.setText(" <!DOCTYPE html>\\n\n" +
"    <html dir=\"ltr\" lang=\"en\">\\n\n" +
"    <head>\\n\n" +
"    <meta http-equiv=\"Content-Type\" content=\"text/html; \" />\\n\n" +
"    <title>Alidoosti</title>\\n\n" +
"    </head>\\n\n" +
"    <body>\\n\n" +
"    <button id=\"miti\" type=\"submit\">Search</button>\n" +
"    </body>\\n\n" +
"    </html>\\n"); // Document text is provided below.
HTMLDocument d = (HTMLDocument) p.getDocument();


ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByName("js");
try {

engine.eval("function getDomPath(el) {\n" +
"  var stack = [];\n" +
"  while ( el.parentNode != null ) {\n" +
"    console.log(el.nodeName);\n" +
"    var sibCount = 0;\n" +
"    var sibIndex = 0;\n" +
"    for ( var i = 0; i < el.parentNode.childNodes.length; i++ ) {\n" +
"      var sib = el.parentNode.childNodes[i];\n" +
"      if ( sib.nodeName == el.nodeName ) {\n" +
"        if ( sib === el ) {\n" +
"          sibIndex = sibCount;\n" +
"        }\n" +
"        sibCount++;\n" +
"      }\n" +
"    }\n" +
"    if ( el.hasAttribute('id') && el.id != '' ) {\n" +
"      stack.unshift(el.nodeName.toLowerCase() + '#' + el.id);\n" +
"    } else if ( sibCount > 1 ) {\n" +
"      stack.unshift(el.nodeName.toLowerCase() + ':eq(' + sibIndex + ')');\n" +
"    } else {\n" +
"      stack.unshift(el.nodeName.toLowerCase());\n" +
"    }\n" +
"    el = el.parentNode;\n" +
"  }\n" +
"  return stack.slice(1); // removes the html element\n" +
"}"+
"var path = getDomPath("+d+".getElementById('miti'));\n" +
"console.log(path.join(' > '));");

But I got this error:

javax.script.ScriptException: <eval>:26:60 Missing space after numeric literal
}var path = getDomPath(javax.swing.text.html.HTMLDocument@75f32542.getElementById('miti'));
                                                            ^ in <eval> at line number 26 at column number 60

What is causing this error?

Your variable d is of type HTMLDocument. When it is concatenated with a string, Java calls its toString() method, which for this type returns a string like javax.swing.text.html.HTMLDocument@75f32542. That text is pasted into the script source as-is, and it is not valid JavaScript.
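A minimal sketch of that concatenation, runnable on its own. The class name ToStringConcatDemo and the helper buildCall are made up for illustration; only HTMLDocument and the script fragment come from the question. In the question's error output the part after the @ begins with 75f, which is presumably what the JavaScript parser reports as a numeric literal with no space after it.

```java
import javax.swing.text.html.HTMLDocument;

final class ToStringConcatDemo {

    // Builds the script fragment the same way the question does:
    // "+ d +" implicitly calls String.valueOf(d), i.e. d.toString().
    static String buildCall(Object d) {
        return "var path = getDomPath(" + d + ".getElementById('miti'));";
    }

    public static void main(String[] args) {
        HTMLDocument d = new HTMLDocument();
        // Prints the class name, an '@', and a hash that varies per run,
        // not a reference to the Java object.
        System.out.println(buildCall(d));
    }
}
```

The script engine only ever sees the resulting text, so the Java object itself never reaches the script this way.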

I guess you would prefer to use the JavaScript object document.

May I ask what you are trying to do? This seems an incredibly complicated way to do little more than parse some X(HT)ML.
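If the markup really is well-formed X(HT)ML, one possible approach is to skip the script engine and walk the tree in plain Java. The sketch below is a rough port of the question's getDomPath() using only the JDK's XML parser. The names DomPathDemo, parse, findById and domPath are hypothetical, and the input is a simplified version of the question's page without the DOCTYPE and meta tag. Real-world HTML is often not well-formed XML, in which case this parser will reject it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DomPathDemo {

    // Parses well-formed markup with the JDK's XML parser (assumption: input is valid XML).
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse markup as XML", e);
        }
    }

    // Document.getElementById only works for attributes declared as IDs in a DTD,
    // so this scans all elements for a matching "id" attribute instead.
    static Element findById(Document doc, String id) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (id.equals(e.getAttribute("id"))) {
                return e;
            }
        }
        return null;
    }

    // Walks from el up to the root element, then drops the root (html),
    // mirroring stack.slice(1) in the question's JavaScript.
    static String domPath(Element el) {
        Deque<String> stack = new ArrayDeque<>();
        Node node = el;
        while (node instanceof Element) {
            Element current = (Element) node;
            String name = current.getNodeName();
            int sibCount = 0;
            int sibIndex = 0;
            NodeList siblings = current.getParentNode().getChildNodes();
            for (int i = 0; i < siblings.getLength(); i++) {
                Node sib = siblings.item(i);
                if (sib.getNodeName().equals(name)) {
                    if (sib.isSameNode(current)) {
                        sibIndex = sibCount;
                    }
                    sibCount++;
                }
            }
            String id = current.getAttribute("id"); // "" when the attribute is absent
            if (!id.isEmpty()) {
                stack.addFirst(name.toLowerCase() + "#" + id);
            } else if (sibCount > 1) {
                stack.addFirst(name.toLowerCase() + ":eq(" + sibIndex + ")");
            } else {
                stack.addFirst(name.toLowerCase());
            }
            node = current.getParentNode();
        }
        stack.pollFirst(); // remove the root element
        return String.join(" > ", stack);
    }

    public static void main(String[] args) {
        String xhtml = "<html dir=\"ltr\" lang=\"en\">"
                + "<head><title>Alidoosti</title></head>"
                + "<body><button id=\"miti\" type=\"submit\">Search</button></body>"
                + "</html>";
        Element button = findById(parse(xhtml), "miti");
        System.out.println(domPath(button)); // prints: body > button#miti
    }
}
```

Because no script is evaluated, nothing is ever concatenated into source text, so the toString() problem cannot arise here.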

You can use Jsoup for this. Add it to your classpath and use it like this:

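import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
String html = "[YOUR HTML IN HERE]";
Document doc = Jsoup.parse(html);
Elements buttons = doc.select("button"); // every <button> element in the document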

After that, for each Element returned by the select("button") call, you can get its stack of ancestors with the parents() method.

Read up on Jsoup here: https://jsoup.org/


 