
DOM parsing in JavaScript

Some background:
I'm developing a web-based mobile application using JavaScript. HTML rendering is Safari-based. The cross-domain policy is disabled, so I can make calls to other domains using XMLHttpRequest. The idea is to parse external HTML and get the text content of a specific element.
In the past I parsed the text line by line to find the line I needed, then extracted the tag's content as a substring of that line. This is very troublesome and requires a lot of maintenance every time the target HTML changes.
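For illustration, the line-by-line approach described above might look roughly like the sketch below. The helper name `textOfTagOnLine` and the `marker` parameter are hypothetical, not taken from the original code; the point is only to show why this style breaks as soon as the target markup is reformatted.

```javascript
// Hypothetical helper illustrating line-by-line extraction; not the author's code.
// It finds the first line containing `marker` and returns the text between the
// '>' that closes that tag and the next '<' on the same line.
function textOfTagOnLine(html, marker) {
  var lines = html.split('\n');
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i];
    var markerPos = line.indexOf(marker);
    if (markerPos === -1) continue;
    var start = line.indexOf('>', markerPos) + 1; // 0 if no '>' was found
    var end = line.indexOf('<', start);
    if (start > 0 && end !== -1) return line.substring(start, end);
  }
  return null; // marker not found, or the tag content spans several lines
}
```

A call such as `textOfTagOnLine(htmlBody, 'id="theElementToFind"')` works only while the opening tag, its text, and the closing tag stay on one line, which is the maintenance burden mentioned above.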
So now I want to parse the HTML text into a DOM and run CSS or XPath queries on it.
It works well:

$('<div></div>').append(htmlBody).find('#theElementToFind').text()

The only problem is that when I use the browser to load HTML text into a DOM element, it tries to load all external resources (images, JS files, etc.). Although this isn't causing any serious problem, I would like to avoid it.

Now the question:
How can I parse HTML text into a DOM without the browser loading external resources or running scripts?
Some ideas I've been thinking about:

  • create a new document object using createDocument ( document.implementation.createDocument() ), but I'm not sure it will skip the loading of external resources.
  • use a third-party DOM parser written in JS; the only one I've tried handled errors very badly
  • use an iframe to create a new document, so that external resources with relative paths will not throw errors in the console

It seems that the following piece of code works great:

var doc = document.implementation.createHTMLDocument("");
doc.documentElement.innerHTML = htmlBody;
var text = $(doc).find('#theElementToFind').text();

External resources aren't loaded and scripts aren't evaluated.

Found it here: https://stackoverflow.com/a/9251106/95624

Origin: https://developer.mozilla.org/en/DOMParser#DOMParser_HTML_extension_for_other_browsers
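Since the origin link concerns DOMParser, here is a minimal sketch of the same idea without jQuery, assuming a browser whose DOMParser supports the 'text/html' type. The function name `extractText` and the optional `ParserCtor` parameter are hypothetical, added only so the parser can be swapped out where no browser DOM is available. As far as I know, a document produced this way has no browsing context, so scripts are not run and images are not fetched, but that is worth confirming in the target browser.

```javascript
// Sketch: parse an HTML string into a detached document and read one element's text.
// `extractText` and `ParserCtor` are hypothetical names used for illustration.
function extractText(html, selector, ParserCtor) {
  // Use the injected parser if given; otherwise fall back to the browser's DOMParser.
  var Parser = ParserCtor || DOMParser;
  var doc = new Parser().parseFromString(html, 'text/html');
  var el = doc.querySelector(selector);
  // Return null rather than throwing when the element is absent.
  return el ? el.textContent : null;
}
```

In a browser, `extractText(htmlBody, '#theElementToFind')` would play the same role as the jQuery call above.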

You can construct a jQuery object from any HTML string without attaching it to the DOM:

$(htmlBody).find('#theElementToFind').text();
