How do I create a DOM object from an HTML page received via XMLHttpRequest?

Question

I'm developing a Chromium extension, so I have cross-host permissions for XMLHttpRequests to the domains I request permissions for.

I have used XMLHttpRequest and got an HTML webpage (text/html). I want to use XPath (document.evaluate) to extract the relevant bits from it. Unfortunately, I'm failing to construct a DOM object from the returned HTML string.

var xhr = new XMLHttpRequest();
var name = escape("Sticks N Stones Cap");
xhr.open("GET", "http://items.jellyneo.net/?go=show_items&name="+name+"&name_type=exact", true);
xhr.onreadystatechange = function () {
    if (xhr.readyState == 4) {
        var parser = new DOMParser();
        var xmlDoc = parser.parseFromString(xhr.responseText,"text/xml");
        console.log(xmlDoc);
    }
}

xhr.send();

console.log is there to display debug output in the Chromium JS console.

In that JS console, I get this:

Document
<html>
<body>
<parsererror style="display: block; white-space: pre; border: 2px solid #c77; padding: 0 1em 0 1em; margin: 1em; background-color: #fdd; color: black">
<h3>This page contains the following errors:</h3>
<div style="font-family:monospace;font-size:12px">error on line 1 at column 60: Space required after the Public Identifier
</div>
<h3>Below is a rendering of the page up to the first error.</h3>
</parsererror>
</body>
</html>

So how am I supposed to use XMLHttpRequest -> receive HTML -> convert to DOM -> use XPath to traverse it?

Should I be using the "hidden" iframe hack for loading / receiving the DOM object?

Answer 1

The DOMParser is choking on the DOCTYPE definition, because "text/xml" makes it parse the response with the strict XML parser. It would also error on any other non-XHTML markup, such as a <link> without a closing /. Do you have control over the document being sent? If not, your best bet is to parse it as a string. Use regular expressions to find what you are looking for.
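
As a side note to the above, current browsers also let DOMParser use the forgiving HTML parser when it is given "text/html" instead of "text/xml", which may avoid the DOCTYPE error altogether. This is a different technique from the string parsing suggested here, and it is only a minimal sketch: htmlToDocument and xpathTexts are hypothetical helper names, and it assumes the code runs somewhere DOMParser and document.evaluate are available.

```javascript
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.
var ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: turn an HTML string into a Document.
// The parser argument is optional and defaults to a real DOMParser.
function htmlToDocument(html, parser) {
    parser = parser || new DOMParser();
    // "text/html" selects the HTML parser rather than the XML parser.
    return parser.parseFromString(html, "text/html");
}

// Hypothetical helper: return the text content of every node that
// matches an XPath expression inside the given document.
function xpathTexts(doc, expression) {
    var result = doc.evaluate(expression, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
    var texts = [];
    for (var i = 0; i < result.snapshotLength; i++) {
        texts.push(result.snapshotItem(i).textContent);
    }
    return texts;
}
```

Inside the readyState == 4 branch this could be used as var doc = htmlToDocument(xhr.responseText); followed by xpathTexts(doc, "//title"). The returned document is detached from the page, so nothing is injected into the visible DOM.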

Edit: You can get the browser to parse the contents of the body for you by injecting it into a hidden div:

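// Let the browser's HTML parser build the nodes inside a hidden container.
var hidden = document.body.appendChild(document.createElement("div"));
hidden.style.display = "none";
// exec() returns null when the response has no <body> element.
hidden.innerHTML = /<body[^>]*>([\s\S]+)<\/body>/i.exec(xhr.responseText)[1];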

Now search inside hidden to find what you're looking for:

// The HTML parser wraps rows in an implicit <tbody>, so "table > tr" would never match.
var myEl = hidden.querySelector("table.foo tr > td.bar > span.fu");
var myVal = myEl.innerHTML;
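
Both steps above can fail silently on real pages: the regular expression finds nothing when the response has no body element, and querySelector returns null when the selector does not match. A slightly more defensive version might look like the sketch below. extractBodyHtml and textOf are hypothetical helper names, not part of any library.

```javascript
// Hypothetical helper: return the markup between <body ...> and </body>,
// or the whole input when no body element is found.
function extractBodyHtml(html) {
    var match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(html);
    return match ? match[1] : html;
}

// Hypothetical helper: return the trimmed text of the first element that
// matches the selector under root, or null when nothing matches.
function textOf(root, selector) {
    var el = root.querySelector(selector);
    return el ? el.textContent.trim() : null;
}
```

With the hidden div from before, this would be used as hidden.innerHTML = extractBodyHtml(xhr.responseText); and then var myVal = textOf(hidden, "td.bar > span.fu");, checking myVal for null before using it.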


Answer 1
3 2010-10-19 21:53:53
