
How do I detect XML parsing errors when using JavaScript's DOMParser in a cross-browser way?

It seems that all major browsers implement the DOMParser API so that XML can be parsed into a DOM and then queried using XPath, getElementsByTagName, etc.

However, detecting parsing errors seems to be trickier. DOMParser.prototype.parseFromString always returns a valid DOM. When a parsing error occurs, the returned DOM contains a <parsererror> element, but it's slightly different in each major browser.

Sample JavaScript:

// The "otherr" prefix is deliberately unbound, so parsing fails.
var xmlText = '<root xmlns="http://default" xmlns:other="http://other"><child><otherr:grandchild/></child></root>';
var parser = new DOMParser();
var dom = parser.parseFromString(xmlText, 'application/xml');
console.log((new XMLSerializer()).serializeToString(dom));

Result in Opera:

DOM's root is a <parsererror> element.

<?xml version="1.0"?><parsererror xmlns="http://www.mozilla.org/newlayout/xml/parsererror.xml">Error<sourcetext>Unknown source</sourcetext></parsererror>

Result in Firefox:

DOM's root is a <parsererror> element.

<?xml-stylesheet href="chrome://global/locale/intl.css" type="text/css"?>
<parsererror xmlns="http://www.mozilla.org/newlayout/xml/parsererror.xml">XML Parsing Error: prefix not bound to a namespace
Location: http://fiddle.jshell.net/_display/
Line Number 1, Column 64:<sourcetext>&lt;root xmlns="http://default" xmlns:other="http://other"&gt;&lt;child&gt;&lt;otherr:grandchild/&gt;&lt;/child&gt;&lt;/root&gt;
---------------------------------------------------------------^</sourcetext></parsererror>

Result in Safari:

The <root> element parses correctly but contains a nested <parsererror> in a different namespace than Opera and Firefox's <parsererror> element.

<root xmlns="http://default" xmlns:other="http://other"><parsererror xmlns="http://www.w3.org/1999/xhtml" style="display: block; white-space: pre; border: 2px solid #c77; padding: 0 1em 0 1em; margin: 1em; background-color: #fdd; color: black"><h3>This page contains the following errors:</h3><div style="font-family:monospace;font-size:12px">error on line 1 at column 50: Namespace prefix otherr on grandchild is not defined
</div><h3>Below is a rendering of the page up to the first error.</h3></parsererror><child><otherr:grandchild/></child></root>

Am I missing a simple, cross-browser way of detecting if a parsing error occurred anywhere in the XML document? Or must I query the DOM for each of the possible <parsererror> elements that different browsers might generate?

This is the best solution I've come up with.

I attempt to parse a string that is intentionally invalid XML and observe the namespace of the resulting <parsererror> element. Then, when parsing actual XML, I can use getElementsByTagNameNS to detect the same kind of <parsererror> element and throw a JavaScript Error.

// My function that parses a string into an XML DOM, throwing an Error if XML parsing fails
function parseXml(xmlString) {
    var parser = new DOMParser();
    // attempt to parse the passed-in xml
    var dom = parser.parseFromString(xmlString, 'application/xml');
    if(isParseError(dom)) {
        throw new Error('Error parsing XML');
    }
    return dom;
}

function isParseError(parsedDocument) {
    // parser and parsererrorNS could be cached on startup for efficiency
    var parser = new DOMParser(),
        erroneousParse = parser.parseFromString('<', 'application/xml'),
        parsererrorNS = erroneousParse.getElementsByTagName("parsererror")[0].namespaceURI;

    if (parsererrorNS === 'http://www.w3.org/1999/xhtml') {
        // In PhantomJS the parsererror element doesn't seem to have a special namespace, so we are just guessing here :(
        return parsedDocument.getElementsByTagName("parsererror").length > 0;
    }

    return parsedDocument.getElementsByTagNameNS(parsererrorNS, 'parsererror').length > 0;
}

Note that this solution doesn't include the special-casing needed for Internet Explorer. However, things are much more straightforward in IE. XML is parsed with a loadXML method which returns true or false if parsing succeeded or failed, respectively. See http://www.w3schools.com/xml/xml_parser.asp for an example.
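
For reference, the legacy IE path mentioned above might look roughly like the sketch below. It assumes the Microsoft.XMLDOM ActiveX object, which exists only in old Internet Explorer. The name parseXmlLegacyIE and the createDocument parameter are made up for illustration; the parameter is there only so the logic can be exercised outside IE.

```javascript
// Sketch for legacy Internet Explorer only. parseXmlLegacyIE and the
// createDocument parameter are illustrative names, not part of any API.
function parseXmlLegacyIE(xmlString, createDocument) {
    // By default, create IE's ActiveX XML document object.
    var xmlDoc = createDocument
        ? createDocument()
        : new ActiveXObject('Microsoft.XMLDOM');
    // Parse synchronously so the result is available immediately.
    xmlDoc.async = false;
    // loadXML returns true on success and false on a parse error.
    if (!xmlDoc.loadXML(xmlString)) {
        // parseError.reason holds IE's description of the failure.
        throw new Error('Error parsing XML: ' + xmlDoc.parseError.reason);
    }
    return xmlDoc;
}
```

In IE this would be called as parseXmlLegacyIE(xmlString), with the DOMParser-based parseXml used everywhere else.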

When I first came here, I upvoted the original answer (by cspotcode); however, it does not work in Firefox. The resulting namespace is always "null" because of the structure of the produced document. I did a little research (check the code here). The idea is to use not

invalidXml.childNodes[0].namespaceURI

but

invalidXml.getElementsByTagName("parsererror")[0].namespaceURI

And then select the "parsererror" element by namespace as in the original answer. However, if you have a valid XML document with a <parsererror> tag in the same namespace as the one used by the browser, you end up with a false alarm. So, here's a heuristic to check if your XML parsed successfully:

function tryParseXML(xmlString) {
    var parser = new DOMParser();
    var parsererrorNS = parser.parseFromString('INVALID', 'application/xml').getElementsByTagName("parsererror")[0].namespaceURI;
    var dom = parser.parseFromString(xmlString, 'application/xml');
    if(dom.getElementsByTagNameNS(parsererrorNS, 'parsererror').length > 0) {
        throw new Error('Error parsing XML');
    }
    return dom;
}
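
The function above re-parses the invalid probe string on every call. One possible variant looks the namespace up once and reuses it, as the comment in the first answer suggests. This is only a sketch: createXmlParser is a made-up name, the parser parameter is optional and exists so a parser can be passed in explicitly, and the code assumes the probe parse always yields a <parsererror> element.

```javascript
// Sketch: look up the parsererror namespace once and reuse it.
// createXmlParser is an illustrative name, not a browser API.
function createXmlParser(parser) {
    parser = parser || new DOMParser();
    // Parse deliberately invalid XML to learn which namespace this
    // browser uses for its <parsererror> element.
    var probe = parser.parseFromString('INVALID', 'application/xml');
    var parsererrorNS = probe.getElementsByTagName('parsererror')[0].namespaceURI;

    // The returned function closes over the cached namespace.
    return function parseXml(xmlString) {
        var dom = parser.parseFromString(xmlString, 'application/xml');
        if (dom.getElementsByTagNameNS(parsererrorNS, 'parsererror').length > 0) {
            throw new Error('Error parsing XML');
        }
        return dom;
    };
}
```

Usage would be something like var parseXml = createXmlParser(); at startup, then parseXml(xmlString) wherever a document is needed. The false-alarm caveat described above still applies.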

Why not implement exceptions in DOMParser?

An interesting thing worth mentioning in this context: if you try to get an XML file with XMLHttpRequest, the parsed DOM will be stored in the responseXML property, or the property will be null if the XML file content was invalid. Not an exception, not a parsererror or another specific indicator. Just null.
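
To illustrate, a check along these lines would tell the two outcomes apart. It's a sketch: loadXml, its callback signature and the XhrCtor parameter are assumptions made for illustration, and responseXML is read with the default responseType.

```javascript
// Sketch: fetch an XML file and report whether a document was produced.
// loadXml and XhrCtor are illustrative names, not browser APIs.
function loadXml(url, callback, XhrCtor) {
    var xhr = new (XhrCtor || XMLHttpRequest)();
    xhr.open('GET', url);
    xhr.onload = function () {
        // responseXML is null when the body could not be parsed as XML.
        // It is also null when the response's MIME type is not an XML
        // or HTML type, so null alone does not prove the XML is malformed.
        if (xhr.responseXML === null) {
            callback(new Error('No XML document in response'), null);
        } else {
            callback(null, xhr.responseXML);
        }
    };
    xhr.onerror = function () {
        callback(new Error('Network error'), null);
    };
    xhr.send();
}
```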

In current browsers, the DOMParser appears to have two possible behaviours when given malformed XML:

  1. Discard the resulting document entirely and return a <parsererror> document with error details. Firefox and Edge seem to always take this approach; browsers from the Chrome family do this in most cases.

  2. Return the resulting document with one extra <parsererror> inserted as the root element's first child. Chrome's parser does this in cases where it's able to produce a root element despite finding errors in the source XML. The inserted <parsererror> may or may not have a namespace. The rest of the document seems to be left intact, including comments, etc. Refer to xml_errors.cc and search for XMLErrors::InsertErrorMessageBlock.

For (1), the way to detect an error is to add a node to the source string, parse it, check whether the node exists in the resulting document, then remove it. As far as I'm aware, the only way to achieve this without potentially affecting the result is to append a processing instruction or comment to the end of the source.

Example:

let key = `a`+Math.random().toString(32);

let doc = (new DOMParser).parseFromString(src+`<?${key}?>`, `application/xml`);

let lastNode = doc.lastChild;
if (!(lastNode instanceof ProcessingInstruction)
    || lastNode.target !== key
    || lastNode.data !== ``)
{
    /* the XML was malformed */
} else {
    /* the XML was well-formed */
    doc.removeChild(lastNode);
}

If case (2) occurs, the error won't be detected by the above technique, so another step is required.

We can leverage the fact that only one <parsererror> is inserted, even if there are multiple errors found in different places within the source. By parsing the source string again, this time with a syntax error appended, we can ensure the (2) behaviour is triggered, then check whether the number of <parsererror> elements has changed. If not, the first parseFromString result already contained a true <parsererror>.

Example:

// `doc` and `src` are the same variables as in the previous example
let errCount = doc.documentElement.getElementsByTagName(`parsererror`).length;
if (errCount !== 0) {
    let doc2 = (new DOMParser).parseFromString(src+`<?`, `application/xml`);
    if (doc2.documentElement.getElementsByTagName(`parsererror`).length === errCount) {
        /* the XML was malformed */
    }
}

I put together a test page to verify this approach: https://github.com/Cauterite/domparser-tests .

It tests against the entire XML W3C Conformance Test Suite, plus a few extra samples to ensure it can distinguish documents containing <parsererror> elements from actual errors emitted by the DOMParser. Only a handful of test cases are excluded because they contain invalid Unicode sequences.

To be clear, it is only testing whether the result is identical to XMLHttpRequest.responseXML for a given document.

You can run the tests yourself at https://cauterite.github.io/domparser-tests/index.html , but note that it uses ECMAScript 2018.

At time of writing, all tests pass in recent versions of Firefox, Chrome, Safari and Firefox on Android. Edge and Presto-based Opera should pass since their DOMParsers appear to behave like Firefox's, and current Opera should pass since it's a fork of Chromium.


Please let me know if you can find any counter-examples or possible improvements.

For the lazy, here's the complete function:

const tryParseXml = function(src) {
    /* returns an XMLDocument, or null if `src` is malformed */

    let key = `a`+Math.random().toString(32);

    let parser = new DOMParser;

    let doc = null;
    try {
        doc = parser.parseFromString(
            src+`<?${key}?>`, `application/xml`);
    } catch (_) {}

    if (!(doc instanceof XMLDocument)) {
        return null;
    }

    let lastNode = doc.lastChild;
    if (!(lastNode instanceof ProcessingInstruction)
        || lastNode.target !== key
        || lastNode.data !== ``)
    {
        return null;
    }

    doc.removeChild(lastNode);

    let errElemCount =
        doc.documentElement.getElementsByTagName(`parsererror`).length;
    if (errElemCount !== 0) {
        let errDoc = null;
        try {
            errDoc = parser.parseFromString(
                src+`<?`, `application/xml`);
        } catch (_) {}

        if (!(errDoc instanceof XMLDocument)
            || errDoc.documentElement.getElementsByTagName(`parsererror`).length
                === errElemCount)
        {
            return null;
        }
    }

    return doc;
}

Coming back to this question in 2022, the documentation for the DOMParser.parseFromString() method offers a much simpler solution:

const parser = new DOMParser();

const xmlString = "<warning>Beware of the missing closing tag";
const doc = parser.parseFromString(xmlString, "application/xml");
const errorNode = doc.querySelector('parsererror');
if (errorNode) {
  // parsing failed
} else {
  // parsing succeeded
}

While the accepted answer worked for me, using the Document.querySelector() method is indeed much simpler because you don't have to determine the namespaceURI of the parsererror element.
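
Wrapped in a function, the same check might look like the sketch below. parseXmlOrThrow and its optional parser argument are made-up names for illustration. As pointed out in an earlier answer, a well-formed document that legitimately contains a <parsererror> element would be reported as a failure by this kind of check.

```javascript
// Sketch: throw when the parsed document contains a <parsererror>.
// parseXmlOrThrow is an illustrative name, not a browser API.
function parseXmlOrThrow(xmlString, parser) {
    parser = parser || new DOMParser();
    const doc = parser.parseFromString(xmlString, 'application/xml');
    const errorNode = doc.querySelector('parsererror');
    if (errorNode) {
        // The element's text is the browser's own message; its wording
        // and layout differ between browsers.
        throw new Error('Error parsing XML: ' + errorNode.textContent);
    }
    return doc;
}
```

A caller would wrap parseXmlOrThrow(xmlString) in try/catch and treat the caught Error's message as diagnostic text only, since its content is browser-specific.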
