
Can't find unclosed element in XML

I have a large XML file (~18MB). Apparently there is a tag somewhere in it that isn't closed. I know this because when I ran the W3C markup validation tool (validator.w3.org), I got the following error:

You may have neglected to close an element, or perhaps you meant to "self-close" an element, that is, ending it with "/>" instead of ">".

My question is how I might go about finding this unclosed element among the 500,000 lines in the file. Is there a tool that would suggest places where there might be a problem, such as an element that has not been closed after a certain number of lines?

Any ideas would be much appreciated.

I use Notepad++, which has an excellent XML Tools plugin that lets you check XML syntax and takes you to the problematic line. It also has other useful utilities.


I just opened an XML file in VS 2010 (with ReSharper), broke the XML, and what do you know? The error was highlighted immediately. If you have access to the same setup, it's that simple.

xmllint is a standard tool for this. From the Validation & DTDs page:

The simplest way is to use the xmllint program included with libxml. The --valid option turns on validation of the files given as input. For example, the following validates a copy of the first revision of the XML 1.0 specification:

xmllint --valid --noout test/valid/REC-xml-19980210.xml

The --noout option is used to disable output of the resulting tree.

The --dtdvalid dtd option allows validation of the document(s) against a given DTD.

Libxml2 exports an API to handle DTDs and validation; check the associated description.

If your document isn't pretty-printed, it can still be hard to find the offending node, so you might want to use xmllint to rewrite the file with indentation.

Since you do not have an XML Schema, there is no foolproof way of finding the offending code; XML allows recursive structures, for example. You can write your own XML Schema, although that is potentially a lot to learn. Alternatively, I would write a simple, stupid validator that checks the nesting level and the element name, like so:

import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

private void parseAndCheckStructure(XMLStreamReader reader) throws XMLStreamException {

    // Skip ahead to the root element; the prolog is probably not the offending part (?)
    int event = -1;
    while (reader.hasNext()) {
        event = reader.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
            break;
        } else if (event == XMLStreamConstants.END_DOCUMENT) {
            throw new XMLStreamException("No root element found");
        }
    }

    // Read the rest of the document; level 1 is the root element.
    int level = 1;
    do {
        event = reader.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
            level++;
            String localName = reader.getLocalName();
            if (localName.equals("FirstElement")) {
                // The sub-parser is expected to consume everything up to and
                // including the matching end tag, hence the level-- afterwards.
                parseFirstElementWithALoopLikeTheCurrent(reader);
                level--;
            } else if (localName.equals("SecondElement")) {
                parseSecondElementWithALoopLikeTheCurrent(reader);
                level--;
            } else {
                throw new RuntimeException("Unknown element " + localName + " at level " + level + " and location " + reader.getLocation());
            }
        } else if (event == XMLStreamConstants.END_ELEMENT) {
            // keep track of level
            level--;
        }
    } while (level > 0);
}

Alternatively, parse the whole document within the above do-while loop, and do checks like

if(level == 4 && localName.equals("MyElement")) {
    // ok
} else {
    // throw exception with the location
}

It sucks, but it works.
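If the element names are not known in advance, a variation on the same level-tracking idea is to keep a stack of open elements and print it when the parser gives up. The following is only a rough sketch under that assumption: the class name OpenElementTracker and the method findOpenElements are made up for illustration, and the line numbers come from whatever the StAX implementation reports as its current position, so treat them as approximate.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
class OpenElementTracker {

    /**
     * Returns the elements that were still open when parsing stopped,
     * outermost first. An empty list means no element was open: either the
     * document is well-formed or the error occurred outside any element.
     */
    static List<String> findOpenElements(Reader input) {
        Deque<String> open = new ArrayDeque<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(input);
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Remember the name and the line the parser had reached.
                    open.addLast(reader.getLocalName()
                            + " (line " + reader.getLocation().getLineNumber() + ")");
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.pollLast();
                }
            }
        } catch (XMLStreamException e) {
            // The parser stops at the first error; anything left on the
            // stack had not been closed at that point.
            System.err.println("Parse error: " + e.getMessage());
        }
        return new ArrayList<>(open);
    }

    public static void main(String[] args) {
        // Tiny inline sample; for a real file, pass a Reader over that file.
        String broken = "<root>\n<a>\n<b>\n</a>\n</root>";
        for (String element : findOpenElements(new StringReader(broken))) {
            System.out.println("still open: " + element);
        }
    }
}
```

The last entry in the list is the innermost open element, which is usually the first place to look. The parser only notices the problem when it meets an end tag that does not match, so the reported error position can be far below the tag that was actually left open.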

Try opening the .xml file in the Chrome browser; it will point out the exact location of the fault.
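Any conforming XML parser reports a position for the first well-formedness error, which is what editors and browsers show. As a minimal sketch, assuming the SAX parser that ships with the JDK (the class name WellFormednessCheck and the method firstError are invented for this example), the same line and column can be obtained from code:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
class WellFormednessCheck {

    /** Returns null if the input parses cleanly, otherwise a description of the first error. */
    static String firstError(InputSource source) throws Exception {
        try {
            // DefaultHandler ignores all content and rethrows fatal errors.
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Inline sample; new InputSource("file.xml") would read a file by system ID.
        String broken = "<root>\n<a>\n</root>";
        System.out.println(firstError(new InputSource(new StringReader(broken))));
    }
}
```

The position is where the parser detected the mismatch, not necessarily where the missing end tag belongs.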
