
Regex to Indent an XML File

Is it possible to write a regex (search and replace) that, when run on an XML string, outputs that string nicely indented?

If so, what's the regex? :)

Doing this would be far, far simpler if you didn't use a regex. In fact, I'm not even sure it's possible with a regex.

Most languages have XML libraries that make this task very simple. What language are you using?

Is it possible to write a REGEX (search replace) that when run on an XML string [...anything]

No.

Use an XML parser to read the string, then an XML serialiser to write it back out in 'pretty' mode.

Each XML processor has its own options, so the details depend on the platform, but here is the somewhat long-winded way that works on DOM Level 3 LS-compliant implementations:

// Wrap the unformatted XML string in an LSInput.
input= implementation.createLSInput();
input.stringData= unprettyxml;
// Parse it synchronously into a DOM document.
parser= implementation.createLSParser(implementation.MODE_SYNCHRONOUS, null);
document= parser.parse(input);
// Serialise the document back out with pretty-printing enabled.
serializer= implementation.createLSSerializer();
serializer.domConfig.setParameter("format-pretty-print", true);
prettyxml= serializer.writeToString(document);

I don't know whether a regex, in isolation, could pretty-print arbitrary XML input. You would need a program that applies the regex to find a tag, locate the matching closing tag (if the tag is not self-closing), and so on. Using a regex for this is really using the wrong tool for the job. The simplest way to pretty-print XML is to use an XML parser: read the XML in, set the appropriate serialization options, and then serialize it back out.
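As a rough illustration of the parse-then-serialize approach, here is a minimal sketch in Python (an assumption, since no language was named in the question) using the standard library's xml.dom.minidom. The function name pretty_print is hypothetical.

```python
from xml.dom import minidom


def pretty_print(xml_text: str, indent: str = "  ") -> str:
    """Parse the XML string, then serialize it back out with indentation."""
    document = minidom.parseString(xml_text)
    # toprettyxml() is minidom's "pretty" serialization mode.
    return document.toprettyxml(indent=indent)


if __name__ == "__main__":
    print(pretty_print("<a><b>text</b><c/></a>"))
```

One caveat: if the input already contains whitespace between tags, minidom keeps those whitespace text nodes, so the output can contain extra blank lines unless they are stripped first.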

Why do you want to use a regex to solve this problem?

Using a regex for this will be a nightmare. Keeping track of the indentation level based on the node hierarchy will be almost impossible. Perhaps Perl 5.10's regular expression engine might help, since it's now reentrant. But let's not go down that road. Besides, you would need to take into account CDATA sections, which can embed XML declarations that must be ignored by the indentation and preserved intact.
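To make the CDATA concern concrete, the sketch below (Python, chosen as an assumption) compares a deliberately naive regex substitution with a parser round trip. The names naive_indent and parser_indent are hypothetical, and the naive pattern is only one of many possible regex attempts.

```python
import re
from xml.dom import minidom

# A CDATA section whose content merely looks like markup.
CDATA = "<![CDATA[<a><b/></a>]]>"
XML = "<root><note>" + CDATA + "</note></root>"


def naive_indent(xml_text: str) -> str:
    """Naive attempt: put a line break between every adjacent '>' and '<'."""
    return re.sub(r">\s*<", ">\n<", xml_text)


def parser_indent(xml_text: str) -> str:
    """Parser round trip: the CDATA section is a single node, not markup."""
    return minidom.parseString(xml_text).toprettyxml(indent="  ")


if __name__ == "__main__":
    print(CDATA in naive_indent(XML))   # the regex split the CDATA content
    print(CDATA in parser_indent(XML))  # the parser kept it intact
```

The regex has no way of knowing that the text inside the CDATA section is data rather than structure, so it inserts line breaks into it; the parser never treats that text as tags in the first place.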

Stick with DOM. As suggested in the other answer, some libraries already provide a function that will indent a DOM tree for you. If not, building one will be much simpler than creating and maintaining regexes that do the same task.
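For a sense of how small a hand-built indenter can be, here is a sketch in Python (an assumption) over the standard library's xml.etree.ElementTree rather than a full DOM. The names indent_tree and indent_string are hypothetical. It rewrites only whitespace-only text between elements and leaves real text alone.

```python
import xml.etree.ElementTree as ET


def indent_tree(elem, level=0, step="  "):
    """Recursively set whitespace-only text/tail so children are indented."""
    children = list(elem)
    if not children:
        return  # leaf element: leave its text untouched
    child_pad = "\n" + step * (level + 1)
    if not (elem.text or "").strip():
        elem.text = child_pad  # whitespace before the first child
    for child in children:
        indent_tree(child, level + 1, step)
        if not (child.tail or "").strip():
            child.tail = child_pad  # whitespace before the next sibling
    last = children[-1]
    if last.tail == child_pad:
        # After the last child, step back out to the parent's own level.
        last.tail = "\n" + step * level


def indent_string(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    indent_tree(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(indent_string("<a><b><c>x</c></b><d>more</d></a>"))
```

The depth is simply the recursion level, which is exactly the piece of state that a single search-and-replace regex has no good way to carry.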

The dark voodoo regexp described here works great:
http://www.perlmonks.org/?node_id=261292
Its main advantage over using XML::LibXML and others is that it's an order of magnitude faster.

From this link:

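  using System;
  using System.Text;
  using System.Text.RegularExpressions;

  public static class XmlIndenter {
        // Alternatives, in order: an element containing only text (kept on one line);
        // a comment, declaration or processing instruction; any other tag; a run of text.
        // CDATA sections are not handled specially.
        private static Regex indentingRegex=new Regex(@"\<\s*(?<tag>[\w\-]+)(\s+[\w\-]+\s*=\s*(""[^""]*""|'[^']*'))*\s*\>[^\<]*\<\s*/\s*\k<tag>\s*\>|\<[!\?]((?<=!)--((?!--\>).)*--\>|(""[^""]*""|'[^']*'|[^>])*\>)|\<\s*(?<closing>/)?\s*[\w\-]+(\s+[\w\-]+\s*=\s*(""[^""]*""|'[^']*'))*\s*((/\s*)|(?<opening>))\>|[^\<]*", RegexOptions.ExplicitCapture|RegexOptions.Singleline);

        public static string IndentXml(string xml) {
              StringBuilder result=new StringBuilder(xml.Length*2);
              int indent=0;
              for (Match match=indentingRegex.Match(xml); match.Success; match=match.NextMatch()) {
                    // Skip whitespace-only text and the empty match at the end of the input.
                    if (match.Value.Trim().Length==0)
                          continue;
                    // A closing tag goes back out one level (never below zero).
                    if (match.Groups["closing"].Success&&indent>0)
                          indent--;
                    result.AppendFormat("{0}{1}\r\n", new String(' ', indent*2), match.Value);
                    // "opening" also succeeds on closing tags, hence the second check.
                    if (match.Groups["opening"].Success&&(!match.Groups["closing"].Success))
                          indent++;
              }
              return result.ToString();
        }
  }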

This would only be achievable with multiple regexes, which would operate like a state machine.
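A minimal sketch of that idea in Python (an assumption): one pattern with several alternatives splits the input into tokens, and the surrounding loop keeps the state, here just the current depth. The names TOKEN and indent_xml are hypothetical. It is deliberately simplistic, for example it breaks on a '>' inside an attribute value and trims whitespace around text.

```python
import re

# One alternative per token kind; the order matters.
TOKEN = re.compile(
    r"<!--.*?-->"             # comment
    r"|<!\[CDATA\[.*?\]\]>"   # CDATA section, matched as a single token
    r"|<[?!][^>]*>"           # processing instruction or declaration
    r"|<[^>]+>"               # start, end, or self-closing tag
    r"|[^<]+",                # character data
    re.DOTALL,
)


def indent_xml(xml_text: str, step: str = "  ") -> str:
    lines = []
    depth = 0  # the "state" the regex itself cannot hold
    for token in TOKEN.findall(xml_text):
        if not token.strip():
            continue  # drop whitespace-only text between tags
        if token.startswith("</"):
            depth = max(depth - 1, 0)  # closing tag: step back out first
        lines.append(step * depth + token.strip())
        is_start_tag = (
            token.startswith("<")
            and not token.startswith(("</", "<?", "<!"))
            and not token.endswith("/>")
        )
        if is_start_tag:
            depth += 1  # children of this element go one level deeper
    return "\n".join(lines)


if __name__ == "__main__":
    print(indent_xml("<a><b>text</b><c/></a>"))
```

Even in this small form, the regex only recognizes tokens; the indentation logic lives in ordinary code, which is already a hand-written tokenizer rather than a search-and-replace.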

What you are looking for is far better suited to an off-the-cuff parser.
