
Using a C# regular expression to replace XML element content

I'm writing some code that logs XML data, and I would like to be able to replace the content of certain elements (e.g. passwords) in the document. I'd rather not serialize and parse the document, as my code will be handling a variety of schemas.

Sample input documents:

doc #1:

<user>
    <userid>jsmith</userid>
    <password>myPword</password>
</user>

doc #2:

<secinfo>
    <ns:username>jsmith</ns:username>
    <ns:password>myPword</ns:password>
</secinfo>

What I'd like my output to be:

output doc #1:

<user>
    <userid>jsmith</userid>
    <password>XXXXX</password>
</user>

output doc #2:

<secinfo>
    <ns:username>jsmith</ns:username>
    <ns:password>XXXXX</ns:password>
</secinfo>

Since the documents I'll be processing could have a variety of schemas, I was hoping to come up with a generic regular expression solution that could find elements with "password" in their names and mask the content accordingly.

Can I solve this using regular expressions and C#, or is there a more efficient way?

This problem is best solved with XSLT:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="//password">
        <xsl:copy>
            <xsl:text>XXXXX</xsl:text>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

This will work for both inputs as long as you handle the namespaces properly.

Edit: Clarification of what I mean by "handle namespaces properly"

Make sure that a source document using the ns prefix declares a namespace for that prefix, like so:

<?xml version="1.0" encoding="utf-8"?>
<secinfo xmlns:ns="urn:foo">
    <ns:username>jsmith</ns:username>
    <ns:password>XXXXX</ns:password>
</secinfo>
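For reference, here is a minimal sketch of applying a stylesheet like this from C# with XslCompiledTransform. The class name XsltPasswordMasker and the method Mask are made up for illustration. Note that the password template here is a variant, not the one above: it matches on local-name() so that prefixed elements such as ns:password are caught too, which is an assumption about what "handling namespaces" should mean for this use case.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper for illustration; not part of any library.
public static class XsltPasswordMasker
{
    // Assumption: identity transform plus a template that matches any element
    // whose local name is "password", regardless of namespace prefix.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
    <xsl:template match=""@* | node()"">
        <xsl:copy>
            <xsl:apply-templates select=""@* | node()""/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match=""*[local-name()='password']"">
        <xsl:copy>
            <xsl:text>XXXXX</xsl:text>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>";

    public static string Mask(string inputXml)
    {
        // Compile the stylesheet from the embedded string.
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xsltReader);
        }

        // Write the result to a string without an XML declaration.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var output = new StringWriter();
        using (var inputReader = XmlReader.Create(new StringReader(inputXml)))
        using (var writer = XmlWriter.Create(output, settings))
        {
            transform.Transform(inputReader, writer);
        }
        return output.ToString();
    }
}
```

Calling XsltPasswordMasker.Mask(xml) on either sample document should return the same document with the password content replaced. The input still has to be well-formed, so doc #2 needs its xmlns:ns declaration before it can be loaded at all.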

I'd say you're better off parsing the content with a .NET XmlDocument object and finding password elements using XPath, then changing their InnerXml properties. It has the advantage of being more correct (since XML isn't a regular language in the first place), and it's conceptually easy to understand.
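A rough sketch of that idea, under a few assumptions: the class name XPathPasswordMasker and the method Mask are invented for the example, the XPath uses local-name() so that both password and ns:password match, and InnerText is used rather than InnerXml so the mask is written as plain text.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Hypothetical helper for illustration; not part of any library.
public static class XPathPasswordMasker
{
    public static string Mask(string inputXml)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = true; // keep the original layout for logging
        doc.LoadXml(inputXml);

        // local-name() ignores the prefix, so <password> and <ns:password> both match.
        // The result is copied to a list first so the document is not modified
        // while the XPath result is still being enumerated.
        List<XmlNode> matches = doc
            .SelectNodes("//*[local-name()='password']")
            .Cast<XmlNode>()
            .ToList();

        foreach (XmlNode node in matches)
        {
            node.InnerText = "XXXXX";
        }
        return doc.OuterXml;
    }
}
```

This only matches elements named exactly password. A looser match such as contains(local-name(), 'password') would be an alternative if the schemas use names like userPassword.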

From experience with systems that try to parse and/or modify XML without proper parsers, let me say: DON'T DO IT. Use an XML parser (there are other answers here that show ways to do that quickly and easily).

Using non-XML methods to parse and/or modify an XML stream will ALWAYS lead you to pain at some point in the future. I know, because I have felt that pain.

I know that it seems like it would be quicker-at-runtime/simpler-to-code/easier-to-understand/whatever if you use the regex solution. But you're just going to make someone's life miserable later.

You can use regular expressions if you know enough about what you are trying to match. For example, if you are looking for any tag that has the word "password" in its name and no inner tags, this regular expression would work:

(<([^>]*?password[^>]*?)>)([^<]*?)(<\/\2>)

You could use the same C# replace statement as in zowat's answer, but for the replacement string you would want to use "$1XXXXX$4" instead.
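Since the replace statement itself is not shown here, this is a small sketch of what it might look like with Regex.Replace. The class name RegexPasswordMasker and the method Mask are made up for the example; the pattern and the replacement string are the ones given above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class RegexPasswordMasker
{
    // Group 1: the whole opening tag, group 2: the tag name,
    // group 3: the text content, group 4: the matching closing tag.
    private static readonly Regex PasswordElement =
        new Regex(@"(<([^>]*?password[^>]*?)>)([^<]*?)(<\/\2>)");

    public static string Mask(string inputXml)
    {
        // Keep the opening and closing tags, swap the content for the mask.
        return PasswordElement.Replace(inputXml, "$1XXXXX$4");
    }
}
```

The match is case-sensitive as written, and it will miss elements that carry attributes (the backreference \2 would then include the attribute text, which never appears in the closing tag) as well as content containing CDATA sections or nested tags.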

The main reason that XSLT exists is to transform XML structures: an XSLT document is a type of stylesheet that can be used to alter the order of elements or change the content of elements. Therefore this is a typical situation where it's highly recommended to use XSLT instead of parsing, as Andrew Hare said in a previous post.

Regex is the wrong approach for this; I've seen it go badly wrong when you least expect it.

XDocument is way more fun anyway:

XDocument doc = XDocument.Parse(@"
            <user>
                <userid>jsmith</userid>
                <password>password</password>
            </user>");

doc.Element("user").Element("password").Value = "XXXXX";

// Temp namespace just for the purposes of the example -
XDocument doc2 = XDocument.Parse(@"
            <secinfo xmlns:ns='http://tempuru.org/users'>
                <ns:userid>jsmith</ns:userid>
                <ns:password>password</ns:password>
            </secinfo>");

doc2.Element("secinfo").Element("{http://tempuru.org/users}password").Value = "XXXXX";
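The two snippets above hard-code the element path for each schema. If the schema is not known in advance, one possible generalization (a sketch only; the class name LinqPasswordMasker and the method Mask are invented for the example) is to walk all descendants and compare on the local name, which ignores the namespace:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of any library.
public static class LinqPasswordMasker
{
    public static string Mask(string inputXml)
    {
        XDocument doc = XDocument.Parse(inputXml);

        // Name.LocalName drops the namespace, so password and ns:password both match.
        // ToList() snapshots the matches before any element is modified.
        List<XElement> targets = doc.Descendants()
            .Where(e => e.Name.LocalName.IndexOf("password", StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();

        foreach (XElement element in targets)
        {
            element.Value = "XXXXX";
        }
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

The substring comparison is an assumption about what should count as a password element. It also masks names such as oldPassword, which may or may not be what you want.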

Here is what I came up with when I went with XmlDocument. It may not be as slick as XSLT, but it should be generic enough to handle a variety of documents:

// input is a string containing well-formed XML
XmlDocument doc = new XmlDocument();
doc.LoadXml(input);
XmlNodeList nodeList = doc.SelectNodes("//*"); // every element in the document

foreach (XmlNode node in nodeList)
{
    // Mask any element whose name contains "password" (case-insensitive)
    if (node.Name.ToUpper().Contains("PASSWORD"))
    {
        node.InnerText = "XXXXX";
    }
    else if (node.Attributes.Count > 0)
    {
        // Otherwise mask any attribute whose name contains "password"
        foreach (XmlAttribute a in node.Attributes)
        {
            if (a.LocalName.ToUpper().Contains("PASSWORD"))
            {
                a.InnerText = "XXXXX";
            }
        }
    }
}
