C#: remove (NULL) from XML tags

I need to figure out a good way, using C#, to parse an XML file for (NULL), remove it from the tags, and replace it with the word BAD.

For example:

<GC5_(NULL) DIRTY="False"></GC5_(NULL)>

should be replaced with

<GC5_BAD DIRTY="False"></GC5_BAD>

Part of the problem is that I have no control over the original XML; I just need to fix it once I receive it. The second problem is that (NULL) can appear in zero, one, or many tags. It appears to depend on whether users fill in additional fields. So I might get

<GC5_(NULL) DIRTY="False"></GC5_(NULL)>

or

<MH_OTHSECTION_TXT_(NULL) DIRTY="False"></MH_OTHSECTION_TXT_(NULL)>

or

<LCDATA_(NULL) DIRTY="False"></LCDATA_(NULL)>

I am a newbie to C# and programming.

EDIT: I have come up with the following function which, while not pretty, works so far.

public static string CleanInvalidXmlChars(string fileText)
{
    List<char> charsToSubstitute = new List<char>();
    charsToSubstitute.Add((char)0x19);
    charsToSubstitute.Add((char)0x1C);
    charsToSubstitute.Add((char)0x1D);
    foreach (char c in charsToSubstitute)
        fileText = fileText.Replace(Convert.ToString(c), string.Empty);

    StringBuilder b = new StringBuilder(fileText);
    b.Replace("&#x0;", string.Empty);
    b.Replace("&#x1C;", string.Empty);
    b.Replace("<(null)", "<BAD");
    b.Replace("(null)>", "BAD>");

    Regex nullMatch = new Regex("<(.+?)_\\(NULL\\)(.+?)>");
    String result = nullMatch.Replace(b.ToString(), "<$1_BAD$2>");

    result = result.Replace("(NULL)", "BAD");

    return result;
}

I have only been able to find 6 or 7 bad XML files to test this code on, but it has worked on each of them and has not removed good data. I appreciate the feedback and your time.

In general, regular expressions are not the right way to handle XML files. There is a range of solutions for handling XML files correctly; you can read up on System.Xml.Linq for a good start. If you're a newbie, it's certainly something you should learn at some point. As Ed Plunkett pointed out in the comments, though, your XML is not actually XML: the ( and ) characters are not allowed in XML element names.
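To see why an XML parser cannot be used on the raw input, here is a minimal sketch that checks an element name and a whole document for validity. The class and method names (XmlNameCheck, IsValidElementName, IsWellFormed) are made up for illustration; XmlConvert.VerifyName and XDocument.Parse are the standard .NET calls.

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class, not part of the original question or answers.
public static class XmlNameCheck
{
    // Returns true if 'name' is a legal XML name.
    public static bool IsValidElementName(string name)
    {
        if (string.IsNullOrEmpty(name))
            return false;
        try
        {
            // Throws XmlException when the name contains illegal characters.
            XmlConvert.VerifyName(name);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Returns true if 'text' parses as a well-formed XML document.
    public static bool IsWellFormed(string text)
    {
        try
        {
            XDocument.Parse(text);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

With these helpers, a name such as GC5_(NULL) should be rejected and GC5_BAD accepted, which is why the fix has to happen on the string before any parser sees it.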

Since you will have to do it as an operation on a string, Corak's suggestion in the comments to use

contentOfXml = contentOfXml.Replace("(NULL)", "BAD");

may be a good idea, but it will break if any element can contain the string (NULL) anywhere other than in its name.

If you want a regex approach, this might work decently, though I'm not sure it covers every edge case:

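// The name prefix is [^\s<>]*_ rather than [^_]*_, so that names containing
// several underscores (such as MH_OTHSECTION_TXT_(NULL)) also match.
var regex = new Regex(@"(</?[^\s<>]*_)\(NULL\)([^>]*>)");
var result = regex.Replace(contentOfXml, "$1BAD$2");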
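The regex approach can be combined with a parse step, so that any input the pattern fails to repair is caught immediately. This is only a sketch under a few assumptions: the class and method names (NullTagFixer, Fix, FixAndParse) are hypothetical, and the pattern uses the name prefix `[^\s<>]*_` so that names with several underscores, such as MH_OTHSECTION_TXT_(NULL), are matched too.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper class, not part of the original answers.
public static class NullTagFixer
{
    // "<" or "</", then a name prefix ending in "_", then the literal "(NULL)",
    // then the rest of the tag up to ">". The prefix may contain underscores,
    // but no whitespace, "<" or ">", so a match cannot run past the tag name.
    private static readonly Regex NullInTagName =
        new Regex(@"(</?[^\s<>]*_)\(NULL\)([^>]*>)");

    // Replaces "(NULL)" at the end of a tag name with "BAD".
    // Text between tags is not matched, because every match starts at "<".
    public static string Fix(string contentOfXml)
    {
        return NullInTagName.Replace(contentOfXml, "$1BAD$2");
    }

    // Fixes the text and then parses it. XDocument.Parse throws an
    // XmlException if the result is still not well-formed XML.
    public static XDocument FixAndParse(string contentOfXml)
    {
        return XDocument.Parse(Fix(contentOfXml));
    }
}
```

Calling NullTagFixer.FixAndParse on the received text either returns a document you can query with System.Xml.Linq or throws, which tells you the file contains a problem this pattern does not cover. A name that contains (NULL) more than once is one such case that this sketch does not attempt to handle.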

Would it work for you to read this XML as a string and perform a regex replacement? For example:

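// The second group is (.*?) rather than (.+?), so a closing tag such as
// </GC5_(NULL)>, where nothing follows (NULL), is matched as well.
Regex nullMatch = new Regex("<(.+?)_\\(NULL\\)(.*?)>");
String processedXmlString = nullMatch.Replace(originalXmlString, "<$1_BAD$2>");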
