Using a Regex to clean string versus Base64 Encoded string

I have an extension method that uses Regex.Replace to clean up invalid characters in a user-entered string before it is added to an XML document.

The intent of the regex is to strip out stray high-ASCII characters that occasionally end up in the input when the user pastes text from Microsoft Word, replacing each of them with a space:

    public static string CleanInput(this string inputString) {
        if (string.IsNullOrEmpty(inputString))
            return string.Empty;

        // Replace invalid characters with a space.
        return Regex.Replace(inputString, @"[^\w\.@-]", " ");
    }

Now, as fate would have it, someone is using this extension method on a string that contains base64-encoded data.

I believe the regex will leave MOST of the base64 data unmodified, but I think it might be changing some of it.

So, knowing that \w in the regex matches [A-Za-z0-9_] and that Base64 uses effectively the same range, should this regex be changing the string or not?

If it is changing the string, why, and how would you change it so that hi-ASCII garbage is still cleaned up in regular non-encoded text without mucking up the encoded string?

Base64 also uses +, /, and =.

You can add these to your character class:

[^\w\.@+/=-]

Note that - has to be last (or first, or escaped as \-) in order for it to be a literal hyphen-minus instead of specifying a range.
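As a rough sketch of how this plays out, the snippet below puts the original pattern next to the widened one. The class name `StringCleaning` and the method names `CleanInputOriginal` and `CleanInputKeepBase64` are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container class; the names here are illustrative only.
public static class StringCleaning
{
    // The pattern from the question: '+', '/' and '=' are NOT in the
    // allowed set, so each of them is replaced with a space.
    public static string CleanInputOriginal(this string inputString)
    {
        if (string.IsNullOrEmpty(inputString))
            return string.Empty;

        return Regex.Replace(inputString, @"[^\w\.@-]", " ");
    }

    // The widened pattern: '+', '/' and '=' are allowed as well.
    // The hyphen stays last so it is read as a literal character.
    public static string CleanInputKeepBase64(this string inputString)
    {
        if (string.IsNullOrEmpty(inputString))
            return string.Empty;

        return Regex.Replace(inputString, @"[^\w\.@+/=-]", " ");
    }
}
```

For example, `Convert.ToBase64String(new byte[] { 0xFB, 0xFF, 0xFE, 0x3E })` yields `+//+Pg==`. Passing that through `CleanInputOriginal` replaces six of its eight characters with spaces, while `CleanInputKeepBase64` returns it unchanged. The trade-off is that +, / and = are now also kept in ordinary, non-encoded text.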

It may also be worth considering that, according to Microsoft, \w isn't necessarily the same as [A-Za-z0-9_].
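To make that concrete: by default, \w in .NET is Unicode-aware, so an accented letter such as é counts as a word character and is not stripped, whereas with RegexOptions.ECMAScript it is limited to [a-zA-Z_0-9]. The sketch below shows the difference, plus one possible ASCII-only variant of the cleanup that spells the range out explicitly. The names `WordCharDemo`, `IsWordCharDefault`, `IsWordCharEcma` and `CleanInputAsciiOnly` are hypothetical, and whether accented letters should be kept is an assumption about your requirements.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; the names here are illustrative only.
public static class WordCharDemo
{
    // Default .NET behaviour: \w also matches non-ASCII letters and digits.
    public static bool IsWordCharDefault(string s) =>
        Regex.IsMatch(s, @"^\w$");

    // With RegexOptions.ECMAScript, \w is restricted to [a-zA-Z_0-9].
    public static bool IsWordCharEcma(string s) =>
        Regex.IsMatch(s, @"^\w$", RegexOptions.ECMAScript);

    // ASCII-only variant: the explicit range replaces \w, so non-ASCII
    // letters are replaced with a space while Base64 characters are kept.
    public static string CleanInputAsciiOnly(this string inputString)
    {
        if (string.IsNullOrEmpty(inputString))
            return string.Empty;

        return Regex.Replace(inputString, @"[^A-Za-z0-9_.@+/=-]", " ");
    }
}
```

With these definitions, `IsWordCharDefault("é")` is true and `IsWordCharEcma("é")` is false, so the original pattern would leave é in place while `CleanInputAsciiOnly` replaces it with a space.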
