Using Regex to clean up a string vs. a Base64-encoded string

Question

I have an extension method that uses Regex.Replace to clean up invalid characters in a user-entered string before it is added to an XML document.

The intent of the regex is to strip out some random hi-ASCII characters that occasionally appear in the input when the user pastes text from Microsoft Word, and to replace them with a space:

    public static string CleanInput(this string inputString) {
        if (string.IsNullOrEmpty(inputString))
            return string.Empty;

        // Replace invalid characters with a space.
        return Regex.Replace(inputString, @"[^\w\.@-]", " ");
    }

As fate would have it, someone is now using this extension method on a string that contains base64-encoded data.

I believe the regex will leave MOST of the base64 data unmodified, but I think it might be changing some of it.

So, knowing that \w in the regex matches [A-Za-z0-9_] and that Base64 uses effectively the same range, should this regex be changing the string or not?

If it is changing the string, why, and how would you change it so that hi-ASCII garbage is still cleaned up in regular non-encoded text without mucking up the encoded string?

Answer 1

Base64 also uses +, /, and =.

You can add these to your character class:

    [^\w\.@+/=-]

Note that - is placed last so that it is treated as a literal hyphen-minus instead of specifying a range (placing it first, or escaping it as \-, would also work).
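To see the difference concretely, here is a minimal sketch that puts the question's pattern next to the extended character class above. The class and method names (StringCleaningExtensions, CleanInputOriginal, CleanInputBase64Safe) are made up for this illustration and are not from the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringCleaningExtensions
{
    // Pattern from the question: '+', '/' and '=' are not in the allowed set,
    // so they are replaced with spaces.
    private static readonly Regex OriginalPattern = new Regex(@"[^\w\.@-]");

    // Extended pattern: additionally allows the three extra Base64 characters.
    // The hyphen stays last so it is read as a literal, not a range.
    private static readonly Regex Base64SafePattern = new Regex(@"[^\w\.@+/=-]");

    // Hypothetical name: the question's method, kept for comparison.
    public static string CleanInputOriginal(this string inputString)
    {
        if (string.IsNullOrEmpty(inputString))
            return string.Empty;

        return OriginalPattern.Replace(inputString, " ");
    }

    // Hypothetical name: same logic with the extended character class.
    public static string CleanInputBase64Safe(this string inputString)
    {
        if (string.IsNullOrEmpty(inputString))
            return string.Empty;

        return Base64SafePattern.Replace(inputString, " ");
    }
}
```

For example, Convert.ToBase64String(new byte[] { 0xFB, 0xFF }) produces "+/8=". Passing that through CleanInputOriginal() gives "  8 " (every character except the 8 is replaced), while CleanInputBase64Safe() returns it unchanged. The trade-off is that +, / and = are now also allowed through in ordinary, non-encoded text.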

It may also be worth considering that, according to Microsoft, \w isn't necessarily the same as [A-Za-z0-9_]: by default in .NET it also matches Unicode letters and digits, so accented characters pass through the filter untouched.
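The \w caveat matters for the original goal of stripping hi-ASCII characters. The sketch below compares the default behavior with RegexOptions.ECMAScript, which restricts \w to [a-zA-Z_0-9]. Whether ASCII-only matching is what you actually want is an assumption here, and the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class WordCharacterDemo
{
    // Default .NET behavior: \w is Unicode-aware, so letters such as 'é'
    // count as word characters and are NOT replaced.
    public static string CleanUnicodeAware(string input)
    {
        return Regex.Replace(input, @"[^\w\.@+/=-]", " ");
    }

    // ECMAScript mode: \w matches only [a-zA-Z_0-9], so 'é' is replaced
    // with a space along with any other non-ASCII character.
    public static string CleanAsciiOnly(string input)
    {
        return Regex.Replace(input, @"[^\w\.@+/=-]", " ", RegexOptions.ECMAScript);
    }
}
```

With the input "caf\u00E9", CleanUnicodeAware returns the string unchanged, while CleanAsciiOnly returns "caf " with a trailing space. Base64 text itself is pure ASCII, so both variants leave it alone.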
