
Cannot remove a set of chars in a string

I have a set of characters I want to remove from a string: "/\[]:|<>+=;,?*'@

I'm trying with :

private const string CHARS_TO_REPLACE = @"""/\[]:|<>+=;,?*'@";

private string Clean(string stringToClean)
{
    return Regex.Replace(stringToClean, "[" + Regex.Escape(CHARS_TO_REPLACE) + "]", "");
}

However, with an input such as "Foo, bar and other", the result is strictly identical to the input.

What is wrong with my code?

This looks a lot like this question, but with a blacklist instead of a whitelist of characters, so I removed the negating ^ character.

You did not escape the closing square bracket in CHARS_TO_REPLACE.

As already mentioned (but the answer has suddenly disappeared), Regex.Escape does not escape ], so you need to tweak your code:

    // Replace with an empty string so the characters are removed, as the question asks.
    return Regex.Replace(stringToClean, "[" + Regex.Escape(CHARS_TO_REPLACE)
          .Replace("]", @"\]") + "]", "");

The problem is a misunderstanding of how Regex.Escape works. From MSDN:

Escapes a minimal set of characters (\, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space) by replacing them with their escape codes.

It works as expected, but you need to think of Regex.Escape as escaping metacharacters outside of a character class. Inside a character class, the characters that need escaping are different. For example, inside a character class a - should be escaped to be literal, otherwise it can define a range of characters (e.g., [A-Z]).
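To make the range behavior concrete, here is a minimal sketch. The class name HyphenDemo and the helper Strip are hypothetical names chosen for this illustration; only Regex.Replace comes from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenDemo
{
    // Hypothetical helper: removes every character matched by the given character class.
    public static string Strip(string input, string classPattern)
    {
        return Regex.Replace(input, classPattern, "");
    }

    public static void Main()
    {
        // Unescaped: "a-c" is a range, so 'a', 'b' and 'c' are removed and '-' is kept.
        Console.WriteLine(Strip("a-b-c-d", "[a-c]"));    // prints ---d

        // Escaped: only the literal characters 'a', '-' and 'c' are removed.
        Console.WriteLine(Strip("a-b-c-d", @"[a\-c]"));  // prints bd
    }
}
```

Note that Regex.Escape leaves - untouched, so a hyphen in the middle of a blacklist would need the same manual treatment as ].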

In your case, as others have mentioned, the ] was not escaped. Any character that holds a special meaning within a character class has to be handled separately after calling Regex.Escape. This should do what you need:

string CHARS_TO_REPLACE = @"""/\[]:|<>+=;,?*'@";
string pattern = "[" + Regex.Escape(CHARS_TO_REPLACE).Replace("]", @"\]") + "]";

string input = "hi\" there\\ [i love regex];@";
string result = Regex.Replace(input, pattern, "");
Console.WriteLine(result);

Otherwise, you end up with ["/\\\[]:\|<>\+=;,\?\*'@] , in which ] is not escaped. The regex engine therefore reads ["/\\\[] as the character class and :\|<>\+=;,\?\*'@] as the rest of the pattern, which only matches when one of the class characters is immediately followed by the literal text :|<>+=;,?*'@] in the input.
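If the blacklist may change later, one option is to escape for the character-class context directly instead of patching the output of Regex.Escape. The following is a sketch under that assumption; CharClassCleaner and EscapeForCharClass are hypothetical names, not framework APIs, and the set of escaped characters (\, ], [, ^, -) is a deliberately conservative choice.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CharClassCleaner
{
    private const string CHARS_TO_REPLACE = @"""/\[]:|<>+=;,?*'@";

    // Hypothetical helper: backslash-escapes the characters that can be
    // special inside [...] so that every character is taken literally.
    public static string EscapeForCharClass(string chars)
    {
        var sb = new StringBuilder();
        foreach (char c in chars)
        {
            if (c == '\\' || c == ']' || c == '[' || c == '^' || c == '-')
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static string Clean(string stringToClean)
    {
        string pattern = "[" + EscapeForCharClass(CHARS_TO_REPLACE) + "]";
        return Regex.Replace(stringToClean, pattern, "");
    }

    public static void Main()
    {
        // What Regex.Escape produces: note the unescaped ']'.
        Console.WriteLine(Regex.Escape(CHARS_TO_REPLACE)); // prints "/\\\[]:\|<>\+=;,\?\*'@

        Console.WriteLine(Clean("Foo, bar and other"));    // prints Foo bar and other
    }
}
```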

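Several characters in CHARS_TO_REPLACE are special in regular expressions and need to be escaped with a backslash \. Inside a character class that also applies to ] and to \ itself.

This should work between the [ and ] of the character class:

"/\\\[\]:\|<>\+=;,\?\*'@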

Why not just do:

private static string Clean(string stringToClean)
{
    // Each entry is one disallowed character, given as a one-character string.
    string[] disallowedChars = new string[] { /* YOUR CHARS HERE */ };

    for (int i = 0; i < disallowedChars.Length; i++)
    {
        stringToClean = stringToClean.Replace(disallowedChars[i], "");
    }

    return stringToClean;
}

Single-statement LINQ solution:

private const string CHARS_TO_REPLACE = @"""/\[]:|<>+=;,?*'@";

private string Clean(string stringToClean) {
    return CHARS_TO_REPLACE
        .Aggregate(stringToClean, (str, l) => str.Replace(""+l, ""));
}

For the sake of completeness, here is a variant suited to very large strings (or even streams). No regex here, just a loop over each character with a StringBuilder to store the result:

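using System;
using System.IO;
using System.Linq;   // supplies Contains(char) on frameworks where string lacks it
using System.Net;
using System.Text;

class Program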
{
    private const string CHARS_TO_REPLACE = @"""/\[]:|<>+=;,?*'@";

    static void Main(string[] args)
    {
        var wc = new WebClient();
        var veryLargeString = wc.DownloadString("http://msdn.microsoft.com");

        using (var sr = new StringReader(veryLargeString))
        {
            var sb = new StringBuilder();

            int readVal;
            while ((readVal = sr.Read()) != -1)
            {
                var c = (char)readVal;
                if (!CHARS_TO_REPLACE.Contains(c))
                {
                    sb.Append(c);
                }
            }

            Console.WriteLine(sb.ToString());
        }
    }
}
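For real streams, the same idea can be written against TextReader and TextWriter so the whole input never has to sit in memory. This is only a sketch: StreamCleaner, the 4096-character buffer size, and the HashSet lookup are illustrative choices, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamCleaner
{
    private const string CHARS_TO_REPLACE = @"""/\[]:|<>+=;,?*'@";

    // Copies everything from reader to writer, skipping the disallowed characters.
    public static void Clean(TextReader reader, TextWriter writer)
    {
        var disallowed = new HashSet<char>(CHARS_TO_REPLACE);
        var buffer = new char[4096]; // arbitrary chunk size for this sketch
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Compact the kept characters to the front of the buffer.
            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                if (!disallowed.Contains(buffer[i]))
                {
                    buffer[kept++] = buffer[i];
                }
            }
            writer.Write(buffer, 0, kept);
        }
    }

    public static void Main()
    {
        using (var reader = new StringReader("hi\" there\\ [i love regex];@"))
        using (var writer = new StringWriter())
        {
            Clean(reader, writer);
            Console.WriteLine(writer.ToString()); // prints hi there i love regex
        }
    }
}
```

Swapping the StringReader and StringWriter for a StreamReader and StreamWriter would apply the same filter to files or network streams.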
