简体   繁体   中英

Regex to replace invalid characters

I don't have much experience with RegEx so I am using many chained String.Replace() calls to remove unwanted characters -- is there a RegEx I can write to streamline this?

string messyText = GetText();
string cleanText = messyText.Trim()
         .ToUpper()
         .Replace(",", "")
         .Replace(":", "")
         .Replace(".", "")
         .Replace(";", "")
         .Replace("/", "")
         .Replace("\\", "")
         .Replace("\n", "")
         .Replace("\t", "")
         .Replace("\r", "")
         .Replace(Environment.NewLine, "")
         .Replace(" ", "");

Thanks

Try this regex:

Regex regex = new Regex(@"[\s,:.;/\\]+");
string cleanText = regex.Replace(messyText, "").ToUpper();

\\s is a character class equivalent to [ \\t\\r\\n] .


If you just want to preserve alphanumeric characters, instead of adding every non-alphanumeric character in existence to the character class, you could do this:

Regex regex = new Regex(@"[\W_]+");
string cleanText = regex.Replace(messyText, "").ToUpper();

Where \\W is any non-word character (not [^a-zA-Z0-9_] ).

Character classes to the rescue!

string messyText = GetText();
string cleanText = Regex.Replace(messyText.Trim().ToUpper(), @"[,:.;/\\\n\t\r ]+", "")

You would probably want to use a whitelist approach, there is an ocean of funny characters whose effect depending on combination may not be easy to figure.

A simple regex that removes everything but the allowed characters could look like this:

messyText = Regex.Replace(messyText, @"[^a-zA-Z0-9\x7C\x2C\x2E_]", "");

The ^ is there to invert the selection, apart from the alphanumeric characters this regex allows | , . and _ You can add and remove characters and character sets as needed.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM