I don't have much experience with RegEx so I am using many chained String.Replace() calls to remove unwanted characters -- is there a RegEx I can write to streamline this?
string messyText = GetText();
string cleanText = messyText.Trim()
.ToUpper()
.Replace(",", "")
.Replace(":", "")
.Replace(".", "")
.Replace(";", "")
.Replace("/", "")
.Replace("\\", "")
.Replace("\n", "")
.Replace("\t", "")
.Replace("\r", "")
.Replace(Environment.NewLine, "")
.Replace(" ", "");
Thanks
Try this regex:
Regex regex = new Regex(@"[\s,:.;/\\]+");
string cleanText = regex.Replace(messyText, "").ToUpper();
\\s
is a character class equivalent to [ \\t\\r\\n]
.
If you just want to preserve alphanumeric characters, instead of adding every non-alphanumeric character in existence to the character class, you could do this:
Regex regex = new Regex(@"[\W_]+");
string cleanText = regex.Replace(messyText, "").ToUpper();
Where \\W
is any non-word character (not [^a-zA-Z0-9_]
).
Character classes to the rescue!
string messyText = GetText();
string cleanText = Regex.Replace(messyText.Trim().ToUpper(), @"[,:.;/\\\n\t\r ]+", "")
You would probably want to use a whitelist approach, there is an ocean of funny characters whose effect depending on combination may not be easy to figure.
A simple regex that removes everything but the allowed characters could look like this:
messyText = Regex.Replace(messyText, @"[^a-zA-Z0-9\x7C\x2C\x2E_]", "");
The ^ is there to invert the selection, apart from the alphanumeric characters this regex allows | , . and _ You can add and remove characters and character sets as needed.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.