
Regex: Match any punctuation character except . and _

Is there an easy way to match all punctuation except period and underscore, in a C# regex? Hoping to do it without enumerating every single punctuation mark.

Use Regex Subtraction

[\p{P}-[._]]

Here's the link to the .NET regex documentation (I'm not sure whether other flavors support it): http://msdn.microsoft.com/en-us/library/ms994330.aspx

Here's a C# example

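using System;
using System.Text.RegularExpressions;

// \p{P} covers punctuation; \p{S} adds symbols such as ^, ~ and ` (among others).
// The trailing -[._] subtracts the period and underscore from the class.
string pattern = @"[\p{P}\p{S}-[._]]";
string test = @"_""'a:;%^&*~`bc!@#.,?";
MatchCollection mx = Regex.Matches(test, pattern);
foreach (Match m in mx)
{
    // Print each matched character with its index and length.
    Console.WriteLine("{0}: {1} {2}", m.Value, m.Index, m.Length);
}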

Explanation: The pattern uses character class subtraction. It starts with a standard character class such as [\p{P}] and then appends a subtraction class such as -[._], which removes . and _ from the set. The subtraction goes inside the outer [ ], after the contents of the base class.
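As a minimal sketch of putting the subtraction pattern to work, the snippet below removes every punctuation character except . and _ from a string. The helper name StripPunctuation and the sample input are made up for illustration, and the code assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the thread): deletes every \p{P} character
// except '.' and '_', which the subtraction -[._] removes from the class.
static string StripPunctuation(string input) =>
    Regex.Replace(input, @"[\p{P}-[._]]", "");

Console.WriteLine(StripPunctuation("file_name.v2, (final)!"));
// prints: file_name.v2 final
```

Symbols such as ^ or ~ are left alone here because they belong to \p{S}, not \p{P}; add \p{S} to the base class if they should be removed too.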

The answers so far do not respect ALL punctuation. This should work:

(?![\._])\p{P}

(Explanation: a negative lookahead ensures that neither . nor _ is matched; \p{P} then matches any Unicode punctuation character.)
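A quick way to convince yourself that the lookahead form and the subtraction form select the same characters is to run both on one input. This is only a sketch: the helper MatchesOf and the sample string are assumptions for illustration, and top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: concatenates every match of a pattern into one string.
static string MatchesOf(string input, string pattern) =>
    string.Concat(Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

string sample = "a.b_c,d!e-f";
string viaLookahead = MatchesOf(sample, @"(?![._])\p{P}");
string viaSubtraction = MatchesOf(sample, @"[\p{P}-[._]]");

Console.WriteLine(viaLookahead);                    // prints: ,!-
Console.WriteLine(viaLookahead == viaSubtraction);  // prints: True
```

The lookahead version may be the more portable of the two, since it should also work in flavors that support \p{P} and lookahead but not .NET-style class subtraction.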

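Here is something a little simpler: match anything that is not a word character, white-space, or a period (where word characters include A-Za-z0-9 AND underscore; in .NET, \w also covers Unicode letters and digits by default).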

[^\w\s.]

You could possibly use a negated character class like this:

[^0-9A-Za-z._\s]

This matches every character except those listed, which also means non-ASCII letters and symbols will match. You may need to exclude more characters (such as control characters), depending on your ultimate requirements.
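To see how the two negated classes differ from the \p{P}-based pattern, the sketch below counts matches on a sample that contains a non-ASCII letter and a currency symbol. The sample string and variable names are assumptions for illustration; top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample with a non-ASCII letter (U+00E9), a currency symbol (U+20AC), and '!'.
string sample = "caf\u00E9 \u20AC5!";

// ASCII-only negated class: matches U+00E9, U+20AC and '!'.
int asciiNegated = Regex.Matches(sample, @"[^0-9A-Za-z._\s]").Count;

// \w-based negated class: U+00E9 counts as a word character, so only U+20AC and '!' match.
int wordNegated = Regex.Matches(sample, @"[^\w\s.]").Count;

// Punctuation category minus '.' and '_': only '!' matches.
int punctuationOnly = Regex.Matches(sample, @"[\p{P}-[._]]").Count;

Console.WriteLine("{0} {1} {2}", asciiNegated, wordNegated, punctuationOnly);
// prints: 3 2 1
```

Which result is right depends on whether "punctuation" should mean the Unicode category or simply "anything that is not a letter, digit, or space".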

