
regular expression “.*[^a-zA-Z0-9_].*”

As I am reading more about regular expressions in C#, I want to check a conclusion I have reached. In the expression ".*[^a-zA-Z0-9_].*", the ".*" at the beginning and end are useless, is that right? As I understand it, ".*" means zero or more occurrences of any character, and "[^a-zA-Z0-9_]" means any single character other than a letter (upper or lower case), a digit, or an underscore. So adding ".*" before and after "[^a-zA-Z0-9_]" should make no difference, is that right?

Here is the code I am using to check whether the expression matches:

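// Requires: using System; using System.Text.RegularExpressions;
// Here we call Regex.Match. (The character class must not contain a stray
// space, otherwise spaces would be treated as allowed characters.)
Match match = Regex.Match("anytest#", ".*[^a-zA-Z0-9_].*");
//Match match = Regex.Match("anytest#", "[^a-zA-Z0-9_]");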

// Here we check the Match instance.
if (match.Success)
    Console.WriteLine("error");
else
    Console.WriteLine("no error");

The only difference is how much of the input is included in the match.

For:

ab41--_71j

the pattern with .* matches the entire string, because .* is greedy and consumes everything on both sides of the non-word character:

ab41--_71j

Without the .* at the beginning and end, it matches only the first character that is outside the class:

-

Any string matches the .*[^a-zA-Z0-9_].* regex as long as it contains at least one character that isn't in a-zA-Z0-9_.
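To see the difference in the matched text, here is a minimal sketch. The class name MatchDemo and the helper FirstMatch are hypothetical names introduced only for this illustration; Regex.Match and Match.Value are the standard .NET calls.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchDemo
{
    // Returns the text of the first match, or "(no match)" when the pattern fails.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "(no match)";
    }

    public static void Main()
    {
        const string input = "ab41--_71j";

        // Greedy .* on both sides: the match covers the whole input.
        Console.WriteLine(FirstMatch(input, ".*[^a-zA-Z0-9_].*")); // ab41--_71j

        // Bare character class: exactly one character, the first '-'.
        Console.WriteLine(FirstMatch(input, "[^a-zA-Z0-9_]"));     // -

        // No character outside the class, so neither pattern can match.
        Console.WriteLine(FirstMatch("abc_123", "[^a-zA-Z0-9_]")); // (no match)
    }
}
```

Whether the match succeeds is the same for both patterns; only Match.Value differs.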

From your most recent comment, I understand that you actually use:

^[a-zA-Z0-9]*$

This will match only if all characters are digits or letters. If it doesn't match, then the string is invalid.

If you also want to allow the _ character, then use:

^[a-zA-Z0-9_]*$

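Which, with one caveat, can even be shortened to:

^\w*$

In a C# string literal this is written "^\\w*$" or @"^\w*$". The caveat: in .NET, \w also matches non-ASCII letters and digits (such as ß) unless you pass RegexOptions.ECMAScript.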
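The following sketch compares the \w shorthand with the explicit character class. The class and method names are hypothetical; the assumption being illustrated is the documented .NET behaviour that \w is Unicode-aware by default and is restricted to [a-zA-Z_0-9] under RegexOptions.ECMAScript.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // Explicit ASCII class from the answer.
    public static bool IsExplicitAscii(string s) =>
        Regex.IsMatch(s, "^[a-zA-Z0-9_]*$");

    // \w with default options: Unicode letters and digits are accepted too.
    public static bool IsUnicodeWord(string s) =>
        Regex.IsMatch(s, @"^\w*$");

    // \w under ECMAScript rules: behaves like [a-zA-Z_0-9].
    public static bool IsAsciiWord(string s) =>
        Regex.IsMatch(s, @"^\w*$", RegexOptions.ECMAScript);

    public static void Main()
    {
        foreach (string s in new[] { "any_test1", "anytest#", "straße" })
        {
            Console.WriteLine(
                $"{s}: explicit={IsExplicitAscii(s)}, \\w={IsUnicodeWord(s)}, \\w+ECMAScript={IsAsciiWord(s)}");
        }
    }
}
```

For "straße" only the default \w version reports a match. Note also that the * quantifier lets the empty string pass all three checks; use + instead if an empty input should be rejected.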

In general, it is better to write a regex that validates a string rather than one that invalidates it. It makes more sense and is more intuitive.

So my validation would look like:

if (Regex.IsMatch("anytest#", "^\\w*$"))
{
    Console.WriteLine("Success");
}
else
{
    Console.WriteLine("Error");
}

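Another option that is probably faster (it needs using System.Linq, and note that char.IsLetterOrDigit also accepts non-ASCII letters and digits):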

if ("anytest#".ToCharArray().All(c => char.IsLetterOrDigit(c) || c == '_'))
{
    Console.WriteLine("Success");
}
else
{
    Console.WriteLine("Error");
}

And if you don't want '_' to be included, it can look even nicer:

if ("anytest#".ToCharArray().All(char.IsLetterOrDigit))
{
    Console.WriteLine("Success");
}
else
{
    Console.WriteLine("Error");
}
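One thing to keep in mind with the LINQ approach is that char.IsLetterOrDigit is Unicode-aware, so it is not a drop-in replacement for [a-zA-Z0-9]. A minimal sketch of an ASCII-only variant follows; CharCheckDemo, IsAsciiWordChar and the other method names are hypothetical helpers written for this illustration.

```csharp
using System;
using System.Linq;

public static class CharCheckDemo
{
    // Unicode-aware check, as in the answer (string is already an IEnumerable<char>,
    // so ToCharArray() is not required).
    public static bool IsLetterDigitOrUnderscore(string s) =>
        s.All(c => char.IsLetterOrDigit(c) || c == '_');

    // ASCII-only check mirroring the class [a-zA-Z0-9_].
    public static bool IsAsciiWordChar(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9') || c == '_';

    public static bool IsAsciiWord(string s) => s.All(IsAsciiWordChar);

    public static void Main()
    {
        foreach (string s in new[] { "any_test1", "anytest#", "straße" })
        {
            Console.WriteLine(
                $"{s}: unicode={IsLetterDigitOrUnderscore(s)}, ascii={IsAsciiWord(s)}");
        }
    }
}
```

The two checks agree on "any_test1" and "anytest#" but disagree on "straße", which only the Unicode-aware version accepts.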

No, because there are characters other than a-z, A-Z and 0-9.

That regex matches any run of characters, followed by one character that is not in a-zA-Z0-9_, followed by any run of characters. Since .* can also match nothing, a string consisting of just that one character matches as well.

If you leave out the .*, you just have a regex that matches a single character that is not in a-zA-Z0-9_.

.*[^a-zA-Z0-9_].*  matches for instance: ABC_ß_ABC
[^a-zA-Z0-9_]      matches for instance: ß   (and this regex just matches 1 character)

.*[^a-zA-Z0-9_].* will match the entire input as long as there is a character other than a letter, digit, or underscore somewhere in the input (assuming the input contains no newlines, which . does not match by default). [^a-zA-Z0-9_] will match only a single such character: the first one in the input, since matching scans from left to right by default and greediness plays no role in a pattern without quantifiers. Which one you want depends on the input and on what you want to do once you find out whether such a character exists in the input.
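A small sketch of which character the bare class actually picks. The sample input "a#b$c" and the names PositionDemo, FirstIndex and LastIndex are made up for this illustration; RegexOptions.RightToLeft is the standard .NET option for scanning from the end of the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PositionDemo
{
    private const string Pattern = "[^a-zA-Z0-9_]";

    // Index of the first character outside the class, or -1 if there is none.
    public static int FirstIndex(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Index : -1;
    }

    // Index of the last such character, found by scanning right to left.
    public static int LastIndex(string input)
    {
        Match m = Regex.Match(input, Pattern, RegexOptions.RightToLeft);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        const string input = "a#b$c";
        Console.WriteLine(FirstIndex(input)); // 1, the '#'
        Console.WriteLine(LastIndex(input));  // 3, the '$'

        // The padded pattern starts at index 0 and spans all 5 characters.
        Match whole = Regex.Match(input, ".*[^a-zA-Z0-9_].*");
        Console.WriteLine($"{whole.Index}, {whole.Length}"); // 0, 5
    }
}
```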

Input 1: ABC_ß_ABC

Input 2: ß

Regex 1: .*[^a-zA-Z0-9_].*

Regex 2: [^a-zA-Z0-9_]

Both inputs match both regexes.

For input 1, regex 1 matches all 9 characters, while regex 2 matches only 1 character (ß).

Only include those tokens in the regex that you are actually looking for. In your case you don't care whether there are any other characters before or after the negated character class you specified. Adding .* before and after it doesn't change whether the match succeeds, but it makes the matching more expensive. A regex already matches anywhere in the input unless you specifically anchor it, e.g. by using ^ at the start.
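As a rough check of that claim, the sketch below compares the success of the bare and padded patterns on a few sample inputs and shows what an anchor changes. The class name, method names and sample inputs are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Unanchored: succeeds if an invalid character appears anywhere.
    public static bool HasInvalidChar(string s) =>
        Regex.IsMatch(s, "[^a-zA-Z0-9_]");

    // Same test with redundant .* padding on both sides.
    public static bool HasInvalidCharPadded(string s) =>
        Regex.IsMatch(s, ".*[^a-zA-Z0-9_].*");

    // Anchored with ^: succeeds only if the first character is invalid.
    public static bool StartsWithInvalidChar(string s) =>
        Regex.IsMatch(s, "^[^a-zA-Z0-9_]");

    public static void Main()
    {
        foreach (string s in new[] { "anytest#", "any_test1", "#anytest", "" })
        {
            Console.WriteLine(
                $"\"{s}\": bare={HasInvalidChar(s)}, padded={HasInvalidCharPadded(s)}, anchored={StartsWithInvalidChar(s)}");
        }
    }
}
```

The bare and padded columns agree for every input, while the anchored pattern is true only for "#anytest".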
