C# Regex doesn't correctly match the input string

I'm working on an ASP.NET form application that takes in a master course ID from user input and matches it against a format. The format looks like this:

HIST-1302-233IN-FA2012

or it could be

XL-HIST-1302-233IN-FA2012

Here is my regex:

string masterCourseRegex = @"(.{4}-.{4}-.{5}-.{6})/|XL-(.{4}-.{4}-.{5}-.{6})";

I've tested this in Rubular without the forward slash before the XL alternative, and it seems to work for both formats. In my web app, however, HIST-1302-233IN-FA2012 is treated as a non-match, so the code takes the failure path and shows the "invalid course ID format" message when it ought to match and continue on to the code that actually uses it.

My form correctly recognizes IDs that have the XL- prefix and processes them as usual; the problem is only with the standard format without the XL- prefix. Here is my code:

if (!Regex.IsMatch(txtBoxMasterCourse.Text, masterCourseRegex))
{
    string msg = string.Empty;
    StringBuilder sb = new StringBuilder();
    sb.Append("alert('The course ID " + txtBoxMasterCourse.Text + " did not match the naming standards for Blackboard course IDs. Please be sure to use the correct naming convention as specified on the form in the example.");
    sb.Append(msg.Replace("\n", "\\n").Replace("\r", "").Replace("'", "\\'"));
    sb.Append("');");
    ScriptManager.RegisterStartupScript(this.Page, this.GetType(), "showalert", sb.ToString(), true);
}

I can't see anything obviously wrong and would appreciate your input.

Thanks!

If we break down your expression and add some comments, it is easier to see the problem.

string masterCourseRegex = @"
   (    # Capture
    .{4}  # Match any character, exactly four times
    -     # Match a single hyphen/minus
    .{4}  # Match any character, exactly four times
    -     # Match a single hyphen/minus
    .{5}  # Match any character, exactly five times
    -     # Match a single hyphen/minus
    .{6}  # Match any character, exactly six times
   )    # End Capture
   /    # Match a single forward slash <----------- HERE IS THE PROBLEM
   |    # OR
   XL   # Match the characters XL
   -    # Match a single hyphen/minus
   (    # Capture
   .{4}   # Match any character, exactly four times
   -      # Match a single hyphen/minus
   .{4}   # Match any character, exactly four times
   -      # Match a single hyphen/minus
   .{5}   # Match any character, exactly five times
   -      # Match a single hyphen/minus
   .{6}   # Match any character, exactly six times
   )";

Removing the forward slash from your original expression will allow it to match both of your examples.

string masterCourseRegex = @"(.{4}-.{4}-.{5}-.{6})|XL-(.{4}-.{4}-.{5}-.{6})";
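
To see the effect of the stray slash, here is a minimal console sketch that runs both versions of the pattern against the two sample IDs from the question. The class name SlashDemo and the constant names are made up for illustration; only Regex.IsMatch and the two patterns come from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlashDemo
{
    // Pattern from the question: the first alternative ends in a literal '/'.
    public const string Original = @"(.{4}-.{4}-.{5}-.{6})/|XL-(.{4}-.{4}-.{5}-.{6})";

    // Same pattern with the stray '/' removed.
    public const string Fixed = @"(.{4}-.{4}-.{5}-.{6})|XL-(.{4}-.{4}-.{5}-.{6})";

    public static void Main()
    {
        string[] samples = { "HIST-1302-233IN-FA2012", "XL-HIST-1302-233IN-FA2012" };
        foreach (string id in samples)
        {
            // The un-prefixed ID contains no '/', so Original can only match
            // through its XL- alternative.
            Console.WriteLine("{0}: original={1}, fixed={2}",
                id, Regex.IsMatch(id, Original), Regex.IsMatch(id, Fixed));
        }
    }
}
```

With the original pattern only the XL- ID matches, which is the behavior described in the question; with the slash removed, both match.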

Alternatively, you may want to consider making the expression more specific by replacing the . wildcards with narrower character classes. For example:

string masterCourseRegex = @"(XL-)?(\w{4}-\d{4}-[\w\d]{5}-[\w\d]{6})";

This also matches both of your examples, "HIST-1302-233IN-FA2012" and "XL-HIST-1302-233IN-FA2012". (Note that \w already includes digits, so [\w\d] is equivalent to \w.)

It's generally good practice to be as specific as possible in a regular expression. Remember that the . metacharacter matches any character (except a newline, by default), and its use can make debugging a regular expression more difficult than it needs to be.
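
As a rough illustration of why specificity matters, the sketch below feeds a nonsense string to a dot-based pattern and to the more specific one. The class name SpecificityDemo and the garbage input are invented for this example; the two patterns are the ones discussed in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpecificityDemo
{
    // Dot-based pattern: every position accepts any character.
    public const string Loose = @"(.{4}-.{4}-.{5}-.{6})|XL-(.{4}-.{4}-.{5}-.{6})";

    // More specific pattern: word characters and digits only.
    public const string Specific = @"(XL-)?(\w{4}-\d{4}-[\w\d]{5}-[\w\d]{6})";

    public static void Main()
    {
        // Right shape (4-4-5-6 with hyphens), but clearly not a course ID.
        string garbage = "!!!!-????-     -######";
        string real = "HIST-1302-233IN-FA2012";

        Console.WriteLine("garbage: loose={0}, specific={1}",
            Regex.IsMatch(garbage, Loose), Regex.IsMatch(garbage, Specific));
        Console.WriteLine("real:    loose={0}, specific={1}",
            Regex.IsMatch(real, Loose), Regex.IsMatch(real, Specific));
    }
}
```

The dot-based pattern accepts the garbage string because punctuation and spaces satisfy . just as well as letters do, while the specific pattern rejects it and still accepts the real ID.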

Don't get all fancy. Try something like:

static Regex rx = new Regex( @"
  ^                     # start-of-text
  (XL-)?                # followed by an optional 'XL-' prefix
  [A-Z][A-Z][A-Z][A-Z]  # followed by 4 letters
  -                     # followed by a literal hyphen ('-')
  \d\d\d\d              # followed by 4 decimal digits
  -                     # followed by a literal hyphen ('-')
  \d\d\d[A-Z][A-Z]      # followed by 3 decimal digits and 2 letters ('###XX')
  -                     # followed by a literal hyphen ('-')
  [A-Z][A-Z]\d\d\d\d    # followed by 2 letters and 4 decimal digits ('XX####')
  $                     # followed by end-of-text
  " , RegexOptions.IgnorePatternWhitespace|RegexOptions.IgnoreCase
  ) ;

You should also anchor your match to start/end of text (unless you're willing to accept a match other than the entire string.)
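
To illustrate what anchoring buys you, here is a small sketch comparing an anchored and an unanchored version of the same pattern. The class name AnchorDemo and the sample input with extra text around the ID are made up for illustration; the pattern is a condensed form of the one above, written with {n} quantifiers, which should be equivalent.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Condensed course-ID pattern without anchors.
    private const string Body = @"(XL-)?[A-Z]{4}-\d{4}-\d{3}[A-Z]{2}-[A-Z]{2}\d{4}";

    // Unanchored: succeeds if the pattern occurs anywhere in the input.
    public static readonly Regex Unanchored = new Regex(Body, RegexOptions.IgnoreCase);

    // Anchored: the whole input has to be a course ID.
    public static readonly Regex Anchored = new Regex("^" + Body + "$", RegexOptions.IgnoreCase);

    public static void Main()
    {
        // A valid ID buried inside extra text.
        string padded = "junkHIST-1302-233IN-FA2012-EXTRA";
        string exact = "HIST-1302-233IN-FA2012";

        Console.WriteLine("padded: unanchored={0}, anchored={1}",
            Unanchored.IsMatch(padded), Anchored.IsMatch(padded));
        Console.WriteLine("exact:  unanchored={0}, anchored={1}",
            Unanchored.IsMatch(exact), Anchored.IsMatch(exact));
    }
}
```

The unanchored regex accepts the padded input because IsMatch only needs to find the pattern somewhere in the string; the anchored one rejects it. One caveat: in .NET, $ also matches just before a final newline, so \z may be a better choice than $ if even a trailing newline should be rejected.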

Try this:

string masterCourseRegex = @"(XL-)?(\w{4}-\w{4}-\w{5}-\w{6})";
