
Regex.Matches is skipping over a match? c#

I need to identify substrings found in a string such as:

"CityABCProcess Test" or "cityABCProcess Test"

to yield:

[ "City/city", "ABC", "Process", "Test" ]

  1. The first letter of the first substring can be lowercase or uppercase.
  2. A run of consecutive uppercase letters forms one substring that ends before the next lowercase letter or space: "ABCProcess" -> "ABC", "ABC Process" -> "ABC".
  3. An uppercase letter followed by a lowercase letter starts a substring that runs until the next uppercase letter.

The regular expression we have been using is:

"[A-Z][a-z]+|([A-Z]|[0-9])+\b|[A-Z]+(?=[A-Z])|([a-z]|[0-9])+"

This has been working great but breaks in the case of a string:

"X-999"

We are implementing it in this fashion:

        StringBuilder builder = new StringBuilder();
        builder.Append("[A-Z][a-z]+|([A-Z]|[0-9])+\b|[A-Z]+(?=[A-Z])|([a-z]|[0-9])+");

        foreach (Match match in Regex.Matches(name, builder.ToString()))
        {
            //do things with each match
        }

The problem here is it is not matching on the 'X' but only the '999'. Any ideas? I tested it with regexr.com and it says this regex should match on both substrings.

\b is being interpreted as an escape sequence (the backspace character, U+0008) in the C# string, so the regex engine never sees the word-boundary anchor.

Escape the backslash (i.e., \\b), or use a verbatim string literal with the @ symbol:

        builder.Append(@"[A-Z][a-z]+|([A-Z]|[0-9])+\b|[A-Z]+(?=[A-Z])|([a-z]|[0-9])+");
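To see the difference side by side, here is a minimal sketch that runs the same pattern as a regular string literal and as a verbatim one. The class name NameSplitter and the helper Split are hypothetical names introduced only for this illustration; the pattern itself is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, introduced only for this illustration.
public static class NameSplitter
{
    // Regular string literal: the C# compiler turns \b into the backspace character (U+0008).
    public const string RegularPattern =
        "[A-Z][a-z]+|([A-Z]|[0-9])+\b|[A-Z]+(?=[A-Z])|([a-z]|[0-9])+";

    // Verbatim string literal: \b reaches the regex engine as a word-boundary anchor.
    public const string VerbatimPattern =
        @"[A-Z][a-z]+|([A-Z]|[0-9])+\b|[A-Z]+(?=[A-Z])|([a-z]|[0-9])+";

    // Collects the text of every match of pattern in name.
    public static List<string> Split(string name, string pattern)
    {
        var parts = new List<string>();
        foreach (Match match in Regex.Matches(name, pattern))
        {
            parts.Add(match.Value);
        }
        return parts;
    }

    public static void Main()
    {
        // The second alternative now requires a literal backspace, so "X" is skipped.
        Console.WriteLine(string.Join(", ", Split("X-999", RegularPattern)));   // 999

        // With the word-boundary anchor intact, both substrings match.
        Console.WriteLine(string.Join(", ", Split("X-999", VerbatimPattern)));  // X, 999

        Console.WriteLine(string.Join(", ", Split("CityABCProcess Test", VerbatimPattern)));
        // City, ABC, Process, Test
    }
}
```

This would also explain why regexr.com reported both matches: the pattern pasted there goes straight to the regex engine, without passing through C# string escaping first.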
