
Regex split by same character within brackets

I have a long string, like so:

(A) name1, name2, name3, name3 (B) name4, name5, name7 (via name7) ..... (AA) name47, name47 (via name 46) (BB) name48, name49

I currently split on "(", but that also splits at the "(via ...)" parts:

string[] lines = routesRaw.Split(new[] { "  (" }, StringSplitOptions.RemoveEmptyEntries);

How can I split on the leading bracketed labels only? There is no AB, AC, AD, etc.; the characters within the brackets are always the same.

Thanks.

Use a matching approach here rather than Regex.Split. The pattern needs a capturing group so that a backreference can match the same character repeated zero or more times, and Regex.Split includes every captured substring in its output alongside the pieces between matches.
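To illustrate why Regex.Split is awkward with a capturing group, here is a minimal sketch. The class name SplitCaptureDemo, the method name SplitOnLabels, and the short input string are made up for this illustration and are not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitCaptureDemo
{
    // Hypothetical helper: splits on bracketed runs of one repeated
    // uppercase letter, such as (A) or (BB). The group ([A-Z]) is needed
    // for the backreference \1, so Regex.Split also returns its captures.
    public static string[] SplitOnLabels(string text)
    {
        return Regex.Split(text, @"\(([A-Z])\1*\)");
    }

    public static void Main()
    {
        // Print each element in square brackets so empty strings are visible.
        foreach (var part in SplitOnLabels("(A) x (BB) y"))
            Console.WriteLine("[" + part + "]");
    }
}
```

The captured letters A and B come back as separate array elements between the actual pieces, so the result would need extra filtering. Matching and reading one group avoids that.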

I suggest

(?s)(.*?)(?:\(([A-Z])\2*\)|\z)

Grab all non-empty Group 1 values. See the regex demo.

Details

  • (?s) - a dotall, RegexOptions.Singleline option that makes . match newlines, too
  • (.*?) - Group 1: any 0 or more chars, but as few as possible
  • (?:\(([A-Z])\2*\)|\z) - a non-capturing group that matches either of:
    • \(([A-Z])\2*\) - a literal (, then Group 2 capturing any uppercase ASCII letter, then 0 or more repetitions of that captured letter, then a literal )
    • | - or
    • \z - the very end of the string.

In C#, use

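// Requires: using System.Linq; using System.Text.RegularExpressions;
var results = Regex.Matches(text, @"(?s)(.*?)(?:\(([A-Z])\2*\)|\z)")
        .Cast<Match>()                          // expose the matches to LINQ
        .Select(x => x.Groups[1].Value)         // text preceding each bracketed label
        .Where(z => !string.IsNullOrEmpty(z))   // drop the empty leading and trailing captures
        .ToList();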

See the C# demo online.
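For a self-contained check, the same pattern can be wrapped in a small program. This is only a sketch: the class name RouteSplitter, the method name SplitRoutes, and the shortened sample string are assumptions for illustration, and the Trim() call is an addition that the snippet above does not make.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RouteSplitter
{
    // Hypothetical helper: returns the text between bracketed labels such as
    // (A) or (BB). A "(via ...)" part starts with a lowercase letter, so the
    // label pattern does not match it and it stays inside its segment.
    public static List<string> SplitRoutes(string text)
    {
        return Regex.Matches(text, @"(?s)(.*?)(?:\(([A-Z])\2*\)|\z)")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value.Trim()) // Trim() is an addition, not in the original answer
            .Where(s => s.Length > 0)              // skip empty segments
            .ToList();
    }

    public static void Main()
    {
        // Shortened, made-up variant of the sample string from the question.
        var routesRaw = "(A) name1, name2 (B) name4, name7 (via name7) " +
                        "(AA) name47 (via name 46) (BB) name48, name49";

        foreach (var line in SplitRoutes(routesRaw))
            Console.WriteLine(line);
    }
}
```

With this input, each label starts a new entry and the "(via ...)" parts remain attached to the entry they belong to.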
