Regex to check for match and split string

Given a string in the following format:

xxx (aaa - bbb - CC-dd - ee-FFF)

I need to write a regex that returns a match if there are three or more " - " separators inside the parentheses (i.e. four or more fragments).

It also needs to split the text inside the parentheses on " - " (space, hyphen, space) and return each of the resulting fragments as a separate match. So given the above string, I expect the following matches:

  1. aaa
  2. bbb
  3. CC-dd
  4. ee-FFF

I have the following regex...

\((([\w]).*(.[-].*?){3,}([\w]))\)

but I'm struggling to split the string and return the matches I need.

You may use a regex based on a tempered greedy token:

\((?<o>(?:(?! - )[^()])+)(?: - (?<o>(?:(?! - )[^()])+)){3,}\)


Details

  • \( - a ( char
  • (?<o>(?:(?! - )[^()])+) - Group "o": one or more chars other than ( and ), each of which does not start the " - " (space, hyphen, space) sequence
  • (?: - (?<o>(?:(?! - )[^()])+)){3,} - three or more occurrences of:
    • " - " - space, hyphen, space
    • (?<o>(?:(?! - )[^()])+) - Group "o": one or more chars other than ( and ), each of which does not start the " - " sequence
  • \) - a ) char
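To see what the tempered greedy token does on its own, the sketch below applies only the fragment sub-pattern with Regex.Match. The sample inputs are made up for illustration, and the snippet assumes C# 9+ top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// The fragment sub-pattern from the answer: one or more chars other than
// ( and ), where a char is consumed only if " - " does not start at it.
const string Fragment = @"(?:(?! - )[^()])+";

// A bare hyphen (no surrounding spaces) does not stop the token...
string first = Regex.Match("ee-FFF - x", Fragment).Value;

// ...but the space-hyphen-space delimiter does, so inner spaces are kept.
string second = Regex.Match("bbb CC - dd", Fragment).Value;

// A closing parenthesis also ends the fragment, because of [^()].
string third = Regex.Match("dd) - ee", Fragment).Value;

Console.WriteLine(first);  // ee-FFF
Console.WriteLine(second); // bbb CC
Console.WriteLine(third);  // dd
```

This is why the full pattern can allow hyphens inside a fragment (CC-dd, ee-FFF) while still splitting on " - ".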

Get all the Group "o" captures to extract the values.
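The reason for reading captures rather than the group value: in .NET, Group.Value returns only the last capture of a repeated group, while Group.Captures keeps every one. A minimal sketch, using the first parenthesized group from the demo string:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var pattern = @"\((?<o>(?:(?! - )[^()])+)(?: - (?<o>(?:(?! - )[^()])+)){3,}\)";
Match m = Regex.Match("xxx (aaa - bbb CC - dd - ee-FFF)", pattern);

// Group.Value reports only the last capture of the repeated group "o"...
string last = m.Groups["o"].Value;

// ...while Group.Captures keeps every value it matched, in order.
string[] all = m.Groups["o"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();

Console.WriteLine(last);                   // ee-FFF
Console.WriteLine(string.Join("; ", all)); // aaa; bbb CC; dd; ee-FFF
```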

C# demo:

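using System;
using System.Linq;
using System.Text.RegularExpressions;

// The second parenthesized group contains only one " - ", so it is not matched.
var s = "xxx (aaa - bbb CC - dd - ee-FFF) (aaa2 - bbb2 CC2- dd2- ee2-FFF2)";
var pattern = @"\((?<o>(?:(?! - )[^()])+)(?: - (?<o>(?:(?! - )[^()])+)){3,}\)";
var ms = Regex.Matches(s, pattern);
foreach (Match m in ms)
{
    Console.WriteLine($"Matched: {m.Value}");
    // Every value captured by the repeated group "o", in match order
    var res = m.Groups["o"].Captures.Cast<Capture>().Select(x => x.Value);
    Console.WriteLine(string.Join("; ", res));
}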

Output:

Matched: (aaa - bbb CC - dd - ee-FFF)
aaa; bbb CC; dd; ee-FFF
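Applied to the string from the question, the same pattern yields exactly the four expected matches. The sketch below wraps it in a hypothetical helper, ExtractFragments (a name introduced here, not part of the answer), and adds a made-up group "(x - y - z)" with only two separators to show that it is rejected. It assumes C# 9+ top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns one string[] per parenthesized group
// that contains at least three " - " separators.
static List<string[]> ExtractFragments(string input)
{
    const string pattern = @"\((?<o>(?:(?! - )[^()])+)(?: - (?<o>(?:(?! - )[^()])+)){3,}\)";
    return Regex.Matches(input, pattern)
        .Cast<Match>()
        .Select(m => m.Groups["o"].Captures.Cast<Capture>().Select(c => c.Value).ToArray())
        .ToList();
}

// The question's string, plus a group that has only two separators.
var groups = ExtractFragments("xxx (aaa - bbb - CC-dd - ee-FFF) (x - y - z)");

foreach (var g in groups)
    Console.WriteLine(string.Join(" | ", g)); // aaa | bbb | CC-dd | ee-FFF
```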

This problem can be rephrased like this:

You need to split the text between parentheses using " - " as a delimiter, and determine if there are 4 or more text fragments.

How I would do this:

  1. Use a regex to capture the text between the parentheses, something like: \(([^\)]+)\)
  2. Split the matched text using String.Split(" - ").
  3. Check that the returned array has more than 3 elements.
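A minimal C# sketch of these three steps, assuming C# 9+ top-level statements. The group "(x - y - z)" is a made-up input with only three fragments, so step 3 rejects it. The Split overload taking a string array works on .NET Framework as well; on .NET Core 2.0 and later, Split(" - ") can be used directly.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var s = "xxx (aaa - bbb - CC-dd - ee-FFF) (x - y - z)";
var results = new List<string[]>();

// Step 1: capture the text between each pair of parentheses.
foreach (Match m in Regex.Matches(s, @"\(([^\)]+)\)"))
{
    // Step 2: split on the literal " - " delimiter (space, hyphen, space).
    string[] parts = m.Groups[1].Value.Split(new[] { " - " }, StringSplitOptions.None);

    // Step 3: keep the group only if it has more than 3 fragments.
    if (parts.Length > 3)
        results.Add(parts);
}

foreach (var r in results)
    Console.WriteLine(string.Join(" | ", r)); // aaa | bbb | CC-dd | ee-FFF
```

Unlike the single-regex approach, the separator count is checked in ordinary code, so changing the threshold only means editing the comparison.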

This looks more maintainable than a huge regular expression, and should be equivalent in terms of performance, if not faster.
