Regex to check for match and split string
Given a string in the following format:
xxx (aaa - bbb - CC-dd - ee-FFF)
I need to write a regex that returns a match if there are more than 3 " - " strings inside the parentheses.
It also needs to split the string by " - " (space, hyphen, space) and return each of those groups in a separate match. So given the above string, I expect the following matches:
aaa
bbb
CC-dd
ee-FFF
I have the following regex...
\((([\w]).*(.[-].*?){3,}([\w]))\)
but I'm struggling to split the string and return the matches I need.
You may use a regex based on a tempered greedy token:
\((?<o>(?:(?! - )[^()])+)(?: - (?<o>(?:(?! - )[^()])+)){3,}\)
See the regex demo.
Details
- \( matches a literal ( char
- (?<o>(?:(?! - )[^()])+) is Group "o": one or more chars other than ( and ), none of which starts the " - " (space, hyphen, space) sequence
- (?: - (?<o>(?:(?! - )[^()])+)){3,} matches three or more occurrences of:
  - " - ": a space, a hyphen, and a space
  - (?<o>(?:(?! - )[^()])+): Group "o" again, as above
- \) matches a literal ) char
Get all the Group "o" captures to extract the values. Because both groups share the name "o", .NET collects every capture from both of them, in order, in m.Groups["o"].Captures.
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "xxx (aaa - bbb CC - dd - ee-FFF) (aaa2 - bbb2 CC2- dd2- ee2-FFF2)";
var pattern = @"\((?<o>(?:(?! - )[^()])+)(?: - (?<o>(?:(?! - )[^()])+)){3,}\)";
var ms = Regex.Matches(s, pattern);
foreach (Match m in ms)
{
    Console.WriteLine($"Matched: {m.Value}");
    // All "o" captures of this match, in the order they were captured
    var res = m.Groups["o"].Captures.Cast<Capture>().Select(x => x.Value);
    Console.WriteLine(string.Join("; ", res));
}
Output:
Matched: (aaa - bbb CC - dd - ee-FFF)
aaa; bbb CC; dd; ee-FFF
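The following is a minimal sketch that applies the same pattern to the question's own input. It also tries a string that has too few delimiters. It assumes .NET 5 or later with top-level statements, and Fragments is a hypothetical helper name that returns the captures of the first qualifying match only:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as above: one "o" group, then 3 or more " - " + "o" groups.
const string Pattern = @"\((?<o>(?:(?! - )[^()])+)(?: - (?<o>(?:(?! - )[^()])+)){3,}\)";

// Hypothetical helper: returns the "o" captures of the first match,
// or an empty array when no parenthesized group has 3 or more " - " delimiters.
string[] Fragments(string input)
{
    Match m = Regex.Match(input, Pattern);
    return m.Success
        ? m.Groups["o"].Captures.Cast<Capture>().Select(c => c.Value).ToArray()
        : Array.Empty<string>();
}

// The question's input has exactly 3 " - " delimiters, so it matches.
Console.WriteLine(string.Join("; ", Fragments("xxx (aaa - bbb - CC-dd - ee-FFF)")));
// Only 2 delimiters: no match, so this prints an empty line.
Console.WriteLine(string.Join("; ", Fragments("xxx (aaa - bbb - CC-dd)")));
```

The first line printed should be "aaa; bbb; CC-dd; ee-FFF". The inner hyphens in "CC-dd" and "ee-FFF" survive because the lookahead (?! - ) only ends a fragment at a hyphen with a space on each side.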
This problem can be rephrased like this:
You need to split the text between parentheses using " - " as a delimiter, and determine if there are 4 or more text fragments.
How I would do this:
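1. Extract the text between the parentheses with \(([^\)]+)\)
2. Split the captured text on " - ".
3. Check whether the result has 4 or more fragments.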
This looks more maintainable than one huge regular expression, and it should perform about the same, if not faster.
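Here is a minimal sketch of the three steps above. It assumes .NET with top-level statements. FragmentsPerGroup and its minFragments parameter are hypothetical names, and the answer itself does not give any code:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper implementing the split-and-count approach:
// 1. extract the text inside each (...) group,
// 2. split it on the literal " - " delimiter,
// 3. keep only groups with at least minFragments pieces.
string[][] FragmentsPerGroup(string input, int minFragments = 4) =>
    Regex.Matches(input, @"\(([^\)]+)\)")
        .Cast<Match>()
        .Select(m => m.Groups[1].Value.Split(new[] { " - " }, StringSplitOptions.None))
        .Where(fragments => fragments.Length >= minFragments)
        .ToArray();

var s = "xxx (aaa - bbb CC - dd - ee-FFF) (aaa2 - bbb2 CC2- dd2- ee2-FFF2)";
foreach (var parts in FragmentsPerGroup(s))
{
    Console.WriteLine(string.Join("; ", parts));
}
```

On the sample string this keeps only the first group and prints "aaa; bbb CC; dd; ee-FFF". One difference from the single-regex version is that Split also counts empty fragments, for example from a leading " - ". The regex requires each fragment to be non-empty. Passing StringSplitOptions.RemoveEmptyEntries brings the two closer if that matters.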