Regex to check for match and split string

Given a string in the following format:

xxx (aaa - bbb - CC-dd - ee-FFF)

I need to write a regex that returns a match if there are 3 or more " - " strings inside the parentheses.

It also needs to split the string by " - " (space, hyphen, space) and return each of those groups in a separate match. So given the above string, I expect the following matches:

  1. aaa
  2. bbb
  3. CC-dd
  4. ee-FFF

I have the following regex:

\((([\w]).*(.[-].*?){3,}([\w]))\)

but I'm struggling to split the string and return the matches I need.

You may use a regex based on a tempered greedy token:

\((?<o>(?:(?! - )[^()])+)(?: - (?<o>(?:(?! - )[^()])+)){3,}\)

See the regex demo.

Details

  • \( - a ( char
  • (?<o>(?:(?! - )[^()])+) - Group "o": one or more chars other than ( and ), none of which starts a space-hyphen-space sequence
  • (?: - (?<o>(?:(?! - )[^()])+)){3,} - three or more occurrences of
    • " - " - space, hyphen, space
    • (?<o>(?:(?! - )[^()])+) - Group "o" again: one or more chars other than ( and ), none of which starts a space-hyphen-space sequence
  • \) - a ) char

Get all the Group "o" captures to extract the values.

C# demo:

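using System;
using System.Linq;
using System.Text.RegularExpressions;

// Only the first parenthesized group contains three " - " separators.
var s = "xxx (aaa - bbb CC - dd - ee-FFF) (aaa2 - bbb2 CC2- dd2- ee2-FFF2)";
var pattern = @"\((?<o>(?:(?! - )[^()])+)(?: - (?<o>(?:(?! - )[^()])+)){3,}\)";
var ms = Regex.Matches(s, pattern);
foreach (Match m in ms)
{
    Console.WriteLine($"Matched: {m.Value}");
    // Both (?<o>...) groups share one name, so Captures holds every fragment in order.
    var res = m.Groups["o"].Captures.Cast<Capture>().Select(x => x.Value);
    Console.WriteLine(string.Join("; ", res));
}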

Output:

Matched: (aaa - bbb CC - dd - ee-FFF)
aaa; bbb CC; dd; ee-FFF
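
The demo above uses a different input than the question. As a minimal sketch, the same pattern can be wrapped in a helper and applied to the question's own string; the name ExtractFragments is hypothetical and not part of the original answer, and it only looks at the first matching parenthesized group:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the fragments of the first parenthesized group
// that contains at least three " - " separators, or an empty array if none does.
static string[] ExtractFragments(string input)
{
    const string pattern = @"\((?<o>(?:(?! - )[^()])+)(?: - (?<o>(?:(?! - )[^()])+)){3,}\)";
    var m = Regex.Match(input, pattern);
    return m.Success
        ? m.Groups["o"].Captures.Cast<Capture>().Select(c => c.Value).ToArray()
        : Array.Empty<string>();
}

// The string from the question: four fragments, three separators.
Console.WriteLine(string.Join("; ", ExtractFragments("xxx (aaa - bbb - CC-dd - ee-FFF)")));
// prints: aaa; bbb; CC-dd; ee-FFF
```

Hyphens without surrounding spaces (as in CC-dd) stay inside a fragment because the lookahead only rejects the full space-hyphen-space sequence.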

This problem can be rephrased like this:

You need to split the text between parentheses using " - " as a delimiter, and determine if there are 4 or more text fragments.

How I would do this:

  1. Use a regexp to get the text, something like: \(([^)]+)\)
  2. Split the matched text using String.Split(" - ")
  3. Check that the number of elements in the returned array is > 3

This looks more maintainable than one large regular expression, and I would expect it to perform at least as well, though I have not measured it.
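
The three steps above could look roughly like the following sketch. The helper name SplitFragments is hypothetical, and it only handles the first parenthesized group in the input. It uses the string[] overload of Split, which I believe is available on both .NET Framework and newer runtimes:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: grabs the text inside the first pair of parentheses,
// splits it on " - ", and returns the fragments only if there are 4 or more.
static string[] SplitFragments(string input)
{
    // Step 1: capture everything between "(" and the next ")".
    var m = Regex.Match(input, @"\(([^)]+)\)");
    if (!m.Success) return Array.Empty<string>();

    // Step 2: split on the literal space-hyphen-space delimiter.
    var parts = m.Groups[1].Value.Split(new[] { " - " }, StringSplitOptions.None);

    // Step 3: accept only when there are more than 3 fragments.
    return parts.Length > 3 ? parts : Array.Empty<string>();
}

Console.WriteLine(string.Join("; ", SplitFragments("xxx (aaa - bbb - CC-dd - ee-FFF)")));
// prints: aaa; bbb; CC-dd; ee-FFF
```

One difference from the single-regex approach: Split can return empty fragments when two delimiters are adjacent, whereas the pattern with Group "o" requires every fragment to be non-empty.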
