
Regex with optional matching groups

I'm trying to parse a given string, which is a kind of path separated with / . I need to write a regex that matches each segment of the path to the corresponding regex group.

Example 1:

input:

/EAN/SomeBrand/appliances/refrigerators/RF444

output:

Group: producer, Value: SomeBrand
Group: category, Value: appliances
Group: subcategory, Value: refrigerators
Group: product, Value: RF444

Example 2:

input:

/EAN/SomeBrand/appliances

output:

Group: producer, Value: SomeBrand
Group: category, Value: appliances
Group: subcategory, Value:
Group: product, Value:

I tried the following code. It works fine when the path is complete (as in the first example), but it fails to find the groups when the input string is partial (as in example 2).

static void Main()
{
  var pattern = @"^" + @"/EAN"
                + @"/" + @"(?<producer>.+)"
                + @"/" + @"(?<category>.+)"
                + @"/" + @"(?<subcategory>.+)"
                + @"/" + @"(?<product>.+)?"
                + @"$";

  var rgx = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
  var result = rgx.Match(@"/EAN/SomeBrand/appliances/refrigerators/RF444");

  foreach (string groupName in rgx.GetGroupNames())
  {
    Console.WriteLine(
       "Group: {0}, Value: {1}",
       groupName,
       result.Groups[groupName].Value);
  }


  Console.ReadLine();
}

Any suggestion is welcome. Unfortunately, I cannot simply split the string, since the framework I'm using expects a regex object.

You can use optional groups (...)? and replace the greedy dot patterns .+ with negated character classes [^/]+ , so that each group stays within a single path segment:

^/EAN/(?<producer>[^/]+)/(?<category>[^/]+)(/(?<subcategory>[^/]+))?(/(?<product>[^/]+))?$
                                           ^                      ^^^                  ^^

See the regex demo.

This is how to declare the regex in your C# code:

var pattern = @"^" + @"/EAN"
            + @"/(?<producer>[^/]+)"
            + @"/(?<category>[^/]+)"
            + @"(/(?<subcategory>[^/]+))?"
            + @"(/(?<product>[^/]+))?"
            + @"$";

var rgx = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

Note that I am using regular capturing groups as the optional ones, but the RegexOptions.ExplicitCapture flag turns all unnamed capturing groups into non-capturing ones, so they do not appear in Match.Groups . As a result, we always have exactly 5 groups (the whole match plus the four named groups), even without using non-capturing optional groups (?:...)? .
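To check the behaviour described above, here is a minimal sketch of a complete console program that runs the pattern against both inputs from the question. The class name OptionalGroupsDemo and the "Participated" column are only illustrative and are not part of the original answer; RegexOptions.Compiled is left out because it does not affect the result.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalGroupsDemo
{
    // Same pattern as in the answer, written as a single verbatim string.
    public const string Pattern =
        @"^/EAN/(?<producer>[^/]+)/(?<category>[^/]+)(/(?<subcategory>[^/]+))?(/(?<product>[^/]+))?$";

    public static void Main()
    {
        var rgx = new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

        string[] inputs =
        {
            "/EAN/SomeBrand/appliances/refrigerators/RF444",
            "/EAN/SomeBrand/appliances",
        };

        foreach (var input in inputs)
        {
            Match m = rgx.Match(input);
            Console.WriteLine("{0} -> matched: {1}, groups: {2}", input, m.Success, m.Groups.Count);

            foreach (string name in rgx.GetGroupNames())
            {
                if (name == "0") continue; // skip the whole-match group

                Group g = m.Groups[name];
                // g.Success is false when an optional group did not participate
                // in the match; its Value is then the empty string.
                Console.WriteLine("  Group: {0}, Value: {1}, Participated: {2}", name, g.Value, g.Success);
            }
        }
    }
}
```

Checking Group.Success rather than comparing Value to an empty string is one way to tell "segment absent" apart from other cases if the consuming framework exposes the Match object.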

Try

var pattern = @"^" + @"/EAN"
    + @"(?:/" + @"(?<producer>[^/]+))?"
    + @"(?:/" + @"(?<category>[^/]+))?"
    + @"(?:/" + @"(?<subcategory>[^/]+))?"
    + @"(?:/" + @"(?<product>[^/]+))?";

Note how I replaced the . with [^/] , because you want to use the / to split the string. Note also the use of the optional quantifier ( ? ) for each sub-part.
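One thing to be aware of with this variant: every segment is optional and the pattern has no $ anchor, so it can also match a bare /EAN or just the leading part of a longer path. The sketch below compares it with a version that has $ appended. The anchored version is an assumption about the desired behaviour, not part of the answer, and the names AllOptionalDemo, Unanchored and Anchored are purely illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AllOptionalDemo
{
    // The pattern from this answer, joined into one string (no end anchor).
    public const string Unanchored =
        @"^/EAN(?:/(?<producer>[^/]+))?(?:/(?<category>[^/]+))?(?:/(?<subcategory>[^/]+))?(?:/(?<product>[^/]+))?";

    // Assumption for illustration: the same pattern with "$" appended,
    // so that the whole input must be consumed.
    public const string Anchored = Unanchored + "$";

    public static void Main()
    {
        string[] inputs =
        {
            "/EAN",
            "/EAN/SomeBrand/appliances",
            "/EAN/a/b/c/d/extra", // one segment more than the pattern has groups for
        };

        foreach (var input in inputs)
        {
            Match loose = Regex.Match(input, Unanchored);
            Match strict = Regex.Match(input, Anchored);
            Console.WriteLine("{0}: unanchored={1}, anchored={2}, category='{3}'",
                input, loose.Success, strict.Success, loose.Groups["category"].Value);
        }
    }
}
```

Whether the unanchored or the anchored form is appropriate depends on how the framework that receives the regex treats extra trailing segments.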
