
Regex for optional group

I'd like to parse the following sample string

foo :6

into two groups: Text and Number. The Number group should be populated only if the character ":" immediately precedes the number.

So:

foo 6 -> Text = "foo 6"
foo :6 -> Text = "foo", Number = "6"

The best I could come up with so far is

(?<Text>.+)(?=:(?<Number>\d+)h?)?

but that doesn't work because the first group greedily expands to the whole string.

Any suggestions?

If you really want to use a regex you can write quite a simple one, without lookarounds:

(?<Text>[^:]+):?(?<Number>\d*)

In my opinion, regexes should be as simple as possible; if you do not want spaces around the Text group I suggest you use match.Groups["Text"].Value.Trim().

Note that if you are parsing a multiline string this pattern will not work because, as @OscarHermosilla mentioned below, [^:]+ will also match newlines. The fix is simple though: replace it with [^:\n]
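
To make the simple pattern concrete, here is a minimal sketch that combines it with the newline exclusion and a Trim() call. The class name SimplePatternDemo and the Parse helper are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimplePatternDemo
{
    // [^:\n] keeps the Text group from running across line breaks.
    static readonly Regex Pattern = new Regex(@"(?<Text>[^:\n]+):?(?<Number>\d*)");

    // Hypothetical helper: returns the trimmed text and the number
    // (an empty string when no ":<digits>" suffix is present).
    public static (string Text, string Number) Parse(string line)
    {
        Match m = Pattern.Match(line);
        return (m.Groups["Text"].Value.Trim(), m.Groups["Number"].Value);
    }

    public static void Main()
    {
        foreach (string line in new[] { "foo 6", "foo :6" })
        {
            var (text, number) = Parse(line);
            Console.WriteLine($"Text = \"{text}\", Number = \"{number}\"");
        }
    }
}
```

Because \d* may match nothing, Number comes back as an empty string rather than a failed group when the colon is missing, so the caller only has to check for "".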

You don't need a separate function to strip the trailing whitespace.

The regex below captures all characters into the named group Text, up to but not including :\d+ (i.e. a colon followed by one or more digits) and any whitespace before it. If it finds a colon followed by digits at the end of the string, it captures those digits into the named group Number.

^(?<Text>(?:(?!:\d+).)+(?=$|\s+:(?<Number>\d+)$))

DEMO

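using System;
using System.Text.RegularExpressions;

String input = "foo 6";
String input1 = "foo :6";
Regex rgx = new Regex(@"^(?<Text>(?:(?!:\d+).)+(?=$|\s+:(?<Number>\d+)$))");

// No ":<digits>" suffix: only the Text group is populated.
foreach (Match m in rgx.Matches(input))
{
    Console.WriteLine(m.Groups["Text"].Value);
}
// ":<digits>" suffix present: Number is captured inside the lookahead.
foreach (Match m in rgx.Matches(input1))
{
    Console.WriteLine(m.Groups["Text"].Value);
    Console.WriteLine(m.Groups["Number"].Value);
}

Output: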

foo 6
foo
6

IDEONE

You can repeat the group name Text in an alternation, like this:

(?<Text>.+)\s+:(?<Number>\d+)|(?<Text>.+)
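
As a rough illustration of the alternation approach, the sketch below relies on the .NET engine accepting the same group name in both branches. The class name AlternationDemo and the Parse helper are invented for this example, and \d+ is used on the assumption that multi-digit numbers should be captured whole.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Both branches use the group name Text, which the .NET engine allows.
    // Assumption: \d+ so that multi-digit numbers are kept whole.
    static readonly Regex Pattern =
        new Regex(@"(?<Text>.+)\s+:(?<Number>\d+)|(?<Text>.+)");

    // Hypothetical helper returning both groups as a tuple.
    public static (string Text, string Number) Parse(string input)
    {
        Match m = Pattern.Match(input);
        Group number = m.Groups["Number"];
        // Success is false when the second branch matched, so Number stays empty.
        return (m.Groups["Text"].Value, number.Success ? number.Value : "");
    }

    public static void Main()
    {
        Console.WriteLine(Parse("foo 6"));   // (foo 6, )
        Console.WriteLine(Parse("foo :12")); // (foo, 12)
    }
}
```

Checking Group.Success distinguishes "no number present" from "number present but empty", which matters if the pattern is later relaxed.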

DEMO

Based on the idea behind this post: Regex Pattern to Match, Excluding when... / Except between

You can simply use split instead of a regex:

"foo :6".Split(':');

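A plain Split(':') does not check that digits follow the colon, so the result still needs a small guard to meet the original requirement. The sketch below shows one possible way to do that; the class name SplitDemo and the Parse helper are hypothetical.

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Hypothetical helper: splits on ':' and only treats the tail as a
    // number when there is exactly one colon followed by digits.
    public static (string Text, string Number) Parse(string input)
    {
        string[] parts = input.Split(':');
        if (parts.Length == 2 && parts[1].Length > 0 && parts[1].All(char.IsDigit))
        {
            return (parts[0].Trim(), parts[1]);
        }
        // No usable ":<digits>" suffix, so the whole input is the text.
        return (input, "");
    }

    public static void Main()
    {
        Console.WriteLine(Parse("foo 6"));  // (foo 6, )
        Console.WriteLine(Parse("foo :6")); // (foo, 6)
    }
}
```
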
You can try something like:

(\D+)(?:\:(\d+))

or do a Regex.Split using this pattern:

(\s*\:\s*)
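
One thing to watch with this pattern: in .NET, Regex.Split includes the text of any capturing group in the returned array, so the parenthesized version also returns the separator itself. The sketch below contrasts it with a non-capturing variant; the class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // With capturing parentheses, the matched separator is part of the result.
    public static string[] SplitKeepingSeparator(string input) =>
        Regex.Split(input, @"(\s*\:\s*)");

    // Without the capture group, only the pieces around the separator remain.
    public static string[] SplitDroppingSeparator(string input) =>
        Regex.Split(input, @"\s*:\s*");

    public static void Main()
    {
        Console.WriteLine(string.Join("|", SplitKeepingSeparator("foo :6")));  // foo| :|6
        Console.WriteLine(string.Join("|", SplitDroppingSeparator("foo :6"))); // foo|6
    }
}
```

As with the plain string split, neither variant checks that digits follow the colon, so the second element still needs validating before it is treated as the number.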
