
Regex matching multiple field values in a single line

I want to match multiple field/value pairs, each delimited by a colon, in a single line. Both the field and the value may be surrounded by spaces, and a value may itself contain spaces, e.g.

field1   :    value1a  value1b

answer
match1: Group1=field1, Group2=value1a value1b

or

field1   :    value1a  value1b   field2   : value2a value2b

answer
match1: Group1=field1, Group2=value1a value1b
match2: Group1=field2, Group2=value2a value2b

The best I can do right now is (\w+)\s*:\s*(\w+), which captures only the first word of each value:

Regex regex = new Regex(@"(\w+)\s*:\s*(\w+)");
Match m = regex.Match("field1   :    value1a  value1b   field2   : value2a value2b");
while (m.Success)
{
   string f = m.Groups[1].Value.Trim();
   string v = m.Groups[2].Value.Trim();  // only the first word, e.g. "value1a"
   m = m.NextMatch();                    // advance, otherwise the loop never ends
}

I guess a lookahead may help, but I don't know how to write it. Thank you.

You may try

(\w+)\s*:\s*((?:(?!\s*\w+\s*:).)*)
  • (\w+) group 1, one or more consecutive word characters
  • \s*:\s* a colon with any amount of whitespace around it
  • (...) group 2
  • (?:...)* a non-capturing group, repeated any number of times
  • (?!\s*\w+\s*:). a negative lookahead followed by a single character: the character is consumed only if the text starting at it does not form a word, surrounded by optional whitespace, followed by a colon. Thus group 2 never consumes a word that precedes a colon
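
As a rough illustration of how this pattern could be applied from C#, here is a minimal sketch. The FieldParser class and its Parse helper are hypothetical names introduced only for this example; the pattern is the one shown above, and the values are trimmed because trailing whitespace at the end of the line would otherwise be included in group 2.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldParser
{
    // Pattern from the answer above: group 2 stops before the next "word :" pair.
    private static readonly Regex Pattern =
        new Regex(@"(\w+)\s*:\s*((?:(?!\s*\w+\s*:).)*)");

    // Hypothetical helper: returns the (field, value) pairs in order of appearance.
    public static List<KeyValuePair<string, string>> Parse(string line)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pattern.Matches(line))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value.Trim()));
        }
        return pairs;
    }

    public static void Main()
    {
        var line = "field1   :    value1a  value1b   field2   : value2a value2b";
        foreach (var pair in Parse(line))
        {
            Console.WriteLine("{0} => {1}", pair.Key, pair.Value);
        }
    }
}
```

For the sample line from the question, this should yield two pairs, field1 with "value1a  value1b" and field2 with "value2a value2b". The inner spacing of each value is preserved.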

See the test cases.

You can use a regex based on a lazy dot:

var matches = Regex.Matches(text, @"(\w+)\s*:\s*(.*?)(?=\s*\w+\s*:|$)");

See the C# demo online and the .NET regex demo (please mind that regex101.com does not support the .NET regex flavor).

As you can see, there is no need to use a tempered greedy token. The regex means:

  • (\w+) - Group 1: one or more letters/digits/underscores
  • \s*:\s* - a colon enclosed with zero or more whitespace chars
  • (.*?) - Group 2: any zero or more chars other than a newline, as few as possible
  • (?=\s*\w+\s*:|$) - a positive lookahead that ends the match right before the first occurrence of one or more word chars enclosed with zero or more whitespace chars and followed by a colon, or at the end of the string.

Full C# demo:

using System;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var text = "field1   :    value1a  value1b   field2   : value2a value2b";
        var matches = Regex.Matches(text, @"(\w+)\s*:\s*(.*?)(?=\s*\w+\s*:|$)");
        foreach (Match m in matches)
        {
            Console.WriteLine("-- MATCH FOUND --\nKey: {0}, Value: {1}", 
                m.Groups[1].Value, m.Groups[2].Value);
        }
    }
}

Output:

-- MATCH FOUND --
Key: field1, Value: value1a  value1b
-- MATCH FOUND --
Key: field2, Value: value2a value2b
