
Regex matching multiple field values in a single line

I want to match multiple field/value pairs, each delimited by a colon, on a single line. The field and value text may be surrounded by spaces, and a value may itself contain spaces, e.g.:

field1   :    value1a  value1b

Expected result:
match1: Group1=field1, Group2=value1a value1b

or

field1   :    value1a  value1b   field2   : value2a value2b

Expected result:
match1: Group1=field1, Group2=value1a value1b
match2: Group1=field2, Group2=value2a value2b

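The best I can do right now is (\w+)\s*:\s*(\w+), which captures only the first word of each value:

Regex regex = new Regex(@"(\w+)\s*:\s*(\w+)");
Match m = regex.Match("field1   :    value1a  value1b   field2   : value2a value2b");
while (m.Success)
{
   string f = m.Groups[1].Value.Trim();
   string v = m.Groups[2].Value.Trim();  // only "value1a", not "value1a  value1b"
   m = m.NextMatch();                    // advance, otherwise the loop never ends
}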

I guess a lookahead may help, but I don't know how to write it. Thank you.

You may try

(\w+)\s*:\s*((?:(?!\s*\w+\s*:).)*)
  • (\w+) group 1, one or more consecutive word characters
  • \s*:\s* a colon with optional whitespace around it
  • (...) group 2
  • (?:...)* a non-capturing group, repeated zero or more times
  • (?!\s*\w+\s*:). a negative lookahead followed by a single character: the character is consumed only if the text starting at it does not form a word, optionally surrounded by whitespace, followed by a colon. Thus group 2 never consumes a word that comes right before a colon

See the test cases
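As a minimal sketch of how this pattern could be applied in C# (the class name TemperedTokenDemo and the helper Extract are hypothetical, made up for illustration; only the pattern comes from the answer above):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemperedTokenDemo
{
    // Hypothetical helper: returns every field/value pair found in the line.
    public static List<KeyValuePair<string, string>> Extract(string line)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        // Group 1 = field name; group 2 = value, consumed one character at a
        // time while the negative lookahead sees no upcoming "word :" sequence.
        var pattern = @"(\w+)\s*:\s*((?:(?!\s*\w+\s*:).)*)";
        foreach (Match m in Regex.Matches(line, pattern))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return pairs;
    }

    public static void Main()
    {
        var text = "field1   :    value1a  value1b   field2   : value2a value2b";
        foreach (var p in Extract(text))
        {
            Console.WriteLine("[{0}] = [{1}]", p.Key, p.Value);
        }
        // Prints:
        // [field1] = [value1a  value1b]
        // [field2] = [value2a value2b]
    }
}
```

Because the lookahead starts with \s*, the whitespace before the next field name is left out of group 2. Whitespace at the very end of the line would still be captured in the last value, so a Trim() may be needed there.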

You can use a regex based on a lazy dot:

var matches = Regex.Matches(text, @"(\w+)\s*:\s*(.*?)(?=\s*\w+\s*:|$)");

See the C# demo online and the .NET regex demo (please mind that regex101.com does not support .NET regex flavor).

As you can see, there is no need to use a tempered greedy token. The regex means:

  • (\w+) - Group 1: one or more letters, digits, or underscores
  • \s*:\s* - a colon enclosed with zero or more whitespace chars
  • (.*?) - Group 2: any zero or more chars other than a newline, as few as possible
  • (?=\s*\w+\s*:|$) - a positive lookahead that stops Group 2 at the first position followed by one or more word chars enclosed with zero or more whitespaces and then a colon, or at the end of the string.

Full C# demo:

using System;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var text = "field1   :    value1a  value1b   field2   : value2a value2b";
        var matches = Regex.Matches(text, @"(\w+)\s*:\s*(.*?)(?=\s*\w+\s*:|$)");
        foreach (Match m in matches)
        {
            Console.WriteLine("-- MATCH FOUND --\nKey: {0}, Value: {1}", 
                m.Groups[1].Value, m.Groups[2].Value);
        }
    }
}

Output:

-- MATCH FOUND --
Key: field1, Value: value1a  value1b
-- MATCH FOUND --
Key: field2, Value: value2a value2b
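If the pairs are needed for lookup rather than printing, the same regex could feed a dictionary. This is only a sketch under assumptions: the FieldParser class and its Parse method are hypothetical names, and the choice to let a repeated field name overwrite the earlier value is mine, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldParser
{
    // Same lazy-dot pattern as in the demo above.
    private static readonly Regex Pair =
        new Regex(@"(\w+)\s*:\s*(.*?)(?=\s*\w+\s*:|$)");

    // Hypothetical helper: maps each field name to its value.
    public static Dictionary<string, string> Parse(string line)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(line))
        {
            // Assumption: a repeated field name overwrites the earlier value.
            // TrimEnd drops whitespace captured at the very end of the line.
            result[m.Groups[1].Value] = m.Groups[2].Value.TrimEnd();
        }
        return result;
    }

    public static void Main()
    {
        var fields = Parse("field1   :    value1a  value1b   field2   : value2a value2b");
        Console.WriteLine(fields["field1"]); // value1a  value1b
        Console.WriteLine(fields["field2"]); // value2a value2b
    }
}
```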

