
C# regex: referring to a repeating named group

I'm writing a regex to convert the following:

string input = @"
 {b sub}{b or}{b di}{b nate} def1...
 {b sub}{b tro}{b pi}{b cal} def2...
 {b su}{b per} def3...
 {b sum} def4...
 ";

into this:

'subordinate': def1...
'subtropical': def2...
'super': def3...
'sum': def4...

I.e., I need to remove '{b ' and '}', wrap the joined word in quotes, and append a colon.

I'm not sure how to do it. I know how to match {b } but not how to match all of them and decorate it with quotes.

Regex.Replace(input, @"(\{b (?<Text>[^ }]+)})+", @"'${Text}'") 

returns

 'nate' def1...
 'cal' def2...
 'per' def3...
 'sum' def4...

I.e., just the last match within a given instance. I have no idea how to refer to "all" groups of Text, not just the last group in a given instance.

Sorry, I can't even find the proper name for "the given instance".

You may match the repeated substrings with Regex.Replace and then un-brace the separate braced substrings in the match evaluator part and format the whole match as you need.

Here is an example:

string input = @"{b sub}{b or}{b di}{b nate} def1...
{b sub}{b tro}{b pi}{b cal} def2...
{b su}{b per} def3...
{b sum} def4... ";
string result = Regex.Replace(input, @"(?:\{b\s+[^{}]*})+", m =>
            "'" + Regex.Replace(m.Value, @"\{b\s+([^{}]*)}", "$1") + "':");
Console.WriteLine(result);

Output of the C# demo:

'subordinate': def1...
'subtropical': def2...
'super': def3...
'sum': def4... 

The (?:\{b\s+[^{}]*})+ expression matches one or more repetitions of {b, one or more whitespace characters, then zero or more characters other than { and }, up to and including the closing }. Each match is then passed to the match evaluator, which processes it with the \{b\s+([^{}]*)} regex. That regex matches a single such sequence, captures the part after b and the whitespace and before }, and replaces the sequence with the contents of group 1.
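On the question of how to reach "all" values of the Text group: in .NET, a group inside a repeated construct keeps every value it matched in its Captures collection, while ${Text} in a replacement string only yields the last one. The following is a minimal sketch of that idea, not part of the answer above; the class and method names (CaptureDemo, JoinCaptures) are hypothetical, and the pattern is the one from the question with the outer group made non-capturing.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical helper: joins every capture of the repeated "Text" group
    // and decorates the result as 'word':
    public static string JoinCaptures(string input)
    {
        return Regex.Replace(
            input,
            @"(?:\{b (?<Text>[^ }]+)})+",
            m => "'" + string.Concat(
                     m.Groups["Text"].Captures
                      .Cast<Capture>()          // CaptureCollection is non-generic on older frameworks
                      .Select(c => c.Value))
                 + "':");
    }

    public static void Main()
    {
        Console.WriteLine(JoinCaptures("{b sub}{b or}{b di}{b nate} def1..."));
        Console.WriteLine(JoinCaptures("{b sum} def4..."));
    }
}
```

This avoids the second, inner Regex.Replace call, at the cost of relying on a .NET-specific feature that most other regex engines do not offer.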

Try this:

  string input = @"
   {b sub}{b or}{b di}{b nate} def1...
   {b sub}{b tro}{b pi}{b cal} def2...
   {b su}{b per} def3...
   {b sum} def4...
   ";
  input = input.Replace("{b ", "").Replace("}", "");
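  // The replacement is a regular (non-verbatim) string so that \n is a real newline;
  // .NET replacement patterns do not interpret a literal backslash-n.
  input = Regex.Replace(input, @"\n\s+(\w+)", "\n'$1':");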

Pattern explanation:

\n\s+(\w+) - match a newline and one or more whitespace characters, then match one or more word characters and store them in a capturing group.

You can use \{b (?<Text>[^}]+)\} as the pattern and replace it with ${Text}, after you first replace \{.*\} with '${0}': .

Multiline, of course.
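The two replacements described above could be sketched as follows. This is only an illustration under stated assumptions: the class and method names (TwoStepDemo, Decorate) are hypothetical, and the sketch relies on . not matching a newline by default, so the greedy \{.*\} stays within a single line.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoStepDemo
{
    // Hypothetical helper applying the two replacements in order.
    public static string Decorate(string input)
    {
        // Step 1: wrap the run from the first '{' to the last '}' on each line
        // in quotes and append a colon.
        string wrapped = Regex.Replace(input, @"\{.*\}", "'${0}':");

        // Step 2: strip the "{b " prefix and the "}" suffix from each braced part.
        return Regex.Replace(wrapped, @"\{b (?<Text>[^}]+)\}", "${Text}");
    }

    public static void Main()
    {
        Console.WriteLine(Decorate("{b sub}{b tro}{b pi}{b cal} def2...\n{b su}{b per} def3..."));
    }
}
```

Because \{.*\} is greedy, a definition that itself contains braces would be pulled into the quoted part, so this sketch only suits input where braces appear in the {b ...} prefix alone.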
