
C# Regex to parse JSON-like top-level objects

I need to parse a JSON-like text file and extract objects like these with a C# Regex:

foo {
    line1
    line2
}

bar {
    line3
    line4
}

i.e. objects that begin and end at the start of a line. My C# test code:

Regex regex = new Regex("\\n[^ \\n]+ \\{[.\\n]+\\n\\}");
string s = "\nfoo {\n    line1\n    line2\n}";
string v = regex.Match(s).Value;

The pattern is meant to read:

newline -> anything except a space or a newline -> " {" -> anything, including newlines -> newline -> "}"

The expected result is just s, but the result is an empty string. If I remove "\\n\\}" from the end of the pattern:

Regex regex = new Regex("\\n[^ \\n]+ \\{[.\\n]+");
string s = "\nfoo {\n    line1\n    line2\n}";
string v = regex.Match(s).Value;

then v = "\nfoo {\n".

This works as expected, so it seems that the problem comes from "\\n\\}".

For your example data, you could match the first line ending with an opening curly brace.

Then use a repeating group that matches a whole line only if it does not start with a closing }. You can do that with a negative lookahead, (?!}).

Then match the closing curly brace.

[\r\n]\S+\s*{[\r\n](?:(?!}).*[\r\n])*}

About the pattern

  • [\r\n] Match a single newline character (\r or \n)
  • \S+\s* Match 1+ non-whitespace chars, then 0+ whitespace chars
  • {[\r\n] Match the opening { followed by a newline character
  • (?: Start of a non-capturing group
  • (?!}) Negative lookahead, assert that what is directly to the right is not a }
  • .*[\r\n] Match any char except a newline 0+ times, then match a newline character
  • )* Close the group and repeat it 0+ times
  • } Match the closing }


For example:

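// This sample string has no leading newline, so ^ (start of input)
// is used here in place of the leading [\r\n].
Regex regex = new Regex(@"^\S+\s*{[\r\n](?:(?!}).*[\r\n])*}");
string s = @"foo {
    line1
    line2
}

bar {
    line3
    line4
}";

// Match returns only the first match, so only the foo object is printed.
Console.WriteLine(regex.Match(s).Value);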

Result:

foo {
    line1
    line2
}
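
Since the question asks for every top-level object and not only the first, the same pattern can be combined with Regex.Matches. The following is a minimal sketch rather than part of the original answer: it assumes RegexOptions.Multiline so that ^ matches at the start of each line, and the names TopLevelObjects and ExtractObjects are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TopLevelObjects
{
    // Multiline makes ^ match at the start of every line, so no leading
    // newline is required before the object name.
    private static readonly Regex ObjectRegex = new Regex(
        @"^\S+\s*{[\r\n](?:(?!}).*[\r\n])*}",
        RegexOptions.Multiline);

    // Returns the text of every top-level object found in the input.
    public static List<string> ExtractObjects(string text)
    {
        var result = new List<string>();
        foreach (Match m in ObjectRegex.Matches(text))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        string s = "foo {\n    line1\n    line2\n}\n\nbar {\n    line3\n    line4\n}";
        foreach (string obj in ExtractObjects(s))
        {
            Console.WriteLine(obj);
            Console.WriteLine("---"); // separator between extracted objects
        }
    }
}
```

With the sample input this should yield two matches, one for foo and one for bar. Like the original pattern, it assumes no line inside an object starts with }, so nested objects closed at column 0 are not handled.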

I found a working solution:

Regex r1 = new Regex("\\n[^ \\n]+ \\{[\\s\\S]+?\\n\\}");
string s = "\nfoo {\n    line1\n    line2\n}";
string v = r1.Match(s).Value;

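Now v = "\nfoo {\n    line1\n    line2\n}", which is the whole of s.

I'm new to regular expressions. After more research, I found a reference explaining that a "." inside "[]" no longer means "any character"; it is a literal dot. So [.\n]+ in my first pattern could only match dots and newlines, and it could never get past the indented lines to reach the closing }. You can use [\s\S] to match any character, including a newline.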
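
The difference can be checked directly. The following is a small sketch, not from the original post, that runs the failing and the working pattern against the same input. The third variant, which uses RegexOptions.Singleline so that . also matches a newline, is an additional alternative and not something the post itself uses; the class name DotInClassDemo is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotInClassDemo
{
    public const string Sample = "\nfoo {\n    line1\n    line2\n}";

    // Inside [...], '.' is a literal dot, so [.\n]+ only matches dots and newlines.
    public static string MatchWithDotInClass() =>
        Regex.Match(Sample, "\\n[^ \\n]+ \\{[.\\n]+\\n\\}").Value;

    // [\s\S] matches any character, including newlines; +? keeps the match lazy.
    public static string MatchWithAnyCharClass() =>
        Regex.Match(Sample, "\\n[^ \\n]+ \\{[\\s\\S]+?\\n\\}").Value;

    // Alternative (an assumption, not in the post): Singleline lets '.' match '\n'.
    public static string MatchWithSingleline() =>
        Regex.Match(Sample, "\\n[^ \\n]+ \\{.+?\\n\\}", RegexOptions.Singleline).Value;

    public static void Main()
    {
        Console.WriteLine(MatchWithDotInClass() == "");      // no match, empty Value
        Console.WriteLine(MatchWithAnyCharClass() == Sample); // whole sample matched
        Console.WriteLine(MatchWithSingleline() == Sample);   // whole sample matched
    }
}
```

The lazy quantifier +? matters once the input holds more than one object: a greedy [\s\S]+ would run on to the last "\n}" in the text and merge all objects into one match.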
