Here is my C# regex:
\"([a-zA-Z0-9]*)\":\"?([a-zA-Z0-9]*)\"?,?}?
I am testing it with this sample string:
{"RestrictedCompany": "","SQLServerIndex": 0,"SurveyAdmin": false}
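To see what this pattern actually captures on that sample, here is a minimal sketch. It assumes a .NET 5 or later console project using top-level statements, and the variable names are illustrative rather than taken from the original code. It prints both groups of every match:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The pattern exactly as written in the question (a regular C# string, so each \" is a quote).
string pattern = "\"([a-zA-Z0-9]*)\":\"?([a-zA-Z0-9]*)\"?,?}?";
string input = "{\"RestrictedCompany\": \"\",\"SQLServerIndex\": 0,\"SurveyAdmin\": false}";

// Record group 1 (the name) and group 2 (the value) for every match.
var pairs = new List<(string Name, string Value)>();
foreach (Match m in Regex.Matches(input, pattern))
{
    pairs.Add((m.Groups[1].Value, m.Groups[2].Value));
    Console.WriteLine($"name = '{m.Groups[1].Value}', value = '{m.Groups[2].Value}'");
}
```

On this particular sample all three matches come back with an empty second group, quoted value or not, because every colon is followed by a space.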
This is what I think the regex does:
PART 1: Look for the pattern "ANYTHING": and store ANYTHING (without the quotes).
PART 2: Then look for a : and store everything until you reach a stop character: a quote ("), a comma (,) or a closing brace (}).
It extracts part 1 fine, but doesn't pick up part 2 at all when the quote (") isn't present (i.e. when part 2 isn't a string). So I have two questions:
1. Why isn't my current code picking up part 2? (and how can I fix it)
2. … (I tried \S, but it was too greedy.)
First off, don't write your own JSON parser. Use one written by professionals. You're reinventing a rather complex wheel here.
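As one example of a professionally written parser, .NET Core 3.0 and later include System.Text.Json. A minimal sketch of reading the question's sample with it (a top-level C# program, assuming the property names and value types are known in advance) might look like this:

```csharp
using System;
using System.Text.Json;

string json = "{\"RestrictedCompany\": \"\",\"SQLServerIndex\": 0,\"SurveyAdmin\": false}";

// The parser deals with whitespace, quoting, escapes and value types for us.
using JsonDocument doc = JsonDocument.Parse(json);
JsonElement root = doc.RootElement;

// List each property with the kind of value the parser found.
foreach (JsonProperty property in root.EnumerateObject())
{
    Console.WriteLine($"{property.Name}: {property.Value.ValueKind} {property.Value.GetRawText()}");
}

// Read typed values directly (assumes these properties exist with these types).
string restrictedCompany = root.GetProperty("RestrictedCompany").GetString();
int sqlServerIndex = root.GetProperty("SQLServerIndex").GetInt32();
bool surveyAdmin = root.GetProperty("SurveyAdmin").GetBoolean();
```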
That said, there are also lessons you could learn here about how to write, understand and debug regular expressions, so let's look at that.
Why isn't my current code picking up part 2? (and how can I fix it)
Learn to reason like the regular expression engine.
Let's take a simpler case. We'll take the expression
\"([a-zA-Z0-9]*)\":\"?([a-zA-Z0-9]*)\"?,?}?
And we will search this string:
{"A": "B"}
for an instance of the regular expression.
OK. The { doesn't match anything, so skip it. The opening quote matches \", so maybe we have a match. A matches ([a-zA-Z0-9]*), so again, maybe we have a match. The closing quote matches the second \", so we're still good. The colon matches :.
Next comes \"?, zero or one quotes. We have a space, so we match zero quotes. Then ([a-zA-Z0-9]*), any number of alphanumerics. We have a space, therefore we have zero alphanumerics. Then \"?, and again we have a space, so we match zero. Then ,? and we have zero of them. Then }? and again we have zero of them.
So the first match is "A":, with an empty second group. The engine then resumes the search after the colon. "B" matches \"([a-zA-Z0-9]*)\", but the pattern then needs a :, and there is no : in the rest of the string, so I won't labour the point; plainly no further match will succeed.
If that's not the pattern you wanted to match, then write a different pattern. For example, if you want to allow arbitrary whitespace before and after the colon, you probably need a \s* before and after the colon. Also, if you require a value after the :, then why did you make everything after the colon optional? "Required" and "optional" are opposites.
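The walkthrough can be checked directly. This sketch (a top-level C# program; the variable names are illustrative) runs the question's pattern against {"A": "B"}, then a variant with \s* on both sides of the colon:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "{\"A\": \"B\"}";

// The question's pattern: nothing in it allows the space after the colon.
string original = "\"([a-zA-Z0-9]*)\":\"?([a-zA-Z0-9]*)\"?,?}?";
// The same pattern with \s* allowed on both sides of the colon.
string relaxed = "\"([a-zA-Z0-9]*)\"\\s*:\\s*\"?([a-zA-Z0-9]*)\"?,?}?";

MatchCollection before = Regex.Matches(input, original);
MatchCollection after = Regex.Matches(input, relaxed);

// Original: one match, "A":, with an empty second group.
Console.WriteLine($"original: {before.Count} match(es), first = {before[0].Value}, group 2 = '{before[0].Groups[2].Value}'");
// Relaxed: the space is absorbed by \s*, so group 2 captures B.
Console.WriteLine($"relaxed:  {after.Count} match(es), first = {after[0].Value}, group 2 = '{after[0].Groups[2].Value}'");
```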
So what's the right thing to do here? Again, the right thing to do is to stop trying to solve this problem with regular expressions and use a JSON parser like a sensible person. But suppose we did want to parse this with regular expressions. How would we do it?
We do it by breaking the problem down into smaller parts.
What do we really want to match? Let's name each thing we want to match and then write a colon, and then say what the structure of that thing is:
DESIRED : NAME OPTIONAL_WHITESPACE COLON OPTIONAL_WHITESPACE VALUE
OK, break it down. What's a name?
NAME : QUOTE NAMECONTENTS QUOTE
Keep breaking it down.
NAMECONTENTS : any alphanumeric text of any length
Ask yourself: is that true? Is "" a NAME? Is "1234" a NAME? Is "$" a NAME? Refine the pattern until you get it right. We'll go with this for now.
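One quick way to answer those questions is to test each candidate against an anchored version of NAME. A sketch (a top-level C# program), assuming NAMECONTENTS is [a-zA-Z0-9]* as stated:

```csharp
using System;
using System.Text.RegularExpressions;

// NAME : QUOTE NAMECONTENTS QUOTE, with NAMECONTENTS = [a-zA-Z0-9]*.
// ^ and $ make the whole candidate match, not just a piece of it.
string name = "^\"[a-zA-Z0-9]*\"$";

foreach (string candidate in new[] { "\"\"", "\"1234\"", "\"$\"" })
{
    Console.WriteLine($"{candidate} is a NAME? {Regex.IsMatch(candidate, name)}");
}
```

Here "" and "1234" are accepted and "$" is rejected. JSON itself allows any string as a member name, including all three, so how far to refine depends on the input you actually expect.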
Now here is a hard one:
VALUE : BOOLEAN_LITERAL
VALUE : NUMBER_LITERAL
VALUE : STRING_LITERAL
This can be any of three things. So again, keep breaking it down:
BOOLEAN_LITERAL : true
BOOLEAN_LITERAL : false
Keep going; you can see how to do it from here.
Now make a regular expression for each part and start putting it back together.
NAMECONTENTS is \w*. QUOTE is \". NAME is \"\w*\". Since we want to capture the contents of the name, put them in a group: \"(\w*)\".
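Note that \w is not quite the same as the [a-zA-Z0-9] used in the question. This sketch (a top-level C# program; "Survey_Admin" is a hypothetical sample) captures a name with \"(\w*)\" and shows that \w also accepts an underscore, which the original character class rejects:

```csharp
using System;
using System.Text.RegularExpressions;

// NAME with NAMECONTENTS captured: \"(\w*)\"
string name = "\"(\\w*)\"";

// Group 1 holds the name without its quotes.
Match m = Regex.Match("{\"SurveyAdmin\": false}", name);
Console.WriteLine(m.Groups[1].Value);

// \w is broader than [a-zA-Z0-9]: it also matches the underscore
// (and, in .NET, letters and digits from other scripts).
bool wordAccepts = Regex.IsMatch("\"Survey_Admin\"", "^" + name + "$");
bool classAccepts = Regex.IsMatch("\"Survey_Admin\"", "^\"([a-zA-Z0-9]*)\"$");
Console.WriteLine($"\\w* accepts Survey_Admin: {wordAccepts}; [a-zA-Z0-9]* accepts it: {classAccepts}");
```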
Great. Similarly: OPTIONAL_WHITESPACE is \s*. COLON is :. Putting NAME OPTIONAL_WHITESPACE COLON OPTIONAL_WHITESPACE together gives \"(\w*)\"\s*:\s*
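Applied to the question's sample, the prefix \"(\w*)\"\s*:\s* already finds every name, whatever follows the colon. A minimal sketch (a top-level C# program; variable names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// NAME OPTIONAL_WHITESPACE COLON OPTIONAL_WHITESPACE, capturing NAMECONTENTS.
string prefix = "\"(\\w*)\"\\s*:\\s*";
string input = "{\"RestrictedCompany\": \"\",\"SQLServerIndex\": 0,\"SurveyAdmin\": false}";

// The empty string value "" is not followed by a colon, so it is not mistaken for a name.
var names = new List<string>();
foreach (Match m in Regex.Matches(input, prefix))
{
    names.Add(m.Groups[1].Value);
}
Console.WriteLine(string.Join(", ", names));
```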
Now we need to handle VALUE. But we've broken it down. What is the regular expression for BOOLEAN_LITERAL? That's (true|false).
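Parentheses, not square brackets, are what make this an alternation: [true|false] is a character class that matches a single character from the set t, r, u, e, |, f, a, l, s. A short check (a top-level C# program), anchoring both patterns with ^ and $:

```csharp
using System;
using System.Text.RegularExpressions;

// Alternation: the whole word true, or the whole word false.
string booleanLiteral = "^(true|false)$";
// Character class: exactly ONE character from the set t, r, u, e, |, f, a, l, s.
string characterClass = "^[true|false]$";

bool altTrue = Regex.IsMatch("true", booleanLiteral);
bool altFalse = Regex.IsMatch("false", booleanLiteral);
bool altPartial = Regex.IsMatch("fals", booleanLiteral);
bool classWord = Regex.IsMatch("true", characterClass);
bool classLetter = Regex.IsMatch("s", characterClass);

Console.WriteLine($"(true|false): true={altTrue}, false={altFalse}, fals={altPartial}");
Console.WriteLine($"[true|false]: true={classWord}, s={classLetter}");
```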
Keep going; make a regular expression for the other literals and then build up your regular expression from the leaves to the root.
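One possible way to finish the exercise is sketched below as a top-level C# program. The NUMBER_LITERAL and STRING_LITERAL patterns are my own simplified assumptions (no exponents, no escape sequences inside strings), so treat this as an illustration of building from the leaves to the root, not as a complete JSON grammar:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Leaves. NUMBER_LITERAL and STRING_LITERAL are simplified assumptions:
// no exponents, and no escape sequences inside strings.
string quote = "\"";
string nameContents = @"\w*";
string optionalWhitespace = @"\s*";
string colon = ":";
string booleanLiteral = "true|false";
string numberLiteral = @"-?\d+(?:\.\d+)?";
string stringLiteral = @"""[^""\\]*""";

// Branches, capturing NAMECONTENTS (group 1) and VALUE (group 2).
string name = quote + "(" + nameContents + ")" + quote;
string value = "(" + booleanLiteral + "|" + numberLiteral + "|" + stringLiteral + ")";

// Root: DESIRED : NAME OPTIONAL_WHITESPACE COLON OPTIONAL_WHITESPACE VALUE
string desired = name + optionalWhitespace + colon + optionalWhitespace + value;

string input = "{\"RestrictedCompany\": \"\",\"SQLServerIndex\": 0,\"SurveyAdmin\": false}";
var pairs = new Dictionary<string, string>();
foreach (Match m in Regex.Matches(input, desired))
{
    pairs[m.Groups[1].Value] = m.Groups[2].Value;
    Console.WriteLine($"{m.Groups[1].Value} => {m.Groups[2].Value}");
}
```

It recovers all three values from the sample, but a string value containing an escaped quote, or a nested object, would defeat it, which is one more reason to reach for a real JSON parser.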