I'm using an API which returns text in the following format:
#start
#p 12345 foo
#p 12346 bar
#end
#start
#p 12345 foo2
#p 12346 bar2
#end
My parsing function:
function parseApiResponse(data) {
    // The pattern (stored as CST.REGEX.POST in my code)
    var re = /(#start)|(#end)|#p\s+(\S+)\s+(\S+)/ig;
    var results = [], match, obj;
    while (match = re.exec(data)) {
        if (match[1]) { // #start
            obj = {};
        } else if (match[2]) { // #end
            results.push(obj);
            obj = null; // prevent accidental reuse if input is malformed
        } else { // #p something something
            obj[match[3]] = match[4];
        }
    }
    return results;
}
This will give me a list of objects which looks something like this:
[{ '12345': 'foo', '12346': 'bar'}, /* etc... */]
However, if a line is formatted like this
#start
#p 12345
#p 12346 bar
#end
The line would actually be #p 12345\n and my match[4] would contain the next row's #p.
How do I adjust the pattern to adapt to this?
Assuming you have one #start, #end, or #p element per line, you can make your regex aware of this and add an additional non-capturing group to indicate that the last \s+(\S+) in a line is optional:
/(#start)|(#end)|#p\s+(\S+)(?:\s+(\S+))?$/igm
(?: ) says "treat this as a group, but don't capture the pattern it matches" (so it won't create an element in match). The ? that follows that group means "this group is optional and may or may not match anything in the pattern". The $ right after that, in conjunction with the m flag, matches the end of the line.
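As a quick check, here is a minimal sketch that plugs this pattern into the parsing loop from the question. The sample strings and the names LOOSE and STRICT are made up for illustration. One caveat: \s also matches a line break, so when a value-less #p line is directly followed by #end or #start, the optional group can swallow that next line. Using [^\S\r\n]+ (whitespace other than line breaks) as the separator, as in the hypothetical STRICT variant below, is one way around that.

```javascript
// Pattern from the answer above: the trailing value is optional.
var LOOSE = /(#start)|(#end)|#p\s+(\S+)(?:\s+(\S+))?$/igm;
// Hypothetical stricter variant: the separator before the value may not be a line break.
var STRICT = /(#start)|(#end)|#p\s+(\S+)(?:[^\S\r\n]+(\S+))?$/igm;

function parseApiResponse(data, re) {
    var results = [], match, obj;
    re.lastIndex = 0; // global regexes remember their position between calls
    while ((match = re.exec(data))) {
        if (match[1]) {               // #start
            obj = {};
        } else if (match[2]) {        // #end
            results.push(obj);
            obj = null;
        } else if (obj) {             // #p key [value]
            obj[match[3]] = match[4]; // undefined when the value is missing
        }
    }
    return results;
}

// The problematic input from the question: the first #p line has no value.
var sample = "#start\n#p 12345\n#p 12346 bar\n#end";
var parsed = parseApiResponse(sample, LOOSE);
console.log(parsed[0]['12345'], parsed[0]['12346']); // undefined bar

// Edge case: the value-less #p line is immediately followed by #end.
var edge = "#start\n#p 12345\n#end";
var looseEdge = parseApiResponse(edge, LOOSE);   // "#end" is captured as the value
var strictEdge = parseApiResponse(edge, STRICT); // "#end" is seen as its own line
console.log(looseEdge.length, strictEdge.length); // 0 1
```

With LOOSE the edge case returns no objects at all, because the #end marker was consumed as match[4]; with STRICT the object is pushed as expected.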
You can also avoid the (?: ) trickery by using * instead of + quantifiers, meaning "match zero or more times": change \s+(\S+) to \s*(\S*). This has the side effect that the space between the number and the data that follows it is now optional.
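A small sketch of how the two approaches differ in practice, assuming the change is applied to the last \s+(\S+) and using only the #p part of the pattern for brevity (the variable names are illustrative). With the optional group, a missing value leaves the capture undefined; with the * quantifiers, the capture takes part in the match and comes back as an empty string.

```javascript
// Only the #p alternative is shown; both patterns use the m flag so $ matches at line ends.
var optionalGroup = /#p\s+(\S+)(?:\s+(\S+))?$/m;
var starQuantifier = /#p\s+(\S+)\s*(\S*)$/m;

var text = "#p 12345\n#p 12346 bar";

var a = optionalGroup.exec(text);
var b = starQuantifier.exec(text);

console.log(a[1], a[2]);                 // 12345 undefined
console.log(b[1], JSON.stringify(b[2])); // 12345 ""

// A complete line still yields both tokens with either pattern.
var c = starQuantifier.exec("#p 12346 bar");
console.log(c[1], c[2]);                 // 12346 bar
```

Which behaviour is preferable depends on whether the calling code wants to distinguish "no value" from "empty value".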
I would rewrite the regex and refactor the code a bit as follows:
var re = /^#(start|end|p)(?:\s+(\d+)(?:[^\S\r\n]+([^\r\n]+))?)?$/igm;
while (match = re.exec(data)) {
    // lower-case the token, since the i flag also lets #START etc. through
    switch (match[1].toLowerCase()) {
        case 'start':
            obj = {};
            break;
        case 'end':
            results.push(obj);
            obj = null;
            break;
        case 'p':
            obj[match[2]] = match[3];
            break;
    }
}
I like capturing start, end, or p in the one capture group so I can use it in a switch statement. The version of the regex I use here is a little more discriminating (expects the token that follows #p to be numeric) and a little more forgiving (allows the last token on a #p line to contain any non-linebreak whitespace, e.g. #p 1138 this is only a test).
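For completeness, here is one way the loop above might be wrapped into a full function and exercised. This is a sketch only: the sample input is invented, and the guard in the p case (skipping #p lines that appear outside a block or carry no key) is an added assumption, not part of the original answer.

```javascript
function parseApiResponse(data) {
    var re = /^#(start|end|p)(?:\s+(\d+)(?:[^\S\r\n]+([^\r\n]+))?)?$/igm;
    var results = [], match, obj;
    while ((match = re.exec(data))) {
        switch (match[1].toLowerCase()) {
            case 'start':
                obj = {};
                break;
            case 'end':
                results.push(obj);
                obj = null; // prevent accidental reuse if input is malformed
                break;
            case 'p':
                // Assumption: ignore #p lines outside a block or without a key.
                if (obj && match[2] !== undefined) {
                    obj[match[2]] = match[3]; // undefined when no value follows
                }
                break;
        }
    }
    return results;
}

var sample = [
    "#start",
    "#p 12345",
    "#p 1138 this is only a test",
    "#end",
    "#start",
    "#p 12346 bar2",
    "#end"
].join("\n");

var parsed = parseApiResponse(sample);
console.log(parsed.length);     // 2
console.log(parsed[0]['1138']); // this is only a test
console.log(parsed[1]['12346']); // bar2
```

The value-less #p 12345 line ends up as a key whose value is undefined, and the multi-word value is kept intact.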