I am trying to work on my own JSON parser. I have an input string that I want to tokenize:
input = "{ \"foo\": \"bar\", \"num\": 3}"
How do I remove the escape character \ so that it is not a part of my tokens?
Currently, my solution using delete works:
tokens = input.delete('\\"').split("")
=> ["{", " ", "f", "o", "o", ":", " ", "b", "a", "r", ",", " ", "n", "u", "m", ":", " ", "3", "}"]
However, when I try to use gsub, it fails to find any \".
tokens = input.gsub('\\"', '').split("")
=> ["{", " ", "\"", "f", "o", "o", "\"", ":", " ", "\"", "b", "a", "r", "\"", ",", " ", "\"", "n", "u", "m", "\"", ":", " ", "3", "}"]
I have two questions:
1. Why does gsub not work in this case?
2. How do I remove the backslash (escape) character? I currently have to remove the backslash character with the quotes to make this work.
When you write:
input = "{ \"foo\": \"bar\", \"num\": 3}"
The actual string stored in input is:
{ "foo": "bar", "num": 3}
The escape \" here is interpreted by the Ruby parser, so that it can distinguish between the boundaries of the string (the leftmost and rightmost ") and an ordinary " character inside the string (the escaped ones).
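As a quick check of the point above, here is a small sketch that writes the same string with three different literals. The variable names a, b and c are mine, not from the question. Only the double-quoted form needs the escapes, and none of the three stores a backslash.

```ruby
# Three literals for the same string; only the first needs escapes.
a = "{ \"foo\": \"bar\", \"num\": 3}"
b = '{ "foo": "bar", "num": 3}'
c = %q({ "foo": "bar", "num": 3})

puts a == b && b == c     # true
puts a.include?("\\")     # false: no backslash is stored
puts a.count('"')         # 6 double quotes are stored
puts a.length             # 25
```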
String#delete deletes a set of characters specified by its first parameter, rather than a pattern. Every character in that set is removed wherever it occurs. So by writing
input.delete('\\"')
you get a string with each character of that set removed from input individually (in practice every ", since input contains no \), rather than a string with every \" sequence removed from input. This is wrong for your case, and it may cause unexpected behavior later.
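To make the character-set behavior concrete, here is a small sketch. The path string is a made-up example, not something from the question. One caveat, going by the set rules documented for String#count (which delete shares): a backslash inside the set escapes the character after it, so a literal backslash has to be written twice within the set.

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

# delete removes each listed character wherever it occurs; order is irrelevant.
puts "hello world".delete("lo")   # he wrd
puts input.delete('"')            # { foo: bar, num: 3}

# Hypothetical string that really contains a backslash.
path = 'C:\temp "x"'

# The set \" is an escaped quote, so only the quotes go.
puts path.delete('\\"')           # C:\temp x
# The set \\" is an escaped backslash plus a quote, so both go.
puts path.delete('\\\\"')         # C:temp x
```

This is the kind of surprise the answer warns about: the argument looks like a sequence to search for, but it is read as a list of individual characters.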
String#gsub, however, substitutes a pattern (either a regular expression or a plain string).
input.gsub('\\"', '')
means: find every \" (two characters in sequence) and replace it with an empty string. Since there is no \ in input, nothing gets replaced. What you actually need is:
input.gsub('"', '')
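For comparison, a short sketch of gsub with a plain-string pattern. The escaped variable is a hypothetical string I made up that really does contain backslash-quote pairs (for example JSON text that was escaped a second time); it shows that the two-character pattern does match when the sequence is present.

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

# A String pattern is matched literally, as a whole sequence.
puts input.gsub('\\"', '') == input   # true: the sequence \" never occurs
puts input.gsub('"', '')              # { foo: bar, num: 3}

# Hypothetical string that does contain backslash-quote pairs.
escaped = '{ \"foo\": 1 }'
puts escaped.gsub('\\"', '"')         # { "foo": 1 }
```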
You do not have backslashes in your string. You have quotes in your string, which need to be escaped when placed in a double-quoted string. Look:
input = "{ \"foo\": \"bar\", \"num\": 3}"
puts input
# => { "foo": "bar", "num": 3}
You are removing phantoms.
input.delete('\\"') will delete any of the characters in its argument. Thus, you delete backslashes that do not exist, and also delete all the quotes. Without quotes, the default display method (inspect) will not need to escape anything.
input.gsub('\\"', '') will try to delete the sequence \", which does not exist, so gsub ends up doing nothing.
Make sure you know the difference between a string's representation (puts input.inspect) and its content (puts input), and note that the backslashes are artifacts of the representation.
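A minimal sketch of that difference, using the string from the question. The length comparison is my own addition to show where the extra characters come from.

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

puts input           # { "foo": "bar", "num": 3}
puts input.inspect   # "{ \"foo\": \"bar\", \"num\": 3}"
p input              # same output as puts input.inspect

# inspect adds 2 surrounding quotes and 6 backslashes; the content has neither.
puts input.length           # 25
puts input.inspect.length   # 33
```

irb shows results through inspect, which is why the quote elements in the split output appear as "\"" even though each one is a single character.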
That said, I have to echo emaillenin: writing a correct JSON parser is not simple, and you can't do it with regular expressions (or at least, not with regular regular expressions; it might be possible with Oniguruma). It needs a proper parser like treetop or rex/racc, since it has a lot of corner cases that are easy to miss (chief among them being, ironically, escaped characters).
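If the goal is only the tokenizing step, a rough sketch with StringScanner from the standard library might look like the following. The tokenize method and its token types are hypothetical names of mine, and this is lexing only: it does no parsing or nesting checks, leaves escape sequences undecoded, and skips number exponents among other corner cases.

```ruby
require 'strscan'

# Minimal tokenizer sketch: returns [type, text] pairs. Not a full JSON lexer.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next                                    # whitespace only separates tokens
    elsif (text = scanner.scan(/[{}\[\]:,]/))
      tokens << [:punct, text]
    elsif (text = scanner.scan(/"(?:[^"\\]|\\.)*"/))
      tokens << [:string, text[1..-2]]        # strip delimiters, keep escapes raw
    elsif (text = scanner.scan(/-?\d+(?:\.\d+)?/))
      tokens << [:number, text]
    elsif (text = scanner.scan(/true|false|null/))
      tokens << [:literal, text]
    else
      raise ArgumentError, "unexpected character at offset #{scanner.pos}"
    end
  end
  tokens
end

input = "{ \"foo\": \"bar\", \"num\": 3}"
p tokenize(input)
# [[:punct, "{"], [:string, "foo"], [:punct, ":"], [:string, "bar"], [:punct, ","],
#  [:string, "num"], [:punct, ":"], [:number, "3"], [:punct, "}"]]
```

Note that the quotes are consumed as string delimiters rather than stripped from the input up front, so a string such as "a, b" stays one token.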
Use a regex pattern:
> input = "{ \"foo\": \"bar\", \"num\": 3}"
> input.gsub(/"/,'').split("")
> => ["{", " ", "f", "o", "o", ":", " ", "b", "a", "r", ",", " ", "n", "u", "m", ":", " ", "3", "}"]
The string actually contains only a double quote; the backslash is there to escape it.
input.gsub(/[\\"]/,"") also works.