Removing backslash (escape character) from a string
I am trying to work on my own JSON parser. I have an input string that I want to tokenize:
input = "{ \"foo\": \"bar\", \"num\": 3}"
How do I remove the escape character \ so that it is not a part of my tokens?
Currently, my solution using delete works:
tokens = input.delete('\\"').split("")
=> ["{", " ", "f", "o", "o", ":", " ", "b", "a", "r", ",", " ", "n", "u", "m", ":", " ", "3", "}"]
However, when I try to use gsub, it fails to find any \".
tokens = input.gsub('\\"', '').split("")
=> ["{", " ", "\"", "f", "o", "o", "\"", ":", " ", "\"", "b", "a", "r", "\"", ",", " ", "\"", "n", "u", "m", "\"", ":", " ", "3", "}"]
I have two questions:
1. Why does gsub not work in this case?
2. How do I remove the backslash (escape) character? I currently have to remove the backslash character together with the quotes to make this work.
When you write:
input = "{ \"foo\": \"bar\", \"num\": 3}"
The actual string stored in input is:
{ "foo": "bar", "num": 3}
The escape \" here is interpreted by the Ruby parser, so that it can distinguish between the boundaries of the string (the leftmost and the rightmost ") and a normal " character inside the string (the escaped ones).
String#delete deletes the set of characters specified by its first parameter, rather than a pattern. Every character that is in that set will be removed. So by writing
input.delete('\\"')
you get a string with every character of that set removed from input, rather than a string with every \" sequence removed from input. (The set is built with the same rules as String#count, where a backslash escapes the character after it, so '\\"' effectively selects only the " character.) This is wrong for your case, and it may cause unexpected behavior later.
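The difference between a character set and a pattern is easier to see on a string that really does contain backslashes. The following is a minimal sketch; the sample string is made up for illustration and is not from the question.

```ruby
# Hypothetical sample; single quotes keep the backslashes literal.
sample = 'say \"hi\" to a\b'

# gsub with a String pattern removes only the two-character sequence \" .
by_pattern = sample.gsub('\\"', '')
puts by_pattern        # prints: say hi to a\b

# delete builds a character set. Inside the set a backslash escapes the
# next character, so this set contains only the double quote.
by_set = sample.delete('\\"')
puts by_set            # prints: say \hi\ to a\b

# To put a literal backslash in the set as well, it has to be escaped again.
both = sample.delete('\\\\"')
puts both              # prints: say hi to ab
```

In all three calls the argument is matched against the string content, not against the way the literal was typed in the source file.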
String#gsub, however, substitutes a pattern (either a regular expression or a plain string).
input.gsub('\\"', '')
means: find every \" (two characters in sequence) and replace it with an empty string. Since there is no \ in input, nothing gets replaced. What you need is actually:
input.gsub('"', '')
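As a quick check of the two gsub calls on the question's input, the sketch below compares them side by side. The variable names unchanged and tokens are only for illustration.

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

# The two-character pattern \" never occurs, so gsub returns an equal string.
unchanged = input.gsub('\\"', '')
puts unchanged == input      # prints: true

# Matching the quote character alone removes every quote.
tokens = input.gsub('"', '').split("")
puts tokens.join             # prints: { foo: bar, num: 3}
```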
You do not have backslashes in your string. You have quotes in your string, which need to be escaped when placed in a double-quoted string literal. Look:
input = "{ \"foo\": \"bar\", \"num\": 3}"
puts input
# => { "foo": "bar", "num": 3}
You are removing phantoms.
input.delete('\\"') will delete any of the characters in its argument. Thus, you delete the non-existent backslashes, and also delete all quotes. Without quotes, the default display method (inspect) will not need to escape anything.
input.gsub('\\"', '') will try to delete the sequence \", which does not exist, so gsub ends up doing nothing.
Make sure you know the difference between a string's representation (puts input.inspect) and its content (puts input), and note that the backslashes are artifacts of the representation.
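A short sketch of that distinction, using the same input as the question:

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

puts input            # content:        { "foo": "bar", "num": 3}
puts input.inspect    # representation: "{ \"foo\": \"bar\", \"num\": 3}"

# The content holds no backslash; only the inspect output does.
puts input.include?("\\")            # prints: false
puts input.inspect.include?("\\")    # prints: true
```

irb and p show the inspect form, which is why the backslashes appear in the console even though they are not in the string.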
That said, I have to echo emaillenin: writing a correct JSON parser is not simple, and you can't do it with regular expressions (or at least, not with regular regular expressions; it might be possible with Oniguruma). It needs a proper parser like treetop or rex/racc, since it has a lot of corner cases that are easy to miss (chief among them being, ironically, escaped characters).
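For the tokenizing step alone, one possible approach is to scan whole tokens with StringScanner from the standard library rather than splitting into single characters. This is only a rough sketch: the tokenize helper is hypothetical, its number pattern is simplified, and it does not validate nesting, which is the part that still needs a real parser.

```ruby
require 'strscan'

# Hypothetical helper (not from the answers): splits JSON text into tokens,
# keeping each string literal, escapes included, as one token.
def tokenize(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next                                   # whitespace between tokens
    elsif (tok = scanner.scan(/"(?:[^"\\]|\\.)*"/))
      tokens << tok                          # string literal, quotes kept
    elsif (tok = scanner.scan(/-?\d+(?:\.\d+)?/))
      tokens << tok                          # simplified number, no exponent
    elsif (tok = scanner.scan(/[{}\[\]:,]/))
      tokens << tok                          # structural punctuation
    elsif (tok = scanner.scan(/true|false|null/))
      tokens << tok                          # literals
    else
      raise ArgumentError, "unexpected character at offset #{scanner.pos}"
    end
  end
  tokens
end

input = "{ \"foo\": \"bar\", \"num\": 3}"
p tokenize(input)
# prints ["{", "\"foo\"", ":", "\"bar\"", ",", "\"num\"", ":", "3", "}"]
```

Keeping the quotes on string tokens lets a later parsing stage tell the string "3" apart from the number 3.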
Use a regex pattern:
> input = "{ \"foo\": \"bar\", \"num\": 3}"
> input.gsub(/"/,'').split("")
> => ["{", " ", "f", "o", "o", ":", " ", "b", "a", "r", ",", " ", "n", "u", "m", ":", " ", "3", "}"]
That is actually just a double quote. The backslash is only there to escape it.
input.gsub(/[\\"]/,"")
also works.