Removing backslash (escape character) from a string

I am working on my own JSON parser. I have an input string that I want to tokenize:

input = "{ \"foo\": \"bar\", \"num\": 3}"

How do I remove the escape character \ so that it is not part of my tokens?

Currently, my solution using delete works:

tokens = input.delete('\\"').split("")

=> ["{", " ", "f", "o", "o", ":", " ", "b", "a", "r", ",", " ", "n", "u", "m", ":", " ", "3", "}"]

However, when I try to use gsub, it fails to find any \".

tokens = input.gsub('\\"', '').split("")

=> ["{", " ", "\"", "f", "o", "o", "\"", ":", " ", "\"", "b", "a", "r", "\"", ",", " ", "\"", "n", "u", "m", "\"", ":", " ", "3", "}"]

I have two questions:

1. Why does gsub not work in this case?

2. How do I remove the backslash (escape) character? I currently have to remove the backslash together with the quotes to make this work.

When you write:

input = "{ \"foo\": \"bar\", \"num\": 3}"

The actual string stored in input is:

{ "foo": "bar", "num": 3}

The escape \" here is interpreted by the Ruby parser, so that it can distinguish between the boundaries of the string literal (the leftmost and rightmost ") and an ordinary " character inside the string (the escaped ones).
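A quick way to convince yourself of this is to compare the double-quoted literal with a single-quoted one, which needs no escaping for ". This is only an illustrative sketch; the variable name same is made up for the comparison.

```ruby
# The backslashes exist only in the source code, not in the string itself.
input = "{ \"foo\": \"bar\", \"num\": 3}"

# A single-quoted literal needs no escaping for double quotes
# and produces exactly the same string.
same = '{ "foo": "bar", "num": 3}'

puts input == same          # true
puts input.include?("\\")   # false: there is no backslash to remove
puts input.count('"')       # 6
```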

String#delete takes a set of characters as its first parameter, rather than a pattern, and removes every character in that set. Within the set, a backslash escapes the character that follows it, so '\\"' selects just the double quote. So by writing

input.delete('\\"')

you get a string with every " removed from input, whether or not a \ precedes it, rather than a string with every \" sequence removed from input. This is wrong for your case, and it may cause unexpected behavior later.
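One detail worth checking in irb: String#delete reads its argument as a character selector, and in a selector a backslash escapes the character after it. The sketch below uses a made-up sample string (not from the question) that really does contain a backslash, so the effect is visible.

```ruby
# Hypothetical sample that, unlike `input`, really contains a backslash.
# Single quotes keep the backslash literal: C:\temp "draft"
sample = 'C:\temp "draft"'

# The selector '\\"' is two characters (backslash, quote), but inside a
# selector the backslash only escapes the quote, so just " is deleted.
puts sample.delete('\\"')     # C:\temp draft

# Same result as asking for the quote alone.
puts sample.delete('"')       # C:\temp draft

# To select the backslash itself, it has to be escaped in the selector:
# '\\\\"' is the three characters \\" (escaped backslash, then quote).
puts sample.delete('\\\\"')   # C:temp draft
```

Either way, delete works on individual characters and never on the two-character sequence \".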

String#gsub, however, substitutes a pattern (either a regular expression or a plain string).

input.gsub('\\"', '')

means: find every \" (two characters in sequence) and replace it with the empty string. Since there is no \ in input, nothing gets replaced. What you actually need is:

input.gsub('"', '')
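Putting the two gsub calls side by side makes the difference visible. This is a small sketch; the names unchanged and stripped are only for illustration.

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

# A String pattern is matched literally, character for character.
# '\\"' is the two-character sequence \" which never occurs in input.
unchanged = input.gsub('\\"', '')
puts unchanged == input                # true

# Matching the quote alone removes all six quotes.
stripped = input.gsub('"', '')
puts stripped                          # { foo: bar, num: 3}

# The regex form is equivalent here.
puts input.gsub(/"/, '') == stripped   # true
```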

You do not have backslashes in your string. You have quotes in your string, which need to be escaped when written inside a double-quoted string literal. Look:

input = "{ \"foo\": \"bar\", \"num\": 3}"
puts input
# => { "foo": "bar", "num": 3}

You are removing phantoms.

input.delete('\\"')

treats its argument as a set of characters to delete. There are no backslashes in the string to delete, so what it actually removes is all the quotes. Without quotes, the default display method (inspect) does not need to escape anything.

input.gsub('\\"', '')

will try to delete the sequence \", which does not exist, so gsub ends up doing nothing.

Make sure you know the difference between the string's representation (puts input.inspect) and the string's content (puts input), and note that the backslashes are artifacts of the representation.
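To see representation and content next to each other, something like the following can be run as a script. The length comparison is just one way to show that the extra characters belong to the representation only.

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

puts input           # { "foo": "bar", "num": 3}
puts input.inspect   # "{ \"foo\": \"bar\", \"num\": 3}"
p input              # same as the line above; irb echoes values this way

# inspect adds two surrounding quotes and one backslash per inner quote.
puts input.length           # 25
puts input.inspect.length   # 33
```

This is also why the array in the question shows "\"" elements: each one is a one-character string holding a quote, displayed via inspect.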

That said, I have to echo emaillenin: writing a correct JSON parser is not simple, and you can't do it with regular expressions (or at least, not with regular regular expressions; it might be possible with Oniguruma). It needs a proper parser, built with a tool like treetop or rex/racc, since JSON has a lot of corner cases that are easy to miss (chief among them being, ironically, escaped characters).
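To give a feel for why escaped characters are a corner case, here is a rough hand-written tokenizer sketch. It does not use treetop or racc; it uses StringScanner from Ruby's standard library instead, and the tokenize method and its token format are made up for this example. It covers only a subset of JSON: it unescapes only \" and \\, ignores \uXXXX and number exponents, and does not check nesting.

```ruby
require 'strscan'

# Hypothetical helper, not from the original answers: splits JSON text
# into [type, text] pairs. Deliberately incomplete (see limits above).
def tokenize(json)
  scanner = StringScanner.new(json)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)              # whitespace between tokens

    if (punct = scanner.scan(/[{}\[\]:,]/))
      tokens << [:punct, punct]
    elsif (str = scanner.scan(/"(?:[^"\\]|\\.)*"/))
      # Drop the surrounding quotes, then unescape \" and \\ only.
      tokens << [:string, str[1..-2].gsub(/\\(["\\])/) { $1 }]
    elsif (num = scanner.scan(/-?\d+(?:\.\d+)?/))
      tokens << [:number, num]
    elsif (word = scanner.scan(/true|false|null/))
      tokens << [:literal, word]
    else
      raise ArgumentError, "unexpected character at offset #{scanner.pos}"
    end
  end
  tokens
end

# The string from the question: quotes mark string tokens and are dropped.
p tokenize("{ \"foo\": \"bar\", \"num\": 3}")

# JSON text that really contains backslashes (single-quoted Ruby literal):
# the escaped quotes stay inside one string token.
p tokenize('{"a": "say \"hi\""}')
```

Keeping the token type next to the text means the string "3" and the number 3 stay distinguishable, which a flat list of characters cannot do.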

Use a regex pattern:

> input = "{ \"foo\": \"bar\", \"num\": 3}"
> input.gsub(/"/,'').split("")
=> ["{", " ", "f", "o", "o", ":", " ", "b", "a", "r", ",", " ", "n", "u", "m", ":", " ", "3", "}"]

That is actually just a double quote. The backslash is only there to escape it.

input.gsub(/[\\"]/,"") also works.
