Ruby strings: escape and unescape a custom character

Question

Suppose I consider the £ character dangerous, and I want to be able to protect (escape) any string and to unprotect (unescape) it again.

Example 1:

"Foobar £ foobar foobar foobar."  # => dangerous string
"Foobar \£ foobar foobar foobar." # => protected string

Example 2:

"Foobar £ foobar £££££££foobar foobar."         # => dangerous string
"Foobar \£ foobar \£\£\£\£\£\£\£foobar foobar." # => protected string

Example 3:

"Foobar \£ foobar \\£££££££foobar foobar."        # => dangerous string
"Foobar \£ foobar \\\£\£\£\£\£\£\£foobar foobar." # => protected string

Is there an easy way, in Ruby, to escape (and unescape) a given character (such as £ in my example) in a string?

Edit: here is an explanation of the behavior I am after.

First of all, thanks for your answers. I have a Rails app with a Tweet model that has a content field. Example of a tweet:

tweet = Tweet.create(content: "Hello @bob")

Inside the model, there's a serialization process that converts the string like this:

dump('Hello @bob') # => '["Hello £", 42]'
                   # ... where 42 is the id of bob username

Then, I'm able to deserialize and display the tweet like this:

load('["Hello £", 42]') # => 'Hello @bob'

In the same way, it's also possible to do so with more than one username:

dump('Hello @bob and @joe!')        # => '["Hello £ and £!", 42, 185]'
load('["Hello £ and £!", 42, 185]') # => 'Hello @bob and @joe!'

That's the goal :)

But this find-and-replace could be hard to perform with something like:

tweet = Tweet.create(content: "£ Hello @bob")

because here we also have to escape the £ character. I think your solution handles this well. The result becomes:

dump('£ Hello @bob')       # => '["\£ Hello £", 42]'
load('["\£ Hello £", 42]') # => '£ Hello @bob'

Just perfect. <3 <3 <3 <3

Now, if there is this:

tweet = Tweet.create(content: "\£ Hello @bob")

I think we should first escape every \ and then escape every £, like:

dump('\£ Hello @bob')       # => '["\\£ Hello £", 42]'
load('["\\£ Hello £", 42]') # => '£ Hello @bob'

However... what can we do in this case:

tweet = Tweet.create(content: "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\£ Hello @bob")

...where tweet.content.gsub(/(?<!\\)(?=(?:\\\\)*£)/, "\\") does not seem to work.

Answer 1

Hopefully your version of Ruby supports lookbehinds. If it doesn't, my solution will not work for you.

Escape characters:

str = str.gsub(/(?<!\\)(?=(?:\\\\)*£)/, "\\")

Unescape characters:

str = str.gsub(/(?<!\\)((?:\\\\)*)\\£/, '\1£') # single quotes, so \1 is a backreference and not the octal escape "\1"

Both regexes work regardless of the number of backslashes. They complement each other.

Escape explanation:

"
(?<!        # Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
   \\          # Match the character “\” literally
)
(?=         # Assert that the regex below can be matched, starting at this position (positive lookahead)
   (?:         # Match the regular expression below
      \\          # Match the character “\” literally
      \\          # Match the character “\” literally
   )*          # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   £           # Match the character “£” literally
)
"

Note that I am matching a position, not text. No text is consumed at all. Once I pinpoint the position I want, I insert a \ there.

Explanation of unescape:

"
(?<!        # Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
   \\          # Match the character “\” literally
)
(           # Match the regular expression below and capture its match into backreference number 1
   (?:         # Match the regular expression below
      \\          # Match the character “\” literally
      \\          # Match the character “\” literally
   )*          # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
\\          # Match the character “\” literally
£           # Match the character “£” literally
"

Here I capture all the backslashes except the last one, and I replace the whole match with the captured backslashes followed by the special character. Tricky stuff :)
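As a quick check of how the two regexes above behave together, here is a minimal sketch. The helper names `escape_pound` and `unescape_pound` are hypothetical and not part of the original answer; the regexes themselves are unchanged. The block form of `gsub` is used only to avoid any backslash interpretation in the replacement string.

```ruby
# Hypothetical helper names wrapping the two regexes from this answer.
# The block form of gsub returns the replacement verbatim.
def escape_pound(str)
  # Zero-width match: insert one backslash before an even run of
  # backslashes (possibly empty) that is followed by £.
  str.gsub(/(?<!\\)(?=(?:\\\\)*£)/) { "\\" }
end

def unescape_pound(str)
  # Keep the captured pairs of backslashes, drop the one escaping £.
  str.gsub(/(?<!\\)((?:\\\\)*)\\£/) { "#{$1}£" }
end

samples = ["Foobar £ foobar", "£££", "\\\\£"]
samples.each do |s|
  escaped = escape_pound(s)
  puts "#{s.inspect} -> #{escaped.inspect} -> #{unescape_pound(escaped).inspect}"
end
```

One consequence worth noting: a £ that is already preceded by an odd number of backslashes is treated as escaped and left alone (as in Example 3 of the question), so such an input does not round-trip back to itself.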

Answer 2

If you are using Ruby 1.9, which has lookbehind, then FailedDev's answer should work quite well. If you are using Ruby 1.8, which does not have lookbehind (I think), a different approach may work. Give this a try:

text.gsub!(/(\\.)|£/m) do
    if ($1 != nil)  # If an escaped character matched,
        $1          # replace it with itself.
    else            # Otherwise escape the
        "\\£"       # unescaped £.
    end
end

Note that I am not a Ruby programmer and this snippet is untested (in particular I'm not sure if the if ($1 != nil) statement usage is correct; it may need to be if ($1 != "") or if ($1)), but I do know that this general technique (using code in place of a simple replacement string) works. I recently used this same technique in my JavaScript solution to a similar question, which was looking to find unescaped asterisks.
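The same block technique can also be pushed toward the two-step idea raised in the question (escape every backslash, then every £). The sketch below is a different scheme from the snippet above, not a restatement of it: it always escapes backslashes as well, so any input string can be recovered exactly. The names `protect` and `unprotect` are hypothetical.

```ruby
# Hypothetical names. Unlike the snippet above, this variant escapes
# backslashes too, so every input round-trips exactly.
def protect(text)
  # Prefix each backslash and each £ with a backslash, in one pass.
  text.gsub(/[\\£]/) { |ch| "\\#{ch}" }
end

def unprotect(text)
  # Drop the escaping backslash and keep the character it protected.
  text.gsub(/\\(.)/m) { $1 }
end

["£ Hello", "\\£ Hello", "\\\\\\\\£ Hello"].each do |s|
  puts "#{s.inspect} -> #{protect(s).inspect}"
end
```

With this scheme, a £ in the protected text that is not preceded by an odd run of backslashes never comes from user input, so it stays available as a placeholder. No lookbehind is involved, which matters if the Ruby version lacks it.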

Answer 3

I'm not sure if this is what you want, but I think you can do a simple find-and-replace:

str = str.gsub("£", "\\£") # to escape
str = str.gsub("\\£", "£") # to unescape

Note that I changed \ to \\ because you have to escape the backslash in a double-quoted string.

Edit: I think what you want is a regex that matches an odd number of backslashes:

str = str.gsub(/(^|[^\\])((?:\\\\)*)\\£/, "\\1\\2£")

That performs the following transformations (inputs and results written as Ruby string literals):

"£"       #=> "£"
"\\£"     #=> "£"
"\\\\£"   #=> "\\\\£"
"\\\\\\£" #=> "\\\\£"
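The transformations listed above can be checked with a short script. This is only a sketch: the method name `unescape_odd` is hypothetical, and the regex is the one from this answer, unchanged.

```ruby
# Hypothetical wrapper around the regex from this answer.
def unescape_odd(str)
  str.gsub(/(^|[^\\])((?:\\\\)*)\\£/, "\\1\\2£")
end

cases = {
  "£"       => "£",
  "\\£"     => "£",
  "\\\\£"   => "\\\\£",
  "\\\\\\£" => "\\\\£",
}
cases.each do |input, expected|
  status = unescape_odd(input) == expected ? "ok" : "MISMATCH"
  puts "#{input.inspect}: #{status}"
end

# Two escaped £ in a row: the second one needs a preceding character,
# but the only candidate was consumed by the first match.
puts unescape_odd("\\£\\£").inspect
```

Because `(^|[^\\])` consumes the character before the backslashes instead of merely looking at it, the second of two adjacent escaped £ characters is left escaped. The lookbehind version in the first answer does not consume that character, so inputs like Example 2 in the question are better served by it when lookbehind is available.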

3 answers

Answer 1
2 2011-10-29 00:40:48

Answer 2
1 2011-10-29 03:40:46

Answer 3
0 2011-10-28 23:18:01