
Ruby's string: Escape and unescape a custom character

Suppose I declare the £ character dangerous, and I want to be able to protect (escape) and unprotect (unescape) any string.

Example 1:

"Foobar £ foobar foobar foobar."  # => dangerous string
"Foobar \£ foobar foobar foobar." # => protected string

Example 2:

"Foobar £ foobar £££££££foobar foobar."         # => dangerous string
"Foobar \£ foobar \£\£\£\£\£\£\£foobar foobar." # => protected string

Example 3:

"Foobar \£ foobar \\£££££££foobar foobar."        # => dangerous string
"Foobar \£ foobar \\\£\£\£\£\£\£\£foobar foobar." # => protected string

Is there an easy way, with Ruby, to escape (and unescape) a given character (such as £ in my example) from a string?

Edit: here is an explanation of the context behind this question.

First of all, thanks for your answers. I have a Rails app with a Tweet model having a content field. Example of tweet:

tweet = Tweet.create(content: "Hello @bob")

Inside the model, there's a serialization process that converts the string like this:

dump('Hello @bob') # => '["Hello £", 42]'
                   # ... where 42 is the id of bob username

Then, I'm able to deserialize and display its tweet like this:

load('["Hello £", 42]') # => 'Hello @bob'

In the same way, it's also possible to do so with more than one username:

dump('Hello @bob and @joe!')        # => '["Hello £ and £!", 42, 185]'
load('["Hello £ and £!", 42, 185]') # => 'Hello @bob and @joe!'

That's the goal :)

But this find-and-replace could be hard to perform with something like:

tweet = Tweet.create(content: "£ Hello @bob")

because here we also have to escape the £ character. I think your solution is good for this, so the result becomes:

dump('£ Hello @bob')       # => '["\£ Hello £", 42]'
load('["\£ Hello £", 42]') # => '£ Hello @bob'

Just perfect. <3 <3

Now, if there is this:

tweet = Tweet.create(content: "\£ Hello @bob")

I think we should first escape every \, and then escape every £, like:

dump('\£ Hello @bob')       # => '["\\£ Hello £", 42]'
load('["\\£ Hello £", 42]') # => '£ Hello @bob'

However, what can we do in this case:

tweet = Tweet.create(content: "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\£ Hello @bob")

...where tweet.content.gsub(/(?<!\\\\)(?=(?:\\\\\\\\)*£)/, "\\\\") does not seem to work.

Hopefully your version of Ruby supports lookbehinds. If it doesn't, my solution will not work for you.

Escape characters:

str = str.gsub(/(?<!\\)(?=(?:\\\\)*£)/, "\\")

Unescape characters:

str = str.gsub(/(?<!\\)((?:\\\\)*)\\£/, '\1£')

Both regexes work regardless of the number of backslashes, and they complement each other.
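
As a quick check, the two patterns can be wrapped in small helpers and run against the three examples from the question. This is only a sketch: the names protect, unprotect and BS are hypothetical and not part of the answer, and the block form of gsub is used so that no backslash handling in the replacement string gets in the way.

```ruby
BS = "\\" # one literal backslash, named to keep the examples readable

# Zero-width: a position not preceded by a backslash and followed by
# an even number of backslashes (possibly zero) and then £.
ESCAPE_RE   = /(?<!\\)(?=(?:\\\\)*£)/
# An odd run of backslashes followed by £; group 1 holds the pairs.
UNESCAPE_RE = /(?<!\\)((?:\\\\)*)\\£/

# Hypothetical helper: insert one backslash at each matched position.
def protect(str)
  str.gsub(ESCAPE_RE) { BS }
end

# Hypothetical helper: drop one backslash, keep the captured pairs.
def unprotect(str)
  str.gsub(UNESCAPE_RE) { "#{$1}£" }
end

ex1 = "Foobar £ foobar foobar foobar."
ex2 = "Foobar £ foobar £££££££foobar foobar."
ex3 = "Foobar #{BS}£ foobar #{BS * 2}£££££££foobar foobar."

puts protect(ex1)                       # Foobar \£ foobar foobar foobar.
puts protect(ex2)
puts protect(ex3)
puts unprotect(protect(ex1)) == ex1     # true
puts unprotect(protect(ex2)) == ex2     # true
puts protect(protect(ex3)) == protect(ex3)
```

One thing to keep in mind: a \£ that is already present in the input (as in Example 3) is treated as already protected, so unprotect will strip that backslash too. Examples 1 and 2 round-trip exactly; Example 3 does not.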

Explanation of escape:

"
(?<!        # Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
   \\          # Match the character “\” literally
)
(?=         # Assert that the regex below can be matched, starting at this position (positive lookahead)
   (?:         # Match the regular expression below
      \\          # Match the character “\” literally
      \\          # Match the character “\” literally
   )*          # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   £           # Match the character “£” literally
)
"

Note that I am matching a position only; no text is consumed at all. Once I have pinpointed the position I want, I insert a single \ there.
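
The zero-width behaviour described above can be observed directly. In this small sketch (the sample string and the "|" marker are arbitrary choices for illustration), every match is the empty string, and gsub inserts text without removing anything.

```ruby
escape_re = /(?<!\\)(?=(?:\\\\)*£)/
sample = "a£b£"

p sample.scan(escape_re)          # ["", ""]  (each match is empty)
p sample =~ escape_re             # 1         (index of the first matched position)
puts sample.gsub(escape_re, "|")  # a|£b|£    (text is only inserted, never consumed)
```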

Explanation of unescape:

"
(?<!        # Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
   \\          # Match the character “\” literally
)
(           # Match the regular expression below and capture its match into backreference number 1
   (?:         # Match the regular expression below
      \\          # Match the character “\” literally
      \\          # Match the character “\” literally
   )*          # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
\\          # Match the character “\” literally
£           # Match the character “£” literally
"

Here I capture all of the backslashes except the last one, then replace the whole match with the captured backslashes followed by the special character. Tricky stuff :)
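
To make the capture concrete, here is a minimal sketch with three backslashes in front of £ (the variable names are illustrative only). It also shows why the backreference in the replacement string is easiest to write in single quotes: in a double-quoted Ruby string, "\1" is an octal escape for a control character, not a backslash followed by 1.

```ruby
bs = "\\"                               # one literal backslash
unescape_re = /(?<!\\)((?:\\\\)*)\\£/

escaped = "#{bs * 3}£"                  # three backslashes, then £
m = escaped.match(unescape_re)

puts m[1] == bs * 2                                    # true: group 1 is the remaining pair
puts escaped.sub(unescape_re, '\1£') == "#{bs * 2}£"   # true: one backslash was removed

puts '\1'.length   # 2 (backslash + "1", a backreference for sub/gsub)
puts "\1".length   # 1 (a single control character)
```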

If you are using Ruby 1.9, which has lookbehind, then FailedDev's answer should work quite well. If you are using Ruby 1.8, which does not have lookbehind (I think), a different approach may work. Give this a try:

text.gsub!(/(\\.)|£/m) do
    if $1        # Already escaped (backslash + any character):
        $1       # keep it unchanged.
    else         # Otherwise this is an unescaped £:
        "\\£"    # escape it.
    end
end

Note that I am not a Ruby programmer, so treat this snippet as a sketch. In Ruby, $1 is nil when the first group did not take part in the match, so a plain if $1 test is enough. I do know that the general technique (using code in place of a simple replacement string) works; I recently used it in a JavaScript solution to a similar question about finding unescaped asterisks.
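
For completeness, the same block-based technique can be sketched in both directions without any lookbehind. The escape half follows the idea above; the unescape half is an assumed counterpart that this answer does not give, and the method names escape_pound and unescape_pound are hypothetical.

```ruby
# Escape: step over every "backslash + character" pair untouched,
# and put a backslash in front of any £ that is left bare.
def escape_pound(text)
  text.gsub(/(\\.)|£/m) { $1 || "\\£" }
end

# Unescape (assumed counterpart): walk the same pairs, turning only
# "backslash + £" into "£" and leaving every other pair as it was.
def unescape_pound(text)
  text.gsub(/\\(.)/m) { $1 == "£" ? "£" : $& }
end

bs = "\\"
plain = "Foobar £ foobar £££££££foobar foobar."

puts escape_pound(plain)
puts unescape_pound(escape_pound(plain)) == plain    # true
puts escape_pound("#{bs * 2}£") == "#{bs * 3}£"      # true: two backslashes, then an escaped £
puts unescape_pound("#{bs * 3}£") == "#{bs * 2}£"    # true
```

Because the alternation consumes escaped pairs from left to right, the parity of the backslashes is tracked by the scan itself rather than by a lookbehind.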

I'm not sure if this is what you want, but I think you can do a simple find-and-replace:

str = str.gsub("£", "\\£") # to escape
str = str.gsub("\\£", "£") # to unescape

Note that a single \ is written as \\ because you have to escape the backslash in a double-quoted string.


Edit: I think what you want is a regex that matches an odd number of backslashes:

str = str.gsub(/(^|[^\\])((?:\\\\)*)\\£/, "\\1\\2£")

That performs the following transformations:

"£"       #=> "£"
"\\£"     #=> "£"
"\\\\£"   #=> "\\\\£"
"\\\\\\£" #=> "\\\\£"

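One case worth checking with the pattern above is two escaped characters in a row, as in Example 2 of the question. The sketch below (variable names are illustrative) compares it with a lookbehind variant of the same idea. Because (^|[^\\]) has to consume the character before the backslashes, and the preceding £ has already been consumed by the previous match, the second escape is left untouched.

```ruby
bs = "\\"
without_lookbehind = /(^|[^\\])((?:\\\\)*)\\£/
with_lookbehind    = /(?<!\\)((?:\\\\)*)\\£/

adjacent = "#{bs}£#{bs}£"   # two escaped £ characters back to back

puts adjacent.gsub(without_lookbehind) { "#{$1}#{$2}£" }   # £\£
puts adjacent.gsub(with_lookbehind)    { "#{$1}£" }        # ££
```

If lookbehind is not available, running the substitution repeatedly until the string stops changing is not a safe workaround, since a second pass could unescape characters twice; a single left-to-right scan with a block is the safer route.
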