
Why does Ruby gsub not replace a second occurrence of this pattern?

I have a bit of code for escaping double quotes in a string which may already include pre-escaped quotes, e.g.:

This is a \"string"

I am using the following code with Ruby 1.8.7p374:

string.gsub!(/([^\\])"/, '\1\"')

However, I hit a funny edge case when trying it on the following string: ab""c => ab\""c . I would expect it to have escaped both quotes.

It's definitely not a big issue, but it got me curious.
Is this a mistake with my expression? A gsub bug/feature?

(In newer Ruby versions, this could probably be solved easily with a negative lookbehind, but lookbehinds do not seem to be supported in this version.)

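Requiring a match on a non-\ character means the regex has to consume that character as well as the quote, and gsub matches cannot overlap. In ab""c the first match consumes b" , so the second quote no longer has an unconsumed character in front of it to match against.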
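To make the non-overlapping behaviour concrete, here is a small sketch. The variable names (sample, consumed, first_pass) are chosen only for this illustration and are not from the question:

```ruby
# Illustration only: the variable names are hypothetical.
sample = 'ab""c'

# Without a capture group, scan returns the full text of each match.
# Only one match is found, because b" is consumed as a unit and the
# second quote has no unconsumed character left in front of it.
consumed = sample.scan(/[^\\]"/)              # => ['b"']

# The original substitution therefore escapes only the first quote.
first_pass = sample.gsub(/([^\\])"/, '\1\"')  # => 'ab\""c'

puts consumed.inspect
puts first_pass
```

Running the same substitution a second time would pick up the remaining quote, since the first quote is then an unconsumed non-backslash character in front of it.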

You are right that a look-behind assertion would fix this. But without that available, you have a couple of choices in Ruby 1.8.7.

  1. Repeat until no substitutions are made (gsub! returns nil if there were no matches):

    loop { break unless string.gsub!(/([^\\])"/, '\1\"') }

  2. Since 1.8.7 has no look-behind assertions, reverse the string, use a look-ahead assertion to make your changes, then reverse it back:

    string = string.reverse.gsub(/"(?!\\)/, '"\\').reverse
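Both options can be wrapped up and compared side by side. This is a minimal sketch: the helper names escape_by_looping and escape_by_reversing are made up for the example, and the reversing variant uses the block form of gsub only to sidestep backslash handling in replacement strings.

```ruby
# Hypothetical helper names, for illustration only.

# Option 1: re-run the substitution until nothing changes.
def escape_by_looping(text)
  result = text.dup
  # gsub! returns nil once no substitution was made, which ends the loop.
  loop { break unless result.gsub!(/([^\\])"/, '\1\"') }
  result
end

# Option 2: reverse, use a look-ahead, reverse back.
def escape_by_reversing(text)
  # In the reversed string an already-escaped quote reads as "\ ,
  # so the look-ahead skips quotes that are followed by a backslash.
  # The block's return value is inserted literally (quote + backslash).
  text.reverse.gsub(/"(?!\\)/) { '"\\' }.reverse
end

puts escape_by_looping('ab""c')                  # prints ab\"\"c
puts escape_by_reversing('ab""c')                # prints ab\"\"c
puts escape_by_reversing('This is a \"string"')  # prints This is a \"string\"
```

The looping version may scan the string several times, while the reversing version does a single pass at the cost of two reversals.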

Your regex also won't work if there is a quote at the start of the string, e.g. "ab""c will be transformed to "ab\""c . The reason is similar to your case with the two adjacent quotes: there is no preceding character for [^\\] to match.

After gsub has matched b" and replaced it, it continues from the end of that match, looking at the next " , but it doesn't look back at the characters it has already consumed.

You might be able to fix the adjacent-quotes issue with a lookbehind in newer Ruby versions, but a positive lookbehind that mirrors your pattern, such as (?<=[^\\])" , still won't match a quote at the beginning of the string (a negative lookbehind, (?<!\\)" , would). A way to fix both cases that works in Ruby 1.8.7 is the \G anchor, which matches where the previous match ended or at the start of the string. So you are looking for a " that either comes immediately after a non-backslash character or sits at the start of the current match attempt (meaning a " has just been matched, or this is the start of the string). Something like this:

string.gsub!(/([^\\]|\G)"/, '\1\"')

This will convert the string "ab""c to \"ab\"\"c .
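As a quick check of the \G version, here is a minimal sketch. The helper name escape_quotes is hypothetical, and it uses the non-destructive gsub so the input is left untouched:

```ruby
# `escape_quotes` is a hypothetical helper name used only for this sketch.
def escape_quotes(text)
  # \G matches where the previous match ended (or at the start of the
  # string), so a quote at position 0 or directly after a just-escaped
  # quote qualifies even though no character is consumed before it.
  text.gsub(/([^\\]|\G)"/, '\1\"')
end

puts escape_quotes('"ab""c')               # prints \"ab\"\"c
puts escape_quotes('This is a \"string"')  # prints This is a \"string\"
```

In the second call the pre-escaped quote is left alone: the backslash before it fails [^\\] , and \G does not apply there because no match ended at that position.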
