
How to substitute outside of regexp matches in a Ruby string?

Given an example input like below:

s = "an example with 'one' word and 'two and three' words inside quotes"

I'm trying to iterate over the parts outside of quotes to do some substitutions. For example, I want to convert and to &, but only outside of quotes, to get:

an example with 'one' word & 'two and three' words inside quotes

If I were to change inside of quotes, I could simply do the following:

s.gsub(/'.*?'/){ |q| q.gsub(/and/, '&') }

to get:

an example with 'one' word and 'two & three' words inside quotes

I mainly tried two things to adapt this strategy to outside of quotes.

First, I tried to negate the regexp inside the first gsub (i.e. /'.*?'/ ). I imagine that if there were a suffix modifier like /v, I could simply do s.gsub(/'.*?'/v){... } , but unfortunately I couldn't find anything like this. There is a negative lookahead (i.e. (?!pat) ), but I don't think it is what I need.

Second, I tried to use split with gsub! as such:

puts s.split(/'.*?'/){ |r| r.gsub!(/and/, '&') }

Using split I can iterate over the parts outside of quotes:

s.split(/'.*?'/){ |r| puts r }

to get:

an example with 
 word and 
 words inside quotes

However, I can't mutate these parts inside the block with gsub or gsub! . I guess I need a mutating version of split , something akin to gsub being a mutating version of scan , but there doesn't seem to be anything like this.

Is there an easy way to make either of these approaches work?
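For reference, here is a minimal sketch of one way the split idea can be made to work. It relies on the documented behaviour that String#split returns captured groups as part of the result, so wrapping the pattern in parentheses keeps the quoted parts and the string can be joined back together. The method name and_outside_quotes is hypothetical and only used for illustration.

```ruby
# Hypothetical helper: replace "and" with "&" only outside single quotes.
def and_outside_quotes(text)
  # With a capture group in the pattern, split also returns the quoted parts,
  # so the pieces alternate: unquoted, quoted, unquoted, ...
  pieces = text.split(/('.*?')/)

  # Even indexes are outside quotes, odd indexes are the captured quoted parts.
  pieces.each_with_index.map { |piece, i|
    i.odd? ? piece : piece.gsub(/and/, '&')
  }.join
end

s = "an example with 'one' word and 'two and three' words inside quotes"
puts and_outside_quotes(s)
# prints: an example with 'one' word & 'two and three' words inside quotes
```

This builds a new string rather than mutating s in place, which sidesteps the need for a mutating version of split.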

You may match and capture what you need to keep and just match what you need to replace.

Use

s.gsub(/('[^']*')|and/) { $1 || '&' }
s.gsub(/('[^']*')|and/) { |m| m == $~[1] ? $~[1] : '&' }

If you need to match and as a whole word, use \band\b in the pattern instead of and .

This approach is very convenient since you may add as many patterns to skip as you want. E.g. you may also want to avoid matching the whole word and between double quotation marks:

s.gsub(/('[^']*'|"[^"]*")|\band\b/) { $1 || '&' }

Or, you want to make sure it is also skipping strings between quotes with escaped quotes:

s.gsub(/('[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*")|\band\b/m) { $1 || '&' }

Or, you want to replace and only if it appears outside of round, square, and angle brackets and braces:

s.gsub(/(<[^<>]*>|\{[^{}]*\}|\([^()]*\)|\[[^\]\[]*\])|\band\b/m) { $1 || '&' }

Match and capture substrings between single quotes and just match what you need to change. If Group 1 matches, put it back with $1 , else, replace with & . The replacement block in the second line just checks if the Group 1 value of the last match is the same as the currently matched value, and if yes, it puts it back, else, replaces with & .


Regex details

  • ('[^']*') - Capturing group #1: ' , zero or more chars other than ' and then a ' char
  • | - or
  • and - and substring.
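As a sketch, the same capture-and-skip pattern can be written out with the /x flag so that each part carries its own comment, and then run against the sample string from the question. The constant and method names below are hypothetical, and the whole-word variant \band\b is assumed.

```ruby
# Hypothetical names; the pattern is the one shown above, spelled out with the x flag.
SKIP_QUOTED_OR_AND = /
  ('[^']*')   # group 1: a single-quoted chunk, to be put back unchanged
  |           # or
  \band\b     # the whole word and, to be replaced
/x

def ampersand_outside_quotes(text)
  # $1 is non-nil only when the quoted alternative matched.
  text.gsub(SKIP_QUOTED_OR_AND) { $1 || '&' }
end

s = "an example with 'one' word and 'two and three' words inside quotes"
puts ampersand_outside_quotes(s)
# prints: an example with 'one' word & 'two and three' words inside quotes
```

Because the quoted alternative comes first, the engine consumes a whole quoted chunk before it ever gets a chance to try and inside it.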

You can perform the desired substitutions by using the following regular expression.

r = /\G[^'\n]*?(?:'[^'\n]*'[^'\n]*?)*?\K\band\b/


The Ruby code needed is as follows.

str = "an and with 'one' word and 'two and three' words and end"

str.gsub(r, '&')
  #=> "an & with 'one' word & 'two and three' words & end"


Ruby's regex engine performs the following operations. Essentially, the regex asserts that "and" follows an even number of single quotes since the previous match, or an even number of single quotes from the beginning of the string if it is the first match.

\G          : asserts position at the end of the previous match
              or the start of the string for the first match
[^'\n]*?    : match 0+ chars other than ' and \n, lazily
(?:         : begin non-capture group
  '[^'\n]*' : match ' then 0+ chars other than ' and \n then '
  [^'\n]*?  : match 0+ chars other than ' and \n, lazily
)           : end non-capture group
*?          : execute non-capture group 0+ times, lazily 
\K          : forget everything matched so far and reset start of match
\band\b     : match the whole word 'and'
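As a quick check, the sketch below runs this \G ... \K pattern side by side with the capture-group pattern from the other answer on the sample string used above. The constant names are hypothetical; both patterns are copied from the answers.

```ruby
# Hypothetical constant names; the patterns come from the two answers.
ANCHORED_AND   = /\G[^'\n]*?(?:'[^'\n]*'[^'\n]*?)*?\K\band\b/
CAPTURE_OR_AND = /('[^']*')|\band\b/

str = "an and with 'one' word and 'two and three' words and end"

# \K drops everything matched before "and", so a plain string replacement works.
anchored = str.gsub(ANCHORED_AND, '&')

# The capture-group version needs a block to put the quoted chunk back.
captured = str.gsub(CAPTURE_OR_AND) { $1 || '&' }

puts anchored
puts captured
puts anchored == captured
```

The practical difference is in the replacement side: with \K only the word and is part of the match, so no block is needed to restore the quoted text.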
