
How to substitute outside of regexp matches in a Ruby string?

Given an example input like below:

s = "an example with 'one' word and 'two and three' words inside quotes"

I'm trying to iterate over the parts outside of quotes to do some substitutions. For example, to convert and to & but only outside of quotes, to get:

an example with 'one' word & 'two and three' words inside quotes

If I wanted to change the parts inside the quotes, I could simply do the following:

s.gsub(/'.*?'/){ |q| q.gsub(/and/, '&') }

to get:

an example with 'one' word and 'two & three' words inside quotes

I mainly tried two things to adapt this strategy to the parts outside of quotes.

First, I tried to negate the regexp inside the first gsub (i.e. /'.*?'/). I imagine that if there were a suffix modifier like /v, I could simply do s.gsub(/'.*?'/v){... }; unfortunately, I couldn't find anything like this. There is a negative lookahead (i.e. (?!pat)), but I don't think it is what I need.

Second, I tried to use split with gsub! as such:

puts s.split(/'.*?'/){ |r| r.gsub!(/and/, '&') }

Using split, I can iterate over the parts outside of quotes:

s.split(/'.*?'/){ |r| puts r }

to get:

an example with 
 word and 
 words inside quotes

However, I can't mutate these parts inside the block with gsub or gsub!. I guess I need a mutating version of split, something akin to gsub being a mutating version of scan, but there doesn't seem to be anything like this.

Is there an easy way to make either of these approaches work?

You may match and capture what you need to keep and just match what you need to replace.

Use

s.gsub(/('[^']*')|and/) { $1 || '&' }
s.gsub(/('[^']*')|and/) { |m| m == $~[1] ? $~[1] : '&' }

If you need to match and as a whole word, use \band\b in the pattern instead of and.
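To illustrate the difference the word boundaries make, here is a minimal sketch. The sample string is made up for this example (it is not from the question) and contains "and" inside other words:

```ruby
# Hypothetical input: "band" and "sandy" both contain the letters "and".
s = "a band and 'rock and roll' on a sandy beach"

# Without word boundaries, "and" inside other words is replaced as well.
loose = s.gsub(/('[^']*')|and/) { $1 || '&' }

# With \b on both sides, only the standalone word outside quotes is replaced.
strict = s.gsub(/('[^']*')|\band\b/) { $1 || '&' }

puts loose   # a b& & 'rock and roll' on a s&y beach
puts strict  # a band & 'rock and roll' on a sandy beach
```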

This approach is very convenient, since you may add as many specific patterns to skip as you want. E.g., if you also want to avoid matching the whole word and between double quotation marks:

s.gsub(/('[^']*'|"[^"]*")|\band\b/) { $1 || '&' }
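A short usage sketch for the pattern above, with an invented sample string that mixes both quote styles:

```ruby
# Hypothetical input containing a double-quoted and a single-quoted span.
s = %q(salt and "pepper and spice" and 'oil and vinegar')

# Group 1 captures either kind of quoted span; a bare "and" falls through to '&'.
result = s.gsub(/('[^']*'|"[^"]*")|\band\b/) { $1 || '&' }

puts result  # salt & "pepper and spice" & 'oil and vinegar'
```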

Or, if you want to make sure it also skips quoted strings that contain escaped quotes:

s.gsub(/('[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*")|\band\b/m) { $1 || '&' }
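The sketch below shows why the escape-aware alternative matters. The input is an assumption made for illustration: a single-quoted span containing a backslash-escaped quote.

```ruby
# Hypothetical input: the text really contains a backslash followed by a quote.
s = %q('it\'s black and white' and more)

# The simple pattern ends the quoted span at the escaped quote,
# so the "and" that is still inside the quotes gets replaced.
naive = s.gsub(/('[^']*')|\band\b/) { $1 || '&' }

# The escape-aware pattern consumes backslash + any char as a unit.
aware = s.gsub(/('[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*")|\band\b/m) { $1 || '&' }

puts naive  # 'it\'s black & white' & more
puts aware  # 'it\'s black and white' & more
```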

Or, to replace it only where it appears outside of round, square, and angle brackets and braces:

s.gsub(/(<[^<>]*>|\{[^{}]*\}|\([^()]*\)|\[[^\]\[]*\])|\band\b/m) { $1 || '&' }
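A quick usage sketch with an invented input covering all four bracket types. Note that each alternative excludes its own delimiters from the contents, so this sketch assumes the brackets are not nested:

```ruby
# Hypothetical input with one span of each bracket type (no nesting).
s = "x and (y and z) and [a and b] and {c and d} and <e and f>"

# Any bracketed span is captured in group 1 and put back unchanged.
result = s.gsub(/(<[^<>]*>|\{[^{}]*\}|\([^()]*\)|\[[^\]\[]*\])|\band\b/m) { $1 || '&' }

puts result  # x & (y and z) & [a and b] & {c and d} & <e and f>
```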

Match and capture substrings between single quotes and just match what you need to change. If Group 1 matches, put it back with $1; otherwise, replace with &. The replacement block in the second line just checks whether the Group 1 value of the last match is the same as the currently matched value; if so, it puts it back, otherwise it replaces the match with &.

See a Ruby demo.

Regex details

  • ('[^']*') - Capturing group #1: ', zero or more chars other than ', and then a ' char
  • | - or
  • and - the substring and.
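To make the capture-or-replace logic concrete, the sketch below runs the basic pattern on the string from the question. It first records, for every match, the full match and the Group 1 value, and then applies both replacement blocks. The variable names are only illustrative.

```ruby
s = "an example with 'one' word and 'two and three' words inside quotes"
re = /('[^']*')|and/

# For each match, record [whole match, Group 1]; Group 1 is nil for a bare "and".
trace = []
s.scan(re) { trace << [$~[0], $1] }
p trace
# [["'one'", "'one'"], ["and", nil], ["'two and three'", "'two and three'"]]

# Both replacement blocks put quoted spans back and turn a bare "and" into "&".
first  = s.gsub(re) { $1 || '&' }
second = s.gsub(re) { |m| m == $~[1] ? $~[1] : '&' }

puts first
# an example with 'one' word & 'two and three' words inside quotes
```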

You can perform the desired substitutions by using the following regular expression.

r = /\G[^'\n]*?(?:'[^'\n]*'[^'\n]*?)*?\K\band\b/

Start your engine!

The Ruby code needed is as follows.

str = "an and with 'one' word and 'two and three' words and end"

str.gsub(r, '&')
  #=> "an & with 'one' word & 'two and three' words & end"

Ruby code tester

Ruby's regex engine performs the following operations. Essentially, the regex asserts that "and" follows an even number of single quotes since the previous match, or an even number of single quotes from the beginning of the string if it is the first match.

\G          : asserts position at the end of the previous match
              or the start of the string for the first match
[^'\n]*?    : match 0+ chars other than ' and \n, lazily
(?:         : begin non-capture group
  '[^'\n]*' : match ' then 0+ chars other than ' and \n then '
  [^'\n]*?  : match 0+ chars other than ' and \n, lazily
)           : end non-capture group
*?          : execute non-capture group 0+ times, lazily 
\K          : forget everything matched so far and reset start of match
\band\b     : match 'and' as a whole word
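As a sanity check, the sketch below runs this \G ... \K pattern on the sample string and compares the result with the capture-and-skip alternation (with word boundaries) from the other answer. The comparison is added here for illustration only; on this input the two approaches should agree.

```ruby
str = "an and with 'one' word and 'two and three' words and end"

# Pattern from this answer: only the part after \K is replaced.
r = /\G[^'\n]*?(?:'[^'\n]*'[^'\n]*?)*?\K\band\b/
with_k = str.gsub(r, '&')

# Alternation approach: quoted spans are captured and put back unchanged.
with_alt = str.gsub(/('[^']*')|\band\b/) { $1 || '&' }

puts with_k              # an & with 'one' word & 'two and three' words & end
puts with_k == with_alt  # true
```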
