How do I replace text outside of regex matches in a Ruby string?

Question

Given the following example input:

s = "an example with 'one' word and 'two and three' words inside quotes"

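I'm trying to iterate over the parts outside the quotes to make some substitutions. For example, converting and to &, but only outside the quotes, to get: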

an example with 'one' word & 'two and three' words inside quotes

If I wanted to make changes inside the quotes, I could simply do:

s.gsub(/'.*?'/){ |q| q.gsub(/and/, '&') }

to get:

an example with 'one' word and 'two & three' words inside quotes

I have mainly tried two things to adapt this strategy to the parts outside the quotes.

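First, I tried to negate the regex in the first gsub (i.e. /'.*?'/). I figured that if there were a suffix modifier like /v, I could simply do s.gsub(/'.*?'/v){ ... }, but unfortunately I couldn't find anything like that. There is a negative lookahead (i.e. (?!pat)), but I don't think that's what I need.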

Second, I tried combining split with gsub!, like this:

puts s.split(/'.*?'/){ |r| r.gsub!(/and/, '&') }

With split, I can iterate over the parts outside the quotes:

s.split(/'.*?'/){ |r| puts r }

to get:

an example with 
 word and 
 words inside quotes

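However, I can't change these parts in place with gsub or gsub!. I guess I need a mutating version of split, similar to how gsub is a mutating version of scan, but there doesn't seem to be such a thing.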

Is there a simple way to make either of these approaches work?
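For the split-based attempt, a mutating split is not strictly needed: if the delimiter pattern is wrapped in a capture group, String#split keeps the captured quoted pieces in the result array, so the outside text lands at even indices and the quoted text at odd ones. The sketch below assumes quotes are balanced and never escaped; the helper name replace_outside_quotes is hypothetical.

```ruby
# Hypothetical helper: substitute only in the text outside single quotes.
# Wrapping the delimiter in a capture group makes split keep the quoted
# pieces, so outside text sits at even indices and quoted text at odd ones.
# The -1 limit keeps trailing empty strings so that the parity is preserved.
def replace_outside_quotes(str, pattern, replacement)
  str.split(/('[^']*')/, -1)
     .each_with_index
     .map { |part, i| i.even? ? part.gsub(pattern, replacement) : part }
     .join
end

s = "an example with 'one' word and 'two and three' words inside quotes"
puts replace_outside_quotes(s, /\band\b/, '&')
# prints: an example with 'one' word & 'two and three' words inside quotes
```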

Answer 1

You can match and capture what you need to keep, and only match what you need to replace.

Use

s.gsub(/('[^']*')|and/) { $1 || '&' }
s.gsub(/('[^']*')|and/) { |m| m == $~[1] ? $~[1] : '&' }

If you need to match and as a whole word, use \band\b instead of and in the pattern.
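To see why the word boundaries matter, compare the two patterns on a made-up input in which the letters and also occur inside other words:

```ruby
s = "a band and 'brand and' sandals"

# Plain `and` also hits the letters inside "band" and "sandals".
puts s.gsub(/('[^']*')|and/) { $1 || '&' }
# prints: a b& & 'brand and' s&als

# `\band\b` only matches the standalone word.
puts s.gsub(/('[^']*')|\band\b/) { $1 || '&' }
# prints: a band & 'brand and' sandals
```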

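This approach is very handy because you can add as many specific patterns to skip as you like. For example, if you also want to avoid matching and as a whole word between double quotes: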

s.gsub(/('[^']*'|"[^"]*")|\band\b/) { $1 || '&' }
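Since each protected construct is just one more alternative inside group 1, the list of skip patterns can also be assembled programmatically. Below is a minimal sketch using Regexp.union; the method name replace_outside is hypothetical and not part of the answer.

```ruby
# Hypothetical helper: replace `target` everywhere except inside spans
# matched by any of the `skips` patterns. Group 1 wraps all the skip
# alternatives, so a non-nil $1 means "put this span back unchanged".
def replace_outside(str, skips, target, replacement)
  re = /(#{Regexp.union(skips)})|#{target}/
  str.gsub(re) { $1 || replacement }
end

s = %q{keep 'this and that' and "these and those" and more}
puts replace_outside(s, [/'[^']*'/, /"[^"]*"/], /\band\b/, '&')
# prints: keep 'this and that' & "these and those" & more
```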

Or, if you also want to skip quoted strings that contain escaped quotes:

s.gsub(/('[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*")|\band\b/m) { $1 || '&' }
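A quick check of the escaped-quote version on a made-up input whose single-quoted part contains \'. The simpler /('[^']*')|\band\b/ stops the quoted span at the escaped quote, so it wrongly replaces the and inside it:

```ruby
re = /('[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*")|\band\b/m

# In this double-quoted literal, \\ is a single backslash, so the text is:
#   say 'it\'s this and that' and stop
s = "say 'it\\'s this and that' and stop"

puts s.gsub(re) { $1 || '&' }
# prints: say 'it\'s this and that' & stop

# The simple pattern ends the quoted span at the escaped quote.
puts s.gsub(/('[^']*')|\band\b/) { $1 || '&' }
# prints: say 'it\'s this & that' & stop
```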

Or, to replace and only when it appears outside parentheses, square brackets, angle brackets and curly braces:

s.gsub(/(<[^<>]*>|\{[^{}]*\}|\([^()]*\)|\[[^\]\[]*\])|\band\b/m) { $1 || '&' }
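A small check of the bracket version on a made-up input. Note that each bracket alternative excludes its own bracket characters, so nested brackets are not treated as one protected span; the second call shows this:

```ruby
re = /(<[^<>]*>|\{[^{}]*\}|\([^()]*\)|\[[^\]\[]*\])|\band\b/m

s = "this and (that and) [x and y] {a and b} <p and q> and done"
puts s.gsub(re) { $1 || '&' }
# prints: this & (that and) [x and y] {a and b} <p and q> & done

# Nesting is not handled: \([^()]*\) cannot cross the inner "(", so only
# "(b)" is protected and the `and` inside the outer parentheses is replaced.
puts "(a (b) and c)".gsub(re) { $1 || '&' }
# prints: (a (b) & c)
```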

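The pattern matches and captures substrings between single quotes, and only matches what you need to change. If Group 1 matched, it is put back with $1; otherwise, the match is replaced with &. The replacement block on the second line simply checks whether the current match equals the Group 1 value of the last match ($~[1]); if so, it puts it back, otherwise it replaces it with &.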
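The same logic can be written without the $1 and $~ globals by reading the current match through Regexp.last_match, which inside a gsub block refers to the match being replaced. This is only a stylistic variant, not part of the original answer:

```ruby
s = "an example with 'one' word and 'two and three' words inside quotes"
re = /('[^']*')|\band\b/

# Regexp.last_match(1) is the quoted span if group 1 took part in the
# match, or nil if the `and` alternative matched instead.
result = s.gsub(re) { Regexp.last_match(1) || '&' }
puts result
# prints: an example with 'one' word & 'two and three' words inside quotes
```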

请参阅Ruby 演示。

Regex details

('[^']*') - Capturing group #1: a ', then zero or more chars other than ', and then a ' char
| - or
and - an and substring.

Answer 2

You can use the following regular expression to perform the required substitutions.

r = /\G[^'\n]*?(?:'[^'\n]*'[^'\n]*?)*?\K\band\b/

启动你的引擎！

The Ruby code is then:

str = "an and with 'one' word and 'two and three' words and end"

str.gsub(r, '&')
  #=> "an & with 'one' word & 'two and three' words & end"
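Because the pattern ends with \K before \band\b, each match it reports is only the word and itself, even though the engine consumes the text in front of it; that is why gsub(r, '&') leaves the rest of the string intact. One way to observe this is with String#scan and MatchData#begin. The pattern and string are repeated so the snippet runs on its own:

```ruby
r = /\G[^'\n]*?(?:'[^'\n]*'[^'\n]*?)*?\K\band\b/
str = "an and with 'one' word and 'two and three' words and end"

# Each match text is just "and": \K dropped the consumed prefix.
p str.scan(r)
# prints: ["and", "and", "and"]

# Start offsets of the matched words; the quoted "and" at offset 32
# is never reported.
offsets = []
str.scan(r) { offsets << Regexp.last_match.begin(0) }
p offsets
# prints: [3, 23, 49]
```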

Ruby码测试仪

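Ruby's regex engine performs the following operations. In essence, the regex asserts that "and" is preceded by an even number of single quotes since the end of the previous match or, for the first match, since the start of the string.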

\G          : asserts position at the end of the previous match
              or the start of the string for the first match
[^'\n]*?    : match 0+ chars other than ' and \n, lazily
(?:         : begin non-capture group
  '[^'\n]*' : match ' then 0+ chars other than ' and \n then '
  [^'\n]*?  : match 0+ chars other than ' and \n, lazily
)           : end non-capture group
*?          : execute non-capture group 0+ times, lazily 
\K          : forget everything matched so far and reset start of match
\band\b/    : match 'and' as a whole word
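The even-number-of-quotes condition described above can be illustrated more directly, if less efficiently, by counting the single quotes to the left of each candidate match. This is an alternative sketch of the same idea, not the answer's regex; it assumes quotes are balanced and unescaped, and it rescans the prefix for every match, so it is quadratic in the worst case.

```ruby
str = "an and with 'one' word and 'two and three' words and end"

# Inside the gsub block, $~ is the MatchData for the current `and` and
# pre_match is everything to its left. An even quote count means the
# match sits outside any single-quoted span.
result = str.gsub(/\band\b/) { $~.pre_match.count("'").even? ? '&' : $& }
puts result
# prints: an & with 'one' word & 'two and three' words & end
```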

Answer 1
1 2020-06-24 19:40:24

Answer 2
1 2020-06-24 22:48:28