How to use Ruby gsub with a Regexp that has many matches?

Question

I have CSV file contents with double quotes inside quoted text:

test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good

I need to replace every double quote that is not preceded or followed by a comma with "":

test,first,line,"you are a ""kind"" man",thanks
again,second,li,"my ""boss"" is you",good

So " is replaced by "".

I tried:

x.gsub(/([^,])"([^,])/, "#{$1}\"\"#{$2}")

but it didn't work.

Answer 1

Your regex needs to be a little more bold, in case the quotes occur at the start of the first value, or at the end of the last value:

csv = <<ENDCSV
test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good
more,""Someone" said that you're "cute"",yay
"watch out for this",and,also,"this test case"
ENDCSV

puts csv.gsub(/(?<!^|,)"(?!,|$)/,'""')
#=> test,first,line,"you are a ""kind"" man",thanks
#=> again,second,li,"my ""boss"" is you",good
#=> more,"""Someone"" said that you're ""cute""",yay
#=> "watch out for this",and,also,"this test case"

The above regex uses negative lookbehind and negative lookahead assertions (anchors), available in Ruby 1.9.

(?<!^|,) : immediately preceding this spot there must not be either a start of line ( ^ ) or a comma
" : find a double quote
(?!,|$) : immediately following this spot there must not be either a comma or an end of line ( $ )
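The three parts above can be written out in extended (/x) form so each assertion carries its own comment. This is only a sketch: the helper name escape_inner_quotes is hypothetical and not part of the answer; the pattern itself is the one shown earlier.

```ruby
# Hypothetical helper; the name escape_inner_quotes is not from the answer.
def escape_inner_quotes(text)
  inner_quote = /
    (?<!^|,)   # lookbehind: not at a line start, not right after a comma
    "          # the double quote itself
    (?!,|$)    # lookahead: not right before a comma, not at a line end
  /x
  text.gsub(inner_quote, '""')
end

puts escape_inner_quotes('again,second,li,"my "boss" is you",good')
puts escape_inner_quotes('"watch out for this",and,also,"this test case"')
```

One limitation worth keeping in mind: a quote inside a field that happens to sit next to a comma, as in "he said "hi", then left", looks like a field delimiter to this pattern and is left alone.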

As a bonus, since you didn't actually capture the characters on either side, you don't need to worry about using \\1 correctly in your replacement string.

For more information, see the section "Anchors" in the official Ruby regex documentation.

However, for the case where you do need to use the matched text in your replacement, you can use any of the following:

"hello".gsub /([aeiou])/, '<\1>'            #=> "h<e>ll<o>"
"hello".gsub /([aeiou])/, "<\\1>"           #=> "h<e>ll<o>"
"hello".gsub(/([aeiou])/){ |m| "<#{$1}>" }  #=> "h<e>ll<o>"

You can't use string interpolation in the replacement string, as you did:

"hello".gsub /([aeiou])/, "<#{$1}>"
 #=> "h<previousmatch>ll<previousmatch>"

…because that string interpolation happens once, before gsub has been run. Using the block form of gsub re-invokes the block for each match, at which point the global $1 has been appropriately populated and is available for use.
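To make the timing visible, the sketch below first runs an unrelated match so that $1 holds a known value, then compares the interpolated-string form with the block form. The strings and the variable names stale and fresh are made up for illustration.

```ruby
"xyz" =~ /(y)/   # leaves $1 set to "y"

# The argument "<#{$1}>" is built before gsub runs, so gsub only sees "<y>".
stale = "hello".gsub(/([aeiou])/, "<#{$1}>")

# The block runs once per match, after $1 has been updated for that match.
fresh = "hello".gsub(/([aeiou])/) { "<#{$1}>" }

puts stale  # h<y>ll<y>
puts fresh  # h<e>ll<o>
```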

Edit: For Ruby 1.8 (why on earth are you using that?) you can use:

puts csv.gsub(/([^,\n\r])"([^,\n\r])/,'\1""\2')
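The capture-based pattern consumes the character on each side of the quote, so it may not always match the lookaround version. A small comparison, using a made-up input where two inner quotes are only one character apart (run on Ruby 1.9 or later, since the second pattern needs lookbehind):

```ruby
line = 'x,"a"b"c",y'   # hypothetical input: inner quotes one character apart

# Consumes "a", the quote, and "b"; the next quote has no unconsumed neighbor left.
with_captures   = line.gsub(/([^,\n\r])"([^,\n\r])/, '\1""\2')

# Lookarounds consume nothing, so every inner quote is considered on its own.
with_lookaround = line.gsub(/(?<!^|,)"(?!,|$)/, '""')

puts with_captures    # x,"a""b"c",y
puts with_lookaround  # x,"a""b""c",y
```

For the sample data in the question the two patterns give the same result; the difference only shows up when inner quotes are this close together.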

Answer 2

Assuming s is a string, this will work:

puts s.gsub(/([^,])"([^,])/, "\\1\"\"\\2")

Answer 1
44, accepted, 2012-02-01 16:46:20

Answer 2
9, 2012-02-01 16:02:22
