How to use ruby gsub Regexp with many matches?

I have CSV file contents with double quotes inside quoted text:

test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good

I need to replace every double quote that is not immediately preceded or followed by a comma with "":

test,first,line,"you are a ""kind"" man",thanks
again,second,li,"my ""boss"" is you",good

So each inner " is replaced by "".

I tried:

x.gsub(/([^,])"([^,])/, "#{$1}\"\"#{$2}")

but it didn't work.

Your regex needs to be a little more bold, in case the quotes occur at the start of the first value, or at the end of the last value:

csv = <<ENDCSV
test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good
more,""Someone" said that you're "cute"",yay
"watch out for this",and,also,"this test case"
ENDCSV

puts csv.gsub(/(?<!^|,)"(?!,|$)/,'""')
#=> test,first,line,"you are a ""kind"" man",thanks
#=> again,second,li,"my ""boss"" is you",good
#=> more,"""Someone"" said that you're ""cute""",yay
#=> "watch out for this",and,also,"this test case"

The above regex uses negative lookbehind and negative lookahead assertions (anchors), available in Ruby 1.9.

  • (?<!^|,) : immediately preceding this spot there must not be either a start of line ( ^ ) or a comma
  • " : find a double quote
  • (?!,|$) : immediately following this spot there must not be either a comma or an end of line ( $ )
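
To see each of the three parts at work, here is a minimal sketch that wraps the substitution in a helper. The method name escape_inner_quotes is hypothetical and only used for illustration; the pattern itself is the one from the answer.

```ruby
# Hypothetical helper name, for illustration only.
# Doubles every quote that is not at a field boundary
# (line start, line end, or next to a comma).
def escape_inner_quotes(text)
  text.gsub(/(?<!^|,)"(?!,|$)/, '""')
end

puts escape_inner_quotes('"start",x')       # quote at line start is left alone
puts escape_inner_quotes('x,"end"')         # quote at line end is left alone
puts escape_inner_quotes('x,"say "hi"",y')  #=> x,"say ""hi""",y
```

Note that a quote sitting directly next to a comma inside a field would still be treated as a field boundary, so this only suits data shaped like the samples above.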

As a bonus, since this regex doesn't actually capture the characters on either side, you don't need to worry about using \\1 correctly in your replacement string.

For more information, see the section "Anchors" in the official Ruby regex documentation.


However, for cases where you do need to use the matched text in your replacement, you can use any of the following:

"hello".gsub /([aeiou])/, '<\1>'            #=> "h<e>ll<o>"
"hello".gsub /([aeiou])/, "<\\1>"           #=> "h<e>ll<o>"
"hello".gsub(/([aeiou])/){ |m| "<#{$1}>" }  #=> "h<e>ll<o>"

You can't use String interpolation in the replacement string, as you did:

"hello".gsub /([aeiou])/, "<#{$1}>"
 #=> "h<previousmatch>ll<previousmatch>"

…because that string interpolation happens once, before gsub has been run. Using the block form of gsub re-invokes the block for each match, at which point the global $1 has been appropriately populated and is available for use.
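
A small sketch of that timing difference, assuming an unrelated earlier match has already set $1 (the "xyz" string and the variable names stale and fresh are made up for the demonstration):

```ruby
# An earlier, unrelated match leaves $1 set to "y".
"xyz" =~ /(y)/

# The replacement string is built once, before gsub runs,
# so it contains the stale $1 from the match above.
stale = "hello".gsub(/([aeiou])/, "<#{$1}>")

# The block is called once per match, after $1 has been updated.
fresh = "hello".gsub(/([aeiou])/) { "<#{$1}>" }

puts stale  #=> h<y>ll<y>
puts fresh  #=> h<e>ll<o>
```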


Edit: For Ruby 1.8 (why on earth are you using that?) you can use:

puts csv.gsub(/([^,\n\r])"([^,\n\r])/,'\1""\2')
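
One caveat worth checking against your own data: the capturing pattern consumes the character on each side of the quote, so two quotes separated by a single character cannot both match. The sketch below compares both patterns on a made-up line (the sample input and variable names are assumptions, not part of the original question):

```ruby
line = 'k,"a "b" c",z'

# Capturing version: the "b" is consumed by the first match,
# so the quote right after it has no preceding character left to match.
capturing = line.gsub(/([^,\n\r])"([^,\n\r])/, '\1""\2')

# Lookaround version: assertions consume nothing, so both quotes match.
lookaround = line.gsub(/(?<!^|,)"(?!,|$)/, '""')

puts capturing   #=> k,"a ""b" c",z
puts lookaround  #=> k,"a ""b"" c",z
```

If one-character quoted words can occur in your data, the lookaround form is the safer choice where it is available.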

Assuming s is a string, this will work:

puts s.gsub(/([^,])"([^,])/, "\\1\"\"\\2")
