How can I extract repeated character sequences from a string with a Ruby regex?

Question

I have a string such as "++++001------zx.......?????????xxxxxxx". I would like to extract every continuous sequence of the same character that is longer than one character into a flat array, using a Ruby regex:

["++++",
"00",
"------",
".......",
"?????????",
"xxxxxxx"]

I can achieve this with a nested loop:

s="++++001------zx.......?????????xxxxxxx"
t=s.split(//)
i=0
f=[]
while i<=t.length-1 do
  j=i
  part=""
  while t[i]==t[j] do
    part=part+t[j]
    j=j+1
  end
  i=j
  if part.length>=2 then f.push(part) end
end

But I am unable to find an appropriate regex to feed into the scan method. I tried s.scan(/(.)\1++/x), but it only captures the first character of each repeating sequence. Is it possible at all?

Answer 1

This is a bit tricky.

You want to capture any run of more than one of the same character. A good way to do this is with backreferences. Your solution is close to being correct.

/((.)\2+)/ should do the trick.

Note that if you use scan, this returns two values for each match: the first is the whole sequence, and the second is the repeated character.
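As a rough sketch of what that looks like in practice, the snippet below runs both the single-group pattern and the nested-group pattern through scan on the string from the question, then keeps only the first element of each pair. The variable names (first_chars, pairs, runs) are made up for illustration.

```ruby
s = "++++001------zx.......?????????xxxxxxx"

# With a single capture group, scan returns only what the group captured,
# i.e. the first character of each run.
first_chars = s.scan(/(.)\1+/).flatten
# => ["+", "0", "-", ".", "?", "x"]

# Wrapping the whole pattern in an outer group makes group 1 the full run
# and group 2 the repeated character.
pairs = s.scan(/((.)\2+)/)
# pairs.first => ["++++", "+"]

# Keep only the full run from each pair.
runs = pairs.map(&:first)
# => ["++++", "00", "------", ".......", "?????????", "xxxxxxx"]

p runs
```

The map(&:first) step is what flattens the result into the array the question asks for.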

Answer 2

str = "++++001------zx.......?????????xxxxxxx"
str.chars.chunk { |e| e }.map { |_, run| run.join if run.size > 1 }.compact
# => ["++++", "00", "------", ".......", "?????????", "xxxxxxx"]

Answer 3

If you only need the overall match values and want to ignore (omit) all capturing group values, similarly to how String#match works in JavaScript, you can call String#gsub with a single regex argument (no replacement argument) to get an Enumerator, then call .to_a to get the array of matches:

text = "++++001------zx.......?????????xxxxxxx" 
p text.gsub(/(.)\1+/m).to_a
# => ["++++", "00", "------", ".......", "?????????", "xxxxxxx"]

See the Ruby demo online and the Rubular demo (note how the matches are highlighted in the Match result field).

I added the m modifier just for completeness, so that . also matches line break characters, which it does not match by default.
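To illustrate the effect of the m modifier, here is a small sketch using a made-up input string that contains two consecutive newlines. The input and the variable names are assumptions for illustration, not part of the original question.

```ruby
# Hypothetical input: two runs of letters separated by a run of two newlines.
text = "aa\n\nbb"

# Without m, . does not match "\n", so the newline run is skipped.
without_m = text.gsub(/(.)\1+/).to_a
# => ["aa", "bb"]

# With m, . also matches "\n", so the newline run is reported too.
with_m = text.gsub(/(.)\1+/m).to_a
# => ["aa", "\n\n", "bb"]

p without_m
p with_m
```

For the single-line string in the question, both variants give the same result.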

Also, see the related thread "Capturing groups don't work as expected with Ruby scan method".

3 answers

Answer 1
3 2013-07-24 19:13:00

Answer 2
1 2013-07-24 18:56:07

Answer 3
0 2021-02-14 20:01:43
