How to extract repeating character sequences from a string with Ruby regex?
I have a string like "++++001------zx.......?????????xxxxxxx". I would like to use a Ruby regex to extract every continuous sequence longer than one character into a flat array:
["++++",
"00",
"------",
".......",
"?????????",
"xxxxxxx"]
I can achieve this with a nested loop:
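s = "++++001------zx.......?????????xxxxxxx"
t = s.chars                          # split the string into single characters
i = 0
f = []
while i < t.length
  # collect the run of characters equal to t[i]
  j = i
  part = ""
  while t[i] == t[j]
    part += t[j]
    j += 1
  end
  i = j                              # continue right after the run
  f.push(part) if part.length >= 2   # keep only runs of 2+ characters
end
p f
# => ["++++", "00", "------", ".......", "?????????", "xxxxxxx"]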
But I am unable to find an appropriate regex to pass to the scan method. I tried this:
s.scan(/(.)\1++/x)
but it only captures the first character of each repeating sequence. Is this possible at all?
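This sketch was added for illustration and is not part of the original question. It shows what that attempt returns. The pattern contains a capture group, so String#scan returns the group contents instead of the whole match. As a result, each run is reduced to its first character.

```ruby
s = "++++001------zx.......?????????xxxxxxx"

# The pattern has one capture group, so scan returns one
# single-element array per match holding only the captured character.
groups_only = s.scan(/(.)\1++/x)
p groups_only
# => [["+"], ["0"], ["-"], ["."], ["?"], ["x"]]
```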
This is a bit tricky.
You want to capture every group made of two or more of the same character, so a good way to do this is to use backreferences. Your solution is close to being correct.
/((.)\2+)/
should do the trick.
Note that if you use scan, this returns two values for each match. The first is the whole sequence, and the second is the single repeated character.
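Here is a minimal sketch of that scan approach. It assumes you only want the runs themselves, so it keeps the first element of each pair:

```ruby
str = "++++001------zx.......?????????xxxxxxx"

# Group 1 is the whole run and group 2 is the repeated character,
# so scan returns one [run, char] pair per match.
pairs = str.scan(/((.)\2+)/)
p pairs.first(2)
# => [["++++", "+"], ["00", "0"]]

# Keep only the run from each pair to get a flat array.
runs = pairs.map(&:first)
p runs
# => ["++++", "00", "------", ".......", "?????????", "xxxxxxx"]
```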
str = "++++001------zx.......?????????xxxxxxx"
str.chars.chunk{|e| e}.map{|e| e[1].join if e[1].size >1 }.compact
# => ["++++", "00", "------", ".......", "?????????", "xxxxxxx"]
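The one-liner above relies on Enumerable#chunk, which groups consecutive elements with the same block value into [key, elements] pairs. The sketch below prints a few of those intermediate pairs. It also shows a variant that uses filter_map instead of map plus compact. That variant assumes Ruby 2.7 or later, where filter_map was introduced.

```ruby
str = "++++001------zx.......?????????xxxxxxx"

# chunk yields [key, elements] pairs for each run of equal characters.
groups = str.chars.chunk { |c| c }.to_a
p groups.first(3)
# => [["+", ["+", "+", "+", "+"]], ["0", ["0", "0"]], ["1", ["1"]]]

# Keep runs longer than one character and join them back into strings
# (filter_map requires Ruby 2.7+).
runs = groups.filter_map { |_char, run| run.join if run.size > 1 }
p runs
# => ["++++", "00", "------", ".......", "?????????", "xxxxxxx"]
```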
If you need only the overall match values and want to ignore (omit) all capturing group values, similar to how String#match works in JavaScript with a global (g) regex, you can call String#gsub with a single regex argument (no replacement argument and no block). It returns an Enumerator, and .to_a turns that into the array of matches:
text = "++++001------zx.......?????????xxxxxxx"
p text.gsub(/(.)\1+/m).to_a
# => ["++++", "00", "------", ".......", "?????????", "xxxxxxx"]
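This short sketch illustrates the Enumerator part. Calling gsub with only a pattern does not substitute anything. Instead, it returns an Enumerator over the whole matches, which you can consume partially or convert with .to_a:

```ruby
text = "++++001------zx.......?????????xxxxxxx"

# No replacement and no block: gsub returns an Enumerator that
# yields each whole match (never the individual capture groups).
matches = text.gsub(/(.)\1+/m)
p matches.class
# => Enumerator
p matches.first(2)
# => ["++++", "00"]
```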
See the Ruby demo online and the Rubular demo (note how the matches are highlighted in the Match result field).
I added the m modifier just for completeness, so that . also matches line break characters, which . does not match by default.
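Here is a small sketch of what the m modifier changes. It uses a hypothetical sample string (not from the question) that contains two consecutive newlines:

```ruby
sample = "ab\n\ncd"

# Without /m, "." does not match "\n", so the newline run is missed.
without_m = sample.gsub(/(.)\1+/).to_a
# With /m, "." also matches "\n", so "\n\n" is found as a run.
with_m = sample.gsub(/(.)\1+/m).to_a

p without_m  # => []
p with_m     # => ["\n\n"]
```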
Also, see the related thread "Capturing groups don't work as expected with Ruby scan method".