
Ruby: Split string at character difference using regex

I'm currently working on a problem that involves splitting a string into runs of identical consecutive characters.

For example,

"111223334456777" #=> ['111','22','333','44','5','6','777']

The way I'm currently doing it is to iterate over the indices, compare each character with the previous one, and start a new group whenever they differ.

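res = []
str = "111223334456777"
group = str[0]                # the current run starts with the first character
(1...str.length).each do |i|
  if str[i] != str[i-1]       # character changed: close the current run
    res << group
    group = str[i]
  else                        # same character: extend the current run
    group << str[i]
  end
end
res << group                  # push the final run
res #=> ['111','22','333','44','5','6','777']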

I want to see if I can use a regex to do this, which would make the process a lot simpler. I understand I could just put this block of code in a method, but I'm curious whether a regex can be used here.
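For comparison, putting the loop into a method could look like the minimal sketch below. The method name group_runs is hypothetical, and the guard for the empty string is an addition: without it, str[0] is nil and the loop returns [nil].

```ruby
# Hypothetical helper name; wraps the loop from the question.
def group_runs(str)
  return [] if str.empty?   # str[0] would be nil for an empty string

  res = []
  group = str[0]            # str[0] returns a new one-character String
  (1...str.length).each do |i|
    if str[i] != str[i - 1]
      res << group          # character changed: close the current run
      group = str[i]
    else
      group << str[i]       # same character: extend the current run
    end
  end
  res << group              # the last run is never closed inside the loop
end

p group_runs("111223334456777")
```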

So what I want to do is

str.split(/some regex/)

to produce the same result. I thought about a positive lookahead, but I can't figure out how to make the regex recognize that the next character is different.

Does anyone know whether this is possible?
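On the lookahead idea: a lookahead can't name "some other character" on its own, but combined with a backreference it can. The pattern (.)(?!\1) matches a character only when the next character is not that same character, i.e. the last character of each run. Because String#split also returns the text of any capture groups in its pattern, this sketch doesn't hand that pattern to split directly; it uses gsub to append a delimiter after each run and then splits on the delimiter. The method name and the "," delimiter are assumptions, and the delimiter must not occur in the input.

```ruby
# Assumption: "," never appears in the input, so it is safe as a delimiter.
DELIM = ","

# Hypothetical method name.
def split_runs_via_lookahead(str)
  # (.)     captures one character as group 1
  # (?!\1)  negative lookahead: the next character is NOT the same one,
  #         so only the last character of each run matches
  # /m      lets . match newlines too
  # "\\1,"  puts the character back and appends the delimiter
  str.gsub(/(.)(?!\1)/m, "\\1#{DELIM}").split(DELIM)
end

p split_runs_via_lookahead("111223334456777")
```

split drops the trailing empty string left after the final delimiter, so no cleanup is needed at the end.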

str = "111333224456777"

str.scan(/0+|1+|2+|3+|4+|5+|6+|7+|8+|9+/)
  #=> ["111", "333", "22", "44", "5", "6", "777"]

or

str.scan(/((\d)\2*)/).map(&:first)
  #=> ["111", "333", "22", "44", "5", "6", "777"] 

Readers: can the latter be simplified?
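One possible simplification: String#gsub called with a pattern but no replacement and no block returns an Enumerator over the whole matches. So only one capture group (the one the backreference \1 refers to) is needed, and there is no need to pick out the outer group afterwards. By contrast, str.scan(/(\d)\1*/) would return only the captured digits, not the full runs.

```ruby
str = "111333224456777"

# gsub with only a pattern returns an Enumerator that yields each whole
# match, so the outer capture group and map(&:first) are not needed.
runs = str.gsub(/(\d)\1*/).to_a
p runs #=> ["111", "333", "22", "44", "5", "6", "777"]
```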

The chunk_while method is what you're looking for here:

str.chars.chunk_while { |b,a| b == a }.map(&:join)

That starts a new chunk wherever the current character a doesn't match the previous character b. If you want to restrict the grouping to digits, you can do some pre-processing first.
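One reading of that pre-processing step is sketched below. It assumes that "restrict to just numbers" means discarding every non-digit character before grouping; the method name digit_runs is hypothetical. chunk_while requires Ruby 2.3 or later.

```ruby
# Assumption: non-digit characters are simply discarded before grouping.
# Hypothetical method name.
def digit_runs(str)
  digits = str.delete("^0-9")   # keep only the characters 0-9
  # b = previous char, a = current char; a new chunk starts when they differ
  digits.chars.chunk_while { |b, a| b == a }.map(&:join)
end

p digit_runs("11a22b2 33") #=> ["11", "222", "33"]
```

Note that deleting the separators can merge runs: "22b2" becomes the single run "222". If that isn't wanted, split on non-digits first and group each piece separately.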

There are a lot of very handy methods in Enumerable that are worth exploring, and each new version of Ruby seems to add more of them (chunk_while itself arrived in Ruby 2.3).
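Two other Enumerable methods can express the same grouping. slice_when is the mirror image of chunk_while (it starts a new slice when the block returns true), and chunk groups consecutive elements that map to the same key:

```ruby
str = "111223334456777"

# slice_when starts a new slice whenever neighbouring characters differ.
by_slice = str.each_char.slice_when { |a, b| a != b }.map(&:join)

# chunk groups consecutive elements with the same key (here the character
# itself) and yields [key, elements] pairs.
by_chunk = str.each_char.chunk(&:itself).map { |_char, chars| chars.join }

p by_slice #=> ["111", "22", "333", "44", "5", "6", "777"]
p by_chunk #=> ["111", "22", "333", "44", "5", "6", "777"]
```

Which one reads best is mostly taste. slice_when states the split condition directly, which is close to what the question describes.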

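Another option uses the group_by method, which returns a hash with each distinct character as a key and an array of its occurrences as the value. Note that group_by ignores position, so it gives the same result as the run-based answers only when each character appears in a single contiguous run, as it does in this example.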

"111223334456777".split('').group_by { |i| i }.values.map(&:join) #=> ["111", "22", "333", "44", "5", "6", "777"]

Although it doesn't use a regex, someone else may find it useful.
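A quick check with an input where a digit reappears after a different one shows where group_by and a run-based method such as chunk_while diverge. The string "1121" is just an illustrative choice:

```ruby
input = "1121"

# group_by collects equal characters wherever they occur in the string.
grouped = input.chars.group_by { |c| c }.values.map(&:join)

# chunk_while only joins characters that are adjacent.
runs = input.chars.chunk_while { |b, a| b == a }.map(&:join)

p grouped #=> ["111", "2"]
p runs    #=> ["11", "2", "1"]
```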

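Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.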

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM