
Return index of all occurrences of a character in a string in Ruby

I am trying to return the indices of all occurrences of a specific character in a string using Ruby. An example string is "a#asg#sdfg#d##" and the expected return is [1,5,10,12,13] when searching for # characters. The following code does the job, but there must be a simpler way of doing this?

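def occurrences(line)
  index = 0
  all_index = []

  # each_byte yields integers, so compare against the byte value of '#'.
  # ('#'[0] is an integer only in Ruby 1.8; in 1.9+ it is the string "#".)
  line.each_byte do |x|
    all_index << index if x == '#'.ord
    index += 1
  end

  all_index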
end

s = "a#asg#sdfg#d##"
a = (0...s.length).find_all { |i| s[i, 1] == '#' }

require 'enumerator' # Needed in 1.8.6 only
"1#3#a#".enum_for(:scan,/#/).map { Regexp.last_match.begin(0) }
#=> [1, 3, 5]

ETA: This works by creating an Enumerator that uses scan(/#/) as its each method.

scan yields each occurrence of the specified pattern (in this case /#/) and inside the block you can call Regexp.last_match to access the MatchData object for the match.

MatchData#begin(0) returns the index where the match begins, and since we used map on the enumerator, we get an array of those indices back.
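
To make the mechanism a little more concrete, here is a minimal sketch of the same idea written with the block form of scan instead of an enumerator. The helper name match_offsets is hypothetical and not part of the answer above; it only assumes that scan sets Regexp.last_match for each match before yielding to the block.

```ruby
# Hypothetical helper (the name is an assumption, not from the answer above).
# Collects the starting offset of every match of `pattern` in `str`.
def match_offsets(str, pattern)
  offsets = []
  # scan yields once per match; Regexp.last_match is the MatchData for it.
  str.scan(pattern) { offsets << Regexp.last_match.begin(0) }
  offsets
end

p match_offsets("1#3#a#", /#/)          # => [1, 3, 5]
p match_offsets("a#asg#sdfg#d##", /#/)  # => [1, 5, 10, 12, 13]
```

The enum_for version does the same thing, but lets map build the array instead of appending to one by hand.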

Here's a less-fancy way:

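x = "a#asg#sdfg#d##"  # the string to search
i = -1
all = []
# index returns nil once there are no more matches, which ends the loop
while (i = x.index('#', i + 1))
  all << i
end
all  # => [1, 5, 10, 12, 13]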

In a quick speed test this was about 3.3x faster than FM's find_all method, and about 2.5x faster than sepp2k's enum_for method.
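
For anyone who wants to repeat a comparison like this, a rough sketch using the standard Benchmark module might look as follows. The method names, the test string and the iteration count are all assumptions chosen for illustration (the test above does not say what input was used), and the ratios will vary with the Ruby version and the input.

```ruby
require 'benchmark'

# Arbitrary test input; an assumption, not the string used in the test above.
text = "a#asg#sdfg#d##" * 1_000

# The index loop from this answer.
def via_index(str)
  i = -1
  all = []
  while (i = str.index('#', i + 1))
    all << i
  end
  all
end

# FM's find_all approach.
def via_find_all(str)
  (0...str.length).find_all { |i| str[i, 1] == '#' }
end

# sepp2k's enum_for approach.
def via_enum_for(str)
  str.enum_for(:scan, /#/).map { Regexp.last_match.begin(0) }
end

# Time each approach over the same input and print a comparison table.
Benchmark.bm(10) do |bm|
  bm.report('index')    { 20.times { via_index(text) } }
  bm.report('find_all') { 20.times { via_find_all(text) } }
  bm.report('enum_for') { 20.times { via_enum_for(text) } }
end
```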

Here's a long method chain:

"a#asg#sdfg#d##".
  each_char.
  each_with_index.
  inject([]) do |indices, (char, idx)|
    indices << idx if char == "#"
    indices
  end

# => [1, 5, 10, 12, 13]

Requires Ruby 1.8.7+.

Another solution derived from FMc's answer:

s = "a#asg#sdfg#d##"
q = []
s.length.times {|i| q << i if s[i,1] == '#'}

I love that Ruby never has only one way of doing something!

Here's a solution for massive strings. I'm doing text finds on 4.5MB text strings and the other solutions grind to a halt. This takes advantage of the fact that Ruby's .split is very efficient compared to string comparisons.

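def indices_of_matches(str, target)
  # Append a sentinel that does not contain the target, so that split
  # keeps the empty pieces produced by matches at the end of the string.
  cuts = (str + (target.hash.to_s.gsub(target, ''))).split(target)[0..-2]
  indices = []
  loc = 0
  cuts.each do |cut|
    loc += cut.size     # skip over the text before this match
    indices << loc      # the match starts here
    loc += target.size  # skip over the match itself
  end
  indices
end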

It's basically using the horsepower behind the .split method, then using the separate parts and the length of the searched string to work out locations. I've gone from 30 seconds using various methods to instantaneous on extremely large strings.

I'm sure there's a better way to do it, but:

(str + (target.hash.to_s.gsub(target,'')))

adds something to the end of the string in case the target is at the end (because of the way split works), but it also has to make sure that the "random" addition doesn't contain the target itself.

indices_of_matches("a#asg#sdfg#d##","#")
=> [1, 5, 10, 12, 13]
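
One possible alternative to the appended "random" suffix, offered only as a sketch: String#split accepts a negative limit, which tells it to keep trailing empty strings, so a target at the end of the string is no longer lost. The method name indices_via_split is hypothetical, and the sketch assumes target is a non-empty literal string other than a single space (which split treats specially).

```ruby
# split drops trailing empty strings by default:
p "a#b##".split("#")      # => ["a", "b"]
# A negative limit keeps them:
p "a#b##".split("#", -1)  # => ["a", "b", "", ""]

# Hypothetical variant of indices_of_matches (the name is an assumption).
# Assumes target is a non-empty literal string and not a single space.
def indices_via_split(str, target)
  # Every piece except the last is followed by one occurrence of target.
  pieces = str.split(target, -1)[0..-2]
  indices = []
  loc = 0
  pieces.each do |piece|
    loc += piece.size   # text before this occurrence
    indices << loc      # the occurrence starts here
    loc += target.size  # step past the occurrence
  end
  indices
end

p indices_via_split("a#asg#sdfg#d##", "#")  # => [1, 5, 10, 12, 13]
```

Whether this is as fast as the original on very large strings has not been measured here.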
