
Return index of all occurrences of a character in a string in ruby

I am trying to return the indices of all occurrences of a specific character in a string using Ruby. An example string is "a#asg#sdfg#d##" and the expected return is [1,5,10,12,13] when searching for # characters. The following code does the job, but is there a simpler way of doing this?

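def occurances(line)

  index = 0
  all_index = []

  line.each_byte do |x|
    # each_byte yields integers; '#'[0] is an integer only on Ruby 1.8,
    # so compare against the byte value explicitly
    if x == '#'.ord
      all_index << index
    end
    index += 1
  end

  all_index
end

s = "a#asg#sdfg#d##"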
a = (0 ... s.length).find_all { |i| s[i,1] == '#' }

require 'enumerator' # Needed in 1.8.6 only
"1#3#a#".enum_for(:scan,/#/).map { Regexp.last_match.begin(0) }
#=> [1, 3, 5]

ETA: This works by creating an Enumerator that uses scan(/#/) as its each method.

scan yields each occurrence of the specified pattern (in this case /#/) and inside the block you can call Regexp.last_match to access the MatchData object for the match.

MatchData#begin(0) returns the index where the match begins and since we used map on the enumerator, we get an array of those indices back.
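
As a minimal sketch of the same idea wrapped in a reusable method: the name match_offsets is hypothetical (it does not come from the answer above), and the sketch assumes the pattern is passed as a Regexp.

```ruby
# Hypothetical helper; the name match_offsets is an assumption for illustration.
def match_offsets(str, pattern)
  # enum_for(:scan, pattern) builds an Enumerator whose each is scan(pattern).
  # For every match, Regexp.last_match holds the MatchData, and begin(0)
  # gives the index at which the whole match starts.
  str.enum_for(:scan, pattern).map { Regexp.last_match.begin(0) }
end

p match_offsets("a#asg#sdfg#d##", /#/)  # => [1, 5, 10, 12, 13]
p match_offsets("abcabc", /bc/)         # => [1, 4]
```

Because the pattern is an ordinary Regexp, the same call also covers multi-character matches, as the second line shows.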

Here's a less-fancy way:

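x = "a#asg#sdfg#d##"  # the string to search
i = -1
all = []
# String#index returns nil once no further match exists, which ends the loop
while i = x.index('#',i+1)
  all << i
end
all  #=> [1, 5, 10, 12, 13]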

In a quick speed test this was about 3.3x faster than FM's find_all method, and about 2.5x faster than sepp2k's enum_for method.
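
A rough way to rerun such a comparison is sketched below with the standard Benchmark module. The input size, the repeat count, and the method names are assumptions chosen for illustration, not the setup used for the figures above, so the ratios you see may differ.

```ruby
require 'benchmark'

# Assumed input: the question's sample string repeated; the size is arbitrary.
sample = "a#asg#sdfg#d##" * 1_000

# FM's approach: test every position.
def with_find_all(s)
  (0...s.length).find_all { |i| s[i, 1] == '#' }
end

# sepp2k's approach: enumerate scan matches and read their start offsets.
def with_enum_for(s)
  s.enum_for(:scan, /#/).map { Regexp.last_match.begin(0) }
end

# The index loop: jump from match to match.
def with_index_loop(s)
  i = -1
  all = []
  while i = s.index('#', i + 1)
    all << i
  end
  all
end

Benchmark.bm(10) do |bm|
  bm.report('find_all') { 10.times { with_find_all(sample) } }
  bm.report('enum_for') { 10.times { with_enum_for(sample) } }
  bm.report('index')    { 10.times { with_index_loop(sample) } }
end
```

All three methods return the same array for the same input, so only the timings differ.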

Here's a long method chain:

"a#asg#sdfg#d##".
  each_char.
  each_with_index.
  inject([]) do |indices, (char, idx)|
    indices << idx if char == "#"
    indices
  end

# => [1, 5, 10, 12, 13]

Requires Ruby 1.8.7+.

Another solution derived from FMc's answer:

s = "a#asg#sdfg#d##"
q = []
s.length.times {|i| q << i if s[i,1] == '#'}

I love that Ruby never has only one way of doing something!

Here's a solution for massive strings. I'm doing text finds on 4.5MB text strings and the other solutions grind to a halt. This takes advantage of the fact that Ruby's .split is very efficient compared to string comparisons.

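def indices_of_matches(str, target)
  # Append a suffix with occurrences of target stripped out, so that split
  # does not drop the empty pieces left by matches at the end of str
  cuts = (str + (target.hash.to_s.gsub(target,''))).split(target)[0..-2]
  indices = []
  loc = 0
  cuts.each do |cut|
    loc = loc + cut.size      # advance past the text before this match
    indices << loc            # the match starts here
    loc = loc + target.size   # skip over the match itself
  end
  indices
end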

It's basically using the horsepower behind the .split method, then using the lengths of the separate parts and the length of the target string to work out locations. I've gone from 30 seconds using various methods to instantaneous on extremely large strings.

I'm sure there's a better way to do it, but:

(str + (target.hash.to_s.gsub(target,'')))

adds something to the end of the string in case the target is at the end (by default split drops trailing empty strings), but it also has to make sure that the "random" addition doesn't contain the target itself.

indices_of_matches("a#asg#sdfg#d##","#")
=> [1, 5, 10, 12, 13]
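
One possible alternative to the appended suffix, sketched under assumptions: String#split accepts a limit argument, and a negative limit keeps trailing empty strings, so nothing needs to be added to the input. The method name indices_via_split is hypothetical, and the sketch assumes target is a non-empty String other than a single space (which split treats specially).

```ruby
# Hypothetical variant of the split approach; the name is an assumption.
def indices_via_split(str, target)
  # A negative limit makes split keep trailing empty strings,
  # so matches at the very end of str are not lost.
  pieces = str.split(target, -1)
  pieces.pop  # the last piece follows the final match rather than preceding one

  indices = []
  loc = 0
  pieces.each do |piece|
    loc += piece.size     # advance past the text before this match
    indices << loc        # the match starts here
    loc += target.size    # skip over the match itself
  end
  indices
end

p indices_via_split("a#asg#sdfg#d##", "#")  # => [1, 5, 10, 12, 13]
```

This has not been timed against the original on large inputs, so treat any speed expectation as unverified.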
