How to push a string into a new array in Ruby

Question

I want to search for substrings in a given string. Each time the substring is included within the entered string I append it to an array. Ultimately I want to tally that array to get a count of how many times each substring appears.

The problem is that each matching substring from the dictionary is added to new_array only once.

For example:

dictionary = ["below", "down","go","going","horn","how","howdy","it","i","low","own","part","partner","sit"]

substrings("go going", dictionary)

Should output:

{"go"=>2, "going"=>1, "i"=>1}

but I get

{"go"=>1, "going"=>1, "i"=>1}

This is my code:

def substrings(word, array) 

  new_array = []

  array.each do |index| 

    if word.downcase.include? (index)

      new_array << index

    end
  end

  puts new_array.tally

end

 dictionary = ["below", "down","go","going","horn","how","howdy","it","i","low","own","part","partner","sit"]

 substrings("go going", dictionary)

Answer 1

Depending on how large your dictionary is, you can simply map each element to its occurrence count whenever the substring exists in the word:

dictionary.map {|w| [w,word.scan(w).size] if word.include?(w)}.compact.to_h
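For anyone who wants to try the one-liner directly, here is a minimal sketch that wraps it in a method. The method name substring_counts is made up for illustration, and the input is downcased on the assumption that matching should ignore case, as the question's code does.

```ruby
# Hypothetical wrapper around the one-liner (the method name is an assumption).
def substring_counts(word, dictionary)
  text = word.downcase # assumption: match case-insensitively, as in the question
  dictionary
    .map { |w| [w, text.scan(w).size] if text.include?(w) } # nil when w is absent
    .compact                                                # drop the nils
    .to_h
end

dictionary = ["below", "down", "go", "going", "horn", "how", "howdy",
              "it", "i", "low", "own", "part", "partner", "sit"]

substring_counts("go going", dictionary)
  #=> {"go"=>2, "going"=>1, "i"=>1}
```

On Ruby 2.7 or later, filter_map could replace the map and compact pair.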

Answer 2

You can use scan to count how many times each substring appears:

def substrings(word, array)
  output = {}
  array.each do |index|
     count_substring_appears = word.scan(index).size
     if count_substring_appears > 0
       output[index] = count_substring_appears
     end
  end

  output
end

Answer 3

Only the words "go", "going" and "i" from your dictionary are substrings of your phrase. Each of these words occurs only once in the dictionary, so new_array contains ["go", "going", "i"], which tallies to exactly {"go"=>1, "going"=>1, "i"=>1}.

I assume you expected "go" to be counted twice because it occurs twice in your phrase. In that case you can change your method to:

def substrings(word, array) 
  new_array = []
  array.each do |index| 
    word.scan(/#{index}/).each { new_array << index }
  end
  puts new_array.tally
end

word.scan(/#{index}/) returns an array containing every occurrence of the substring in your phrase, so index is appended to new_array once per occurrence.
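One caveat when interpolating a dictionary entry into a regular expression: any regex metacharacters in the entry are interpreted rather than matched literally. The question's dictionary contains none, so the sketch below assumes a made-up entry "a.b" purely to show the difference. It also shows two ways to match literally: passing the string itself to scan, or wrapping it in Regexp.escape. The method name literal_counts is hypothetical.

```ruby
# Hypothetical helper: counts literal occurrences of each entry in word.
def literal_counts(word, array)
  array.each_with_object({}) do |entry, counts|
    # String#scan accepts a plain string pattern and matches it literally.
    n = word.scan(entry).size
    counts[entry] = n if n > 0
  end
end

entry = "a.b" # made-up entry containing a regex metacharacter

"a.b axb".scan(/#{entry}/).size                 #=> 2, "." matches any character
"a.b axb".scan(/#{Regexp.escape(entry)}/).size  #=> 1
"a.b axb".scan(entry).size                      #=> 1

literal_counts("go going", ["go", "going", "i"])
  #=> {"go"=>2, "going"=>1, "i"=>1}
```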

Answer 4

You must count the number of times index appears in word, so use scan:

def substrings(word, array)
  hash = {}
  text = word.downcase # downcase once so include? and scan see the same text

  array.each do |index|
    if text.include?(index)
      # scan returns every non-overlapping match; its length is the count
      hash[index] = text.scan(index).length
    end
  end

  puts hash
end

Answer 5

It is my understanding that we are given an array dictionary of words containing no whitespace, and a string str, and are to construct a hash whose keys are elements of dictionary and whose values equal the number of non-overlapping¹ occurrences of the key in str. The hash returned should exclude keys having values of zero.

This answer addresses the situation where, in:

substrings(str, dictionary)

dictionary is large, str is not overly-large (the meaning of which I elaborate later) and efficiency is important.

We begin by defining a helper method, whose purpose will become clear.

def substr_counts(str)
  str.split.each_with_object(Hash.new(0)) do |word,h|
    (1..word.size).each do |sub_len|
      (0..word.size-sub_len).each do |start_idx|
        h[word[start_idx,sub_len]] += 1
      end
    end
  end
end

For the example given in the question,

substr_counts("go going")
  #=> {"g"=>3, "o"=>2, "go"=>2, "i"=>1, "n"=>1, "oi"=>1, "in"=>1, "ng"=>1,
  #    "goi"=>1, "oin"=>1, "ing"=>1, "goin"=>1, "oing"=>1, "going"=>1}

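As seen, this method breaks str into words, computes each substring of each word and returns a hash whose keys are the substrings and whose values are the total number of times each substring occurs across all words. Note that because every start index is visited, occurrences that overlap within a word are each counted (for 'bobobo', 'bobo' is counted twice), so the result agrees with the non-overlapping definition in footnote 1 only when a key cannot overlap itself within a word.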

The desired hash can now be constructed quickly.

def cover_count(str, dictionary)
  h = substr_counts(str)
  dictionary.each_with_object({}) do |word,g|
    g[word] = h[word] if h.key?(word)
  end
end

dictionary = ["below", "down", "go", "going", "horn", "how", "howdy", 
              "it", "i", "low", "own", "part", "partner", "sit"]

cover_count("go going", dictionary)
  #=> {"go"=>2, "going"=>1, "i"=>1}

Another example:

str = "lowner partnership lownliest"
cover_count(str, dictionary)
  #=> {"i"=>2, "low"=>2, "own"=>2, "part"=>1, "partner"=>1}

Here,

substr_counts(str)
  #=> {"l"=>3, "o"=>2, "w"=>2, "n"=>3, "e"=>3, "r"=>3, "lo"=>2,
  #    ...
  #    "wnliest"=>1, "lownlies"=>1, "ownliest"=>1, "lownliest"=>1} 
substr_counts(str).size
  #=> 109

There is an obvious trade-off here. If str is long, and especially if it contains long words², it will take too long to build h to justify the savings of not having to determine, for each word in dictionary, if that word is contained in each word of str. If, however, it's worth it to construct h, the overall time savings could be substantial.

1. By "non-overlapping" I mean that if str equals 'bobobo' it contains one, not two, substrings 'bobo'.

2. substr_counts("antidisestablishmentarianism").size #=> 385, not so bad.
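To make the overlap question in footnote 1 concrete, here is a small sketch comparing two ways of counting. String#scan consumes each match, so it counts non-overlapping occurrences, while a zero-width lookahead counts every start position. Both method names are hypothetical and are not part of the answer above.

```ruby
# Count matches the way String#scan does: each match is consumed,
# so the next search starts after it and matches never overlap.
def non_overlapping_count(str, sub)
  str.scan(sub).size
end

# Count every start position at which sub occurs, using a zero-width
# lookahead so that no characters are consumed between matches.
def overlapping_count(str, sub)
  str.scan(/(?=#{Regexp.escape(sub)})/).size
end

non_overlapping_count("bobobo", "bobo") #=> 1
overlapping_count("bobobo", "bobo")     #=> 2
```

Which count is wanted depends on the definition chosen; for the question's dictionary and example phrase the two agree.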

Answer 6

Another option is to use Array#product after splitting the word, so you can use Enumerable#tally as you wanted:

word = "go going"
word.split.product(dictionary).select { |a, b| a.include? b }.map(&:last).tally

#=> {"go"=>2, "going"=>1, "i"=>1}

It doesn't give the same output when word = "gogoing", since split then returns a one-element array. So I can't say whether this is the behaviour you are looking for.
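The difference can be seen in a short sketch. The method name product_tally is hypothetical, and the dictionary is shortened to the three entries that matter here.

```ruby
# Hypothetical wrapper around the product/select/tally chain.
def product_tally(word, dictionary)
  word.split                           # whitespace-separated words
      .product(dictionary)             # every [word, entry] pair
      .select { |a, b| a.include?(b) } # keep pairs where the word contains the entry
      .map(&:last)                     # keep only the entry
      .tally
end

dictionary = %w[go going i]

product_tally("go going", dictionary) #=> {"go"=>2, "going"=>1, "i"=>1}
product_tally("gogoing", dictionary)  #=> {"go"=>1, "going"=>1, "i"=>1}

# A scan-based count sees both occurrences inside the single word:
"gogoing".scan("go").size             #=> 2
```

In other words, this approach counts how many words contain each entry, not how many times the entry occurs.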

Answer 7

I'd start with this:

dictionary = %w[down go going it i]
target = 'go going'

dictionary.flat_map { |w|
  target.scan(Regexp.new(w, Regexp::IGNORECASE))
}.reject(&:empty?).tally
# => {"go"=>2, "going"=>1, "i"=>1}
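One thing to be aware of with Regexp::IGNORECASE: scan returns the text as it appears in target, so "Go" and "go" would be tallied under separate keys. A possible variation, assuming the counts should be grouped under lowercase keys, downcases each match before tallying. The method name tally_ignoring_case is hypothetical, and Regexp.escape is added on the assumption that entries should be matched literally.

```ruby
# Hypothetical variation: case-insensitive matching with lowercase keys.
def tally_ignoring_case(target, dictionary)
  dictionary.flat_map { |w|
    # Escape the entry so it is matched literally, ignoring case.
    target.scan(Regexp.new(Regexp.escape(w), Regexp::IGNORECASE))
  }.map(&:downcase).tally # normalize "Go" and "go" to the same key
end

tally_ignoring_case("Go going", %w[down go going it i])
# => {"go"=>2, "going"=>1, "i"=>1}
```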

7 answers

solution1
1 2020-06-17 17:01:40

solution2
0 2020-06-17 16:50:44

solution3
0 2020-06-17 16:51:43

solution4
0 2020-06-17 16:59:04

solution5
0 2020-06-17 20:10:43

solution6
0 2020-06-17 20:18:52

solution7
0 2020-06-18 00:18:16