
How to push a string into a new array in Ruby

I want to search for substrings in a given string. Each time the substring is included within the entered string I append it to an array. Ultimately I want to tally that array to get a count of how many times each substring appears.

The problem is that each substring from the dictionary is only added to new_array once, even when it appears several times in the string.

For example:

dictionary = ["below", "down","go","going","horn","how","howdy","it","i","low","own","part","partner","sit"]

substrings("go going", dictionary)

Should output:

{"go"=>2, "going"=>1, "i"=>1}

but I get

{"go"=>1, "going"=>1, "i"=>1}

This is my code:

def substrings(word, array) 

  new_array = []

  array.each do |index| 

    if word.downcase.include? (index)

      new_array << index

    end
  end

  puts new_array.tally

end

 dictionary = ["below", "down","go","going","horn","how","howdy","it","i","low","own","part","partner","sit"]

 substrings("go going", dictionary)

Depending on how large your dictionary is, you can simply map each element to its occurrence count whenever the substring exists in the word.

dictionary.map {|w| [w,word.scan(w).size] if word.include?(w)}.compact.to_h
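As a sketch, the one-liner can be wrapped in a method so that it runs on its own. The method name substring_counts and the downcase call are my assumptions (the question's code downcases the input), and filter_map needs Ruby 2.7 or later.

```ruby
# Hypothetical wrapper around the one-liner above.
def substring_counts(word, dictionary)
  text = word.downcase
  dictionary.filter_map { |w|
    count = text.scan(w).size
    # Keep only the entries that actually occur in the text.
    [w, count] if count.positive?
  }.to_h
end

dictionary = ["below", "down", "go", "going", "horn", "how", "howdy",
              "it", "i", "low", "own", "part", "partner", "sit"]

substring_counts("go going", dictionary)
  #=> {"go"=>2, "going"=>1, "i"=>1}
```

Scanning once per entry and filtering on the count avoids the separate include? check.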

You can use scan to count how many times each substring appears.

def substrings(word, array)
  output = {}
  array.each do |index|
     count_substring_appears = word.scan(index).size
     if count_substring_appears > 0
       output[index] = count_substring_appears
     end
  end

  output
end

Only the words "go", "going" and "i" from your dictionary are substrings of your phrase. Each of these words occurs only once in the dictionary, so new_array contains ["go", "going", "i"], which tallies to exactly {"go"=>1, "going"=>1, "i"=>1}.
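To see the difference concretely, here is a small check (the variable names are mine): include? only answers yes or no, while scan returns one element per occurrence.

```ruby
phrase = "go going"

# include? only reports whether the substring is present at all.
phrase.include?("go")   #=> true

# scan returns one element per (non-overlapping) occurrence.
phrase.scan("go")       #=> ["go", "go"]
phrase.scan("go").size  #=> 2

# Pushing each matching dictionary word once therefore tallies to 1 per word.
found = ["go", "going", "i"]
found.tally             #=> {"go"=>1, "going"=>1, "i"=>1}
```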

I assume you expected "go" to be counted twice because it occurs twice in your phrase. In this case you can change your method to

def substrings(word, array) 
  new_array = []
  array.each do |index| 
    word.scan(/#{index}/).each { new_array << index }
  end
  puts new_array.tally
end

word.scan(/#{index}/) returns an array containing every occurrence of the substring in your phrase, so index is pushed once per occurrence.
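One thing to watch with /#{index}/: the dictionary entry is interpolated as a regular expression, so an entry containing a character such as "." is treated as a pattern. Below is a minimal sketch of two ways around that, assuming the entries are meant as literal text. The entry "a.b" and the phrase are hypothetical and not from the question.

```ruby
phrase = "a.b axb"
entry  = "a.b"   # hypothetical dictionary entry containing a metacharacter

# Interpolated as a pattern, "." matches any character.
phrase.scan(/#{entry}/).size                 #=> 2

# Regexp.escape makes the entry match literally.
phrase.scan(/#{Regexp.escape(entry)}/).size  #=> 1

# String#scan also accepts a plain string, which is matched literally.
phrase.scan(entry).size                      #=> 1
```

For a dictionary of plain lowercase words, as in the question, all three forms give the same counts.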

You must count the number of times index appears in the string, so use scan:

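def substrings(word, array)
  hash = {}
  # Downcase once so that include? and scan look at the same text.
  downcased = word.downcase

  array.each do |index|
    if downcased.include?(index)
      # Count every occurrence of index, not just whether it is present.
      hash[index] = downcased.scan(index).length
    end
  end

  puts hash
end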

It is my understanding that we are given an array dictionary of words containing no whitespace, and a string str, and are to construct a hash whose keys are elements of dictionary and whose values equal the number of non-overlapping [1] substrings of str that equal the key. The hash returned should exclude keys having values of zero.

This answer addresses the situation where, in:

substrings(str, dictionary)

dictionary is large, str is not overly-large (the meaning of which I elaborate later) and efficiency is important.

We begin by defining a helper method, whose purpose will become clear.

def substr_counts(str)
  str.split.each_with_object(Hash.new(0)) do |word,h|
    (1..word.size).each do |sub_len|
      (0..word.size-sub_len).each do |start_idx|
        h[word[start_idx,sub_len]] += 1
      end
    end
  end
end       

For the example given in the question,

substr_counts("go going")
  #=> {"g"=>3, "o"=>2, "go"=>2, "i"=>1, "n"=>1, "oi"=>1, "in"=>1, "ng"=>1,
  #    "goi"=>1, "oin"=>1, "ing"=>1, "goin"=>1, "oing"=>1, "going"=>1}

As seen, this method breaks str into words, computes every substring of each word, and returns a hash whose keys are the substrings and whose values are the total number of times each substring occurs across all of the words.

The desired hash can now be constructed quickly.

def cover_count(str, dictionary)
  h = substr_counts(str)
  dictionary.each_with_object({}) do |word,g|
    g[word] = h[word] if h.key?(word)
  end
end

dictionary = ["below", "down", "go", "going", "horn", "how", "howdy", 
              "it", "i", "low", "own", "part", "partner", "sit"]

cover_count("go going", dictionary)
  #=> {"go"=>2, "going"=>1, "i"=>1}

Another example:

str = "lowner partnership lownliest"
cover_count(str, dictionary)
  #=> {"i"=>2, "low"=>2, "own"=>2, "part"=>1, "partner"=>1}     

Here,

substr_counts(str)
  #=> {"l"=>3, "o"=>2, "w"=>2, "n"=>3, "e"=>3, "r"=>3, "lo"=>2,
  #    ...
  #    "wnliest"=>1, "lownlies"=>1, "ownliest"=>1, "lownliest"=>1} 
substr_counts(str).size
  #=> 109

There is an obvious trade-off here. If str is long, and especially if it contains long words [2], building h will take too long to justify the savings from not having to check, for each word in dictionary, whether that word is contained in each word of str. If, however, it is worth constructing h, the overall time savings could be substantial.

1. By "non-overlapping" I mean that if str equals 'bobobo' it contains one, not two, substrings 'bobo' .
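A quick check of this footnote's example may be useful. As far as I can tell, substr_counts as written counts every starting position, so overlapping occurrences are counted separately, whereas String#scan resumes after the end of each match. The sketch below repeats the helper so that it runs on its own.

```ruby
def substr_counts(str)
  str.split.each_with_object(Hash.new(0)) do |word, h|
    (1..word.size).each do |sub_len|
      (0..word.size - sub_len).each do |start_idx|
        h[word[start_idx, sub_len]] += 1
      end
    end
  end
end

# "bobo" starts at index 0 and again at index 2 of "bobobo".
substr_counts("bobobo")["bobo"]  #=> 2

# scan resumes after the end of each match, so it finds only one.
"bobobo".scan("bobo").size       #=> 1
```

If the non-overlapping count is what you need, scan is the safer choice for dictionary words that can overlap themselves.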

2. substr_counts("antidisestablishmentarianism").size #=> 385 , not so bad.

Another option is to use Array#product after splitting the word, so you can use Enumerable#tally as you wanted:

word = "go going"
word.split.product(dictionary).select { |a, b| a.include? b }.map(&:last).tally

#=> {"go"=>2, "going"=>1, "i"=>1}

It doesn't give the same output when word = "gogoing", since split then returns a one-element array. So I can't say whether this is the behaviour you are looking for.
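The difference can be sketched as follows. The helper name product_counts and the shortened dictionary are my own, chosen only to keep the example small.

```ruby
dictionary = %w[go going i]   # shortened, hypothetical dictionary

# Hypothetical helper wrapping the product/tally expression above.
def product_counts(word, dictionary)
  word.split.product(dictionary).select { |a, b| a.include?(b) }.map(&:last).tally
end

product_counts("go going", dictionary)  #=> {"go"=>2, "going"=>1, "i"=>1}

# With no whitespace there is a single word, so each entry is counted at most once.
product_counts("gogoing", dictionary)   #=> {"go"=>1, "going"=>1, "i"=>1}

# Counting with scan instead finds both occurrences of "go".
dictionary.to_h { |w| [w, "gogoing".scan(w).size] }
  #=> {"go"=>2, "going"=>1, "i"=>1}
```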

I'd start with this:

dictionary = %w[down go going it i]
target = 'go going'

dictionary.flat_map { |w|
  target.scan(Regexp.new(w, Regexp::IGNORECASE))
}.reject(&:empty?).tally
# => {"go"=>2, "going"=>1, "i"=>1}
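One caveat with the case-insensitive regex: the tally keys are the matched text rather than the dictionary words, so differently cased matches end up under separate keys. A small sketch, assuming lowercase keys are what is wanted (the capitalised input is hypothetical):

```ruby
dictionary = %w[down go going it i]
target = 'Go going'   # hypothetical input with a capital letter

# Keys come from the matched text, so "Go" and "go" are tallied separately.
by_match = dictionary.flat_map { |w|
  target.scan(Regexp.new(w, Regexp::IGNORECASE))
}.tally
#=> {"Go"=>1, "go"=>1, "going"=>1, "i"=>1}

# Downcasing the target first keeps the keys equal to the dictionary words.
by_word = dictionary.flat_map { |w| target.downcase.scan(w) }.tally
#=> {"go"=>2, "going"=>1, "i"=>1}
```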


 