I want to search for substrings in a given string. Each time a dictionary word is included in the entered string, I append it to an array. Ultimately I want to tally that array to get a count of how many times each substring appears.
The problem is that each substring from the dictionary is only added once to new_array, even when it appears several times in the string.
For example:
dictionary = ["below", "down","go","going","horn","how","howdy","it","i","low","own","part","partner","sit"]
substrings("go going", dictionary)
Should output:
{"go"=>2, "going"=>1, "i"=>1}
but I get
{"go"=>1, "going"=>1, "i"=>1}
This is my code:
def substrings(word, array)
  new_array = []
  array.each do |index|
    if word.downcase.include?(index)
      new_array << index
    end
  end
  puts new_array.tally
end
dictionary = ["below", "down","go","going","horn","how","howdy","it","i","low","own","part","partner","sit"]
substrings("go going", dictionary)
Depending on how large your dictionary is, you can just map each element to its occurrence count whenever the substring exists in the word.
dictionary.map {|w| [w,word.scan(w).size] if word.include?(w)}.compact.to_h
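As a runnable illustration of the one-liner above, here is a minimal sketch that wraps it in a method. The method name count_substrings is hypothetical, and the sketch assumes the dictionary entries are lowercase and should be matched case-insensitively against the input.

```ruby
# Hypothetical wrapper around the one-liner; assumes lowercase dictionary entries.
def count_substrings(word, dictionary)
  text = word.downcase
  dictionary
    .map { |w| [w, text.scan(w).size] }   # pair each entry with its match count
    .reject { |_, count| count.zero? }    # drop entries that never occur
    .to_h
end

dictionary = ["below", "down", "go", "going", "horn", "how", "howdy",
              "it", "i", "low", "own", "part", "partner", "sit"]
count_substrings("go going", dictionary)
#=> {"go"=>2, "going"=>1, "i"=>1}
```

Counting first and rejecting zeros afterwards avoids checking include? and then scanning the same string a second time.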
You can use scan to count how many times each substring appears.
def substrings(word, array)
  output = {}
  array.each do |index|
    count_substring_appears = word.scan(index).size
    if count_substring_appears > 0
      output[index] = count_substring_appears
    end
  end
  output
end
Only the words "go", "going" and "i" from your dictionary are substrings of your phrase. Each of these words occurs only once in the dictionary, and include? only reports whether a match exists, so new_array contains ["go", "going", "i"], which tallies to exactly {"go"=>1, "going"=>1, "i"=>1}.
I assume you expected go to be counted twice because it appears twice in your phrase. In this case you can change your method to:
def substrings(word, array)
  new_array = []
  array.each do |index|
    word.scan(/#{index}/).each { new_array << index }
  end
  puts new_array.tally
end
word.scan(/#{index}/) returns each occurrence of the substring in your phrase.
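One caveat when interpolating a dictionary entry into a regular expression is that any regex metacharacters in the entry are treated as pattern syntax. The sketch below assumes entries are meant to match literally. It either passes the string itself to scan, which accepts a String pattern, or escapes the entry with Regexp.escape. The sample text and the entry "a.b" are made up for illustration.

```ruby
text  = "a.b axb"
entry = "a.b"   # hypothetical entry containing a regex metacharacter

as_regex   = text.scan(/#{entry}/).size                 # "." matches any character
as_string  = text.scan(entry).size                      # String pattern, matched literally
as_escaped = text.scan(/#{Regexp.escape(entry)}/).size  # escaped, so also literal

as_regex    #=> 2
as_string   #=> 1
as_escaped  #=> 1
```

For a dictionary of plain lowercase words the three forms give the same counts, so this only matters if entries can contain characters such as ".", "+" or "?".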
You must count the number of times index appears in word, so use scan:
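def substrings(word, array)
  hash = {}
  # Downcase once so the membership check and the count see the same text.
  downcased = word.downcase
  array.each do |index|
    # Only record dictionary entries that occur at least once.
    if downcased.include?(index)
      hash[index] = downcased.scan(index).length
    end
  end
  puts hash
end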
It is my understanding that we are given an array dictionary of words containing no whitespace and a string str, and are to construct a hash whose keys are elements of dictionary and whose values equal the number of non-overlapping [1] substrings of str that equal the key. The hash returned should exclude keys having values of zero.
This answer addresses the situation where, in:
substrings(str, dictionary)
dictionary is large, str is not overly large (the meaning of which I elaborate later) and efficiency is important.
We begin by defining a helper method, whose purpose will become clear.
def substr_counts(str)
  str.split.each_with_object(Hash.new(0)) do |word,h|
    (1..word.size).each do |sub_len|
      (0..word.size-sub_len).each do |start_idx|
        h[word[start_idx,sub_len]] += 1
      end
    end
  end
end
For the example given in the question,
substr_counts("go going")
#=> {"g"=>3, "o"=>2, "go"=>2, "i"=>1, "n"=>1, "oi"=>1, "in"=>1, "ng"=>1,
# "goi"=>1, "oin"=>1, "ing"=>1, "goin"=>1, "oing"=>1, "going"=>1}
As seen, this method breaks str into words, computes every substring of each word and returns a hash whose keys are the substrings and whose values are the total number of times each substring occurs across all words. Every starting position is counted, so occurrences that overlap are counted separately.
The desired hash can now be constructed quickly.
def cover_count(str, dictionary)
  h = substr_counts(str)
  dictionary.each_with_object({}) do |word,g|
    g[word] = h[word] if h.key?(word)
  end
end
dictionary = ["below", "down", "go", "going", "horn", "how", "howdy",
"it", "i", "low", "own", "part", "partner", "sit"]
cover_count("go going", dictionary)
#=> {"go"=>2, "going"=>1, "i"=>1}
Another example:
str = "lowner partnership lownliest"
cover_count(str, dictionary)
#=> {"i"=>2, "low"=>2, "own"=>2, "part"=>1, "partner"=>1}
Here,
substr_counts(str)
#=> {"l"=>3, "o"=>2, "w"=>2, "n"=>3, "e"=>3, "r"=>3, "lo"=>2,
# ...
# "wnliest"=>1, "lownlies"=>1, "ownliest"=>1, "lownliest"=>1}
substr_counts(str).size
#=> 109
There is an obvious trade-off here. If str is long, and especially if it contains long words [2], it will take too long to build h to justify the savings of not having to determine, for each word in dictionary, whether that word is contained in each word of str. If, however, it's worth it to construct h, the overall time savings could be substantial.
1. By "non-overlapping" I mean that if str equals 'bobobo' it contains one, not two, substrings 'bobo'.
2. substr_counts("antidisestablishmentarianism").size #=> 385, not so bad.
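To make footnote 1 concrete, the sketch below compares String#scan, which counts non-overlapping matches, with a count taken at every starting position, which is how the nested loops in substr_counts enumerate substrings. The helper name count_all_positions is hypothetical and is not part of the answer's code.

```ruby
# Hypothetical helper: counts occurrences of sub at every start index,
# so occurrences that overlap are each counted.
def count_all_positions(str, sub)
  (0..str.size - sub.size).count { |i| str[i, sub.size] == sub }
end

non_overlapping = "bobobo".scan("bobo").size             # scan resumes after each match
all_positions   = count_all_positions("bobobo", "bobo")  # start indices 0 and 2 both match

non_overlapping  #=> 1
all_positions    #=> 2
```

For the dictionary and phrases used in this thread no word overlaps itself, so both ways of counting agree on those examples.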
Another option is to use Array#product after splitting the word, so you can use Enumerable#tally as you wanted:
word = "go going"
word.split.product(dictionary).select { |a, b| a.include? b }.map(&:last).tally
#=> {"go"=>2, "going"=>1, "i"=>1}
It doesn't output the same when word = "gogoing", since it is split into a one-element array and each dictionary entry is counted at most once per word. So I can't say whether this is the behaviour you are looking for.
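A short sketch of that difference, using a trimmed-down dictionary chosen only for illustration: the product approach counts an entry once per whitespace-separated word that contains it, while scan counts every non-overlapping occurrence.

```ruby
dictionary = %w[go going i]   # shortened dictionary, for illustration only

# Per-word membership: "gogoing" is a single word, so each entry counts once.
by_word = "gogoing".split.product(dictionary)
                   .select { |w, entry| w.include?(entry) }
                   .map(&:last).tally

# Per-occurrence counting with scan.
by_scan = dictionary.to_h { |entry| [entry, "gogoing".scan(entry).size] }

by_word  #=> {"go"=>1, "going"=>1, "i"=>1}
by_scan  #=> {"go"=>2, "going"=>1, "i"=>1}
```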
I'd start with this:
dictionary = %w[down go going it i]
target = 'go going'
dictionary.flat_map { |w|
  target.scan(Regexp.new(w, Regexp::IGNORECASE))
}.reject(&:empty?).tally
# => {"go"=>2, "going"=>1, "i"=>1}
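One thing to watch with a case-insensitive scan is that it returns the text as it appears in the target, so matches that differ only in case end up under separate keys. The sketch below normalizes with downcase before tallying. It assumes counts keyed by the lowercase dictionary entry are what is wanted, and the mixed-case target is made up for illustration.

```ruby
dictionary = %w[down go going it i]
target = 'Go going'   # hypothetical mixed-case input

# Case-insensitive regex: keys keep the casing found in the target.
raw = dictionary.flat_map { |w|
  target.scan(Regexp.new(Regexp.escape(w), Regexp::IGNORECASE))
}.tally

# Downcasing the target first keeps all matches under the dictionary's spelling.
normalized = dictionary.flat_map { |w|
  target.downcase.scan(w)
}.tally

raw         #=> {"Go"=>1, "go"=>1, "going"=>1, "i"=>1}
normalized  #=> {"go"=>2, "going"=>1, "i"=>1}
```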