Fastest way to count occurrences of a list of substrings in Ruby

Question

My problem is simple: I have a list of substrings, and I need to count how many of them are contained in a given string. Here is my code:

string = "..."
substrings = ["hello", "foo", "bar", "brol"]
count = 0
substrings.each do |sub|
    count += 1 if string.include?(sub)
end

In this example we scan the whole string 4 times, which is expensive. How would you optimize this?

Answer 1

This uses Regexp.union to run over the string only once:

string = 'hello there! this is foobar!'
substrings = ["hello", "foo", "bar", "brol"]

string.scan(Regexp.union(substrings)).count
# => 3
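
For reference, Regexp.union builds a single alternation pattern from the list and escapes regex metacharacters in plain strings. A minimal sketch (the "a.b" input is made up for illustration):

```ruby
substrings = ["hello", "foo", "bar", "brol"]

# One pattern that matches any of the substrings.
pattern = Regexp.union(substrings)
pattern.source  # => "hello|foo|bar|brol"

# Metacharacters in plain strings are escaped, so "." matches only a literal dot.
escaped = Regexp.union(["a.b"])
escaped.match?("a.b")  # => true
escaped.match?("axb")  # => false
```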

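Although this solution is noticeably slower on small inputs, it has lower complexity: for a string of length n and substrings of length m, the original solution is O(m*n), while this one is O(m+n).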
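
To see how the two approaches compare on your own data, a rough timing with the standard Benchmark module might look like the sketch below. The input string and its size are made up for illustration, and the numbers will vary by machine and Ruby version.

```ruby
require 'benchmark'

# Made-up input for illustration: a long filler string followed by some matches.
string = ('x' * 1_000_000) + 'hello foobar'
substrings = ["hello", "foo", "bar", "brol"]

# The question's approach: one include? call per substring.
include_time = Benchmark.realtime do
  substrings.count { |sub| string.include?(sub) }
end

# The Regexp.union approach: a single scan over the string.
union_time = Benchmark.realtime do
  string.scan(Regexp.union(substrings)).count
end

puts format('include?: %.4fs, Regexp.union: %.4fs', include_time, union_time)
```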

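Update
After rereading the question and my answer, I have come to the conclusion that not only is this a premature optimization (as @Max said), but my answer is also semantically different from the OP's code.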

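Let me explain: the OP's code counts how many of the substrings appear at least once in the string, while my solution counts the total number of occurrences of any of the substrings: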

op_solution('hello hello there', ["hello", "foo", "bar", "brol"])
# => 1
uri_solution('hello hello there', ["hello", "foo", "bar", "brol"])
# => 2

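This also explains why my solution is so slow, even for long strings: although it makes only one pass over the input string, it has to scan all of it, whereas the original code stops at the first occurrence.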
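
The answer does not define op_solution and uri_solution. A minimal sketch of what they presumably look like, assuming they simply wrap the question's loop and the Regexp.union one-liner:

```ruby
# Assumed helper: the OP's approach, counting how many substrings
# occur at least once in the string.
def op_solution(string, substrings)
  substrings.count { |sub| string.include?(sub) }
end

# Assumed helper: the Regexp.union approach, counting every
# non-overlapping match of any substring.
def uri_solution(string, substrings)
  string.scan(Regexp.union(substrings)).count
end

substrings = ["hello", "foo", "bar", "brol"]
op_solution('hello hello there', substrings)   # => 1
uri_solution('hello hello there', substrings)  # => 2
```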

My conclusion: use @Arup's solution. It won't be faster than yours, it is just more concise, but I can't think of anything better :)

Answer 2

Write it as:

substrings.count { |sub| string.include?(sub) }

Answer 3

substrings.collect { |i| string.scan(i).count }.sum

Elegant.
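
A quick check, as a sketch only, suggests this one-liner sums every occurrence of each substring, so it behaves like the Regexp.union version in Answer 1 rather than like the question's code. It also relies on Array#sum, which is available from Ruby 2.4. The sample strings are the ones used in Answer 1; the lambda name is made up for illustration.

```ruby
substrings = ["hello", "foo", "bar", "brol"]

# Sum of per-substring occurrence counts (hypothetical wrapper around the one-liner).
total = ->(string) { substrings.collect { |i| string.scan(i).count }.sum }

total.call('hello there! this is foobar!')  # => 3
total.call('hello hello there')             # => 2 (the question's code gives 1)
```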