
Get all possible substrings and their count

I'm trying to get all possible substrings and their counts in a hash. E.g.:

 "abc" => { a: 1, b: 1, ab: 1, bc: 1}

For that I wrote the following code:

 def get_all(b)
     (0..(b.size-1)).to_a.combination(2).inject({}) { |h, g|
        s = b[g[0],g[1]]
        h[s] ? ( h[s] += 1) : ( h[s] = 1 )
        h 
      } 
 end

But somehow it does not work correctly, because for "abchh" it returns:

{"a"=>1, "ab"=>1, "abc"=>1, "abch"=>1, "bc"=>1, "bch"=>1, "bchh"=>1, "chh"=>2, "hh"=>1}

"chh" is counted twice, but I can't understand why. What am I doing wrong?

Thank you!

String#[] can be called in various ways, including:

 str[start, length] → new_str or nil
 str[range] → new_str or nil

The former expects a start index and a length, whereas the latter expects a range denoting the start and end indices.

So instead of passing two arguments, g[0] and g[1]:

b[g[0], g[1]]

you have to pass a single range argument, g[0]..g[1]:

b[g[0]..g[1]]
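To see why "chh" ended up being counted twice, here is a small sketch comparing the two call forms on the input from the question. The variable names start_length and start_end are only for illustration.

```ruby
b = "abchh"

# Two-argument form: start index and length.
# Both calls start at index 2; the second asks for 4 characters,
# but only 3 remain, so both return the same substring.
start_length = [b[2, 3], b[2, 4]]

# Range form: start index and (inclusive) end index.
start_end = [b[2..3], b[2..4]]

p start_length  # ["chh", "chh"]
p start_end     # ["ch", "chh"]
```

With the two-argument form, the index pairs [2, 3] and [2, 4] both yield "chh", which is where the count of 2 comes from.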

Besides, you have to use repeated_combination in order to get the single characters as well:

(0..2).to_a.combination(2).to_a
#=> [[0, 1], [0, 2], [1, 2]]

(0..2).to_a.repeated_combination(2).to_a
#=> [[0, 0], [0, 1], [0, 2], [1, 1], [1, 2], [2, 2]]
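As a rough illustration, the index pairs can be mapped to substrings of "abc" to check that pairs such as [0, 0] produce the single characters. The names pairs and substrings are just placeholders for this sketch.

```ruby
str = "abc"

# Every (i, j) pair with i <= j, including i == j.
pairs = (0...str.size).to_a.repeated_combination(2).to_a

# Each pair selects the substring from index i through index j.
substrings = pairs.map { |i, j| str[i..j] }

p substrings  # ["a", "ab", "abc", "b", "bc", "c"]
```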

Furthermore, your code can be simplified:

  • use a...b instead of a..(b-1)
  • prefer each_with_object over inject so you don't have to return the hash from the block
  • set a default hash value via Hash.new(0)
  • destructure each index pair via (i, j) so you can write i..j instead of g[0]..g[1]

Example (the indices variable can be inlined):

def get_all(str)
  indices = (0...str.size).to_a.repeated_combination(2)
  indices.each_with_object(Hash.new(0)) do |(i, j), h|
    h[str[i..j]] += 1
  end
end

Or, using two nested loops:

def get_all(str)
  (0...str.size).each_with_object(Hash.new(0)) do |i, h|
    (i...str.size).each do |j|
      h[str[i..j]] += 1
    end
  end
end

Maybe the method is already doing too much. I'd probably split it into two methods: one for enumerating the substrings and another one for counting them.
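
One possible way to do that split is sketched below. The method names substrings and count_substrings are hypothetical, chosen only for this example, and returning an array from the first method is an assumption rather than the only reasonable design.

```ruby
# Hypothetical helper: returns every contiguous substring of str,
# including duplicates, as an array.
def substrings(str)
  (0...str.size).flat_map do |i|
    (i...str.size).map { |j| str[i..j] }
  end
end

# Hypothetical helper: counts how often each substring occurs.
def count_substrings(str)
  substrings(str).each_with_object(Hash.new(0)) { |s, h| h[s] += 1 }
end

counts = count_substrings("abchh")
p counts["h"]    # 2
p counts["chh"]  # 1
```

For "abchh" this counts "h" twice and every other substring once. On Ruby 2.7 or later, the counting step could probably be shortened to substrings(str).tally.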

