Ruby: counting records in an array

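I have two files: (1) the first contains the system's users, and I read that file into an array; (2) the second contains statistics about those users.

My task is to produce a count for each user, something like: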

{"user1" => 1, "user2" => 0, "user3" => 4}

This is how I solved the problem:

# Result wanted
# Given the names and stats array generate the results array
# result = {'user1' => 3, 'user2' => 2, 'user3' => 0, 'user4' => 1}

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

hash = Hash[names.map {|v| [v, 0]}] # to make sure every name gets a value

stats.each do |item| # basic loop to count the records
   hash[item] += 1 if hash.has_key?(item)
end

puts hash

# terminal outcome
# $ ruby example.rb 
# {"user1"=>3, "user2"=>2, "user3"=>0, "user4"=>1}

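I'm just curious whether there's a better way than counting in a loop, especially since Ruby has magical powers and I come from a C background.

Basically, your code is about as fast as this can run, aside from some minor issues.

If you have an unneeded entry marking the end of the array: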

stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

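I think you should pop it off before running the loop, because it can cause odd output, and its presence forces you to use a conditional test in the loop, which slows your code.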

stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
stats.pop # => "xxx"
stats # => ["user1", "user1", "user1", "user2", "user4", "user2"]
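
To illustrate why the marker forces the conditional, here is a minimal sketch. It assumes, as in the question, that the hash is seeded only with the known names and that every entry other than the trailing 'xxx' is a known name; the variable names are illustrative.

```ruby
names = %w[user1 user2 user3 user4]
stats = %w[user1 user1 user1 user2 user4 user2 xxx]

# Seed every known name with zero, as in the question.
counts = names.map { |name| [name, 0] }.to_h

# Without the guard, the 'xxx' marker hits a missing key: counts['xxx'] is nil,
# and nil + 1 raises NoMethodError.
error = begin
  stats.each { |user| counts[user] += 1 }
  nil
rescue NoMethodError => e
  e
end
puts error.class # NoMethodError

# Drop the marker first and the unguarded loop runs cleanly.
stats.pop
counts = names.map { |name| [name, 0] }.to_h
stats.each { |user| counts[user] += 1 }
p counts
```

If the stats file might contain other unknown names, keep the guard or filter the data first.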

There are built-in methods that reduce the amount of code to a single call, but they're slower than a loop:

stats.group_by{ |e| e } # => {"user1"=>["user1", "user1", "user1"], "user2"=>["user2", "user2"], "user4"=>["user4"]}

From there it's easy to map the resulting hash into a summary:

stats.group_by{ |e| e }.map{ |k, v| [k, v.size] } # => [["user1", 3], ["user2", 2], ["user4", 1]]

And then turn it into a hash again:

stats.group_by{ |e| e }.map{ |k, v| [k, v.size] }.to_h # => {"user1"=>3, "user2"=>2, "user4"=>1}

Or:

Hash[stats.group_by{ |e| e }.map{ |k, v| [k, v.size] }] # => {"user1"=>3, "user2"=>2, "user4"=>1}
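
On newer Rubies the group_by/map/to_h chain can be shortened. The sketch below is not part of the original answer; it assumes Ruby 2.7 or later, where Enumerable#tally is available, and it looks each name up with a default of 0 so that users with no stats still appear.

```ruby
names = %w[user1 user2 user3 user4]
stats = %w[user1 user1 user1 user2 user4 user2 xxx]

# Enumerable#tally (Ruby 2.7+) counts every distinct element in one pass.
tallied = stats.tally
# => {"user1"=>3, "user2"=>2, "user4"=>1, "xxx"=>1}

# Look each known name up in the tally, defaulting to 0 when it never appeared.
# Unknown entries such as 'xxx' are simply never looked up.
counts = names.map { |name| [name, tallied.fetch(name, 0)] }.to_h
# => {"user1"=>3, "user2"=>2, "user3"=>0, "user4"=>1}

p counts
```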

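Using the built-in methods is efficient and very useful when you're dealing with very large lists, because very little redundant looping goes on.

Looping over the data the way you did is also very fast and, when written correctly, usually faster than the built-in methods. Here are some benchmarks showing alternate ways of getting this done: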

require 'fruity'  # => true

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2']

Hash[names.map {|v| [v, 0]}]                    # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
Hash[names.zip([0] * names.size )]              # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
names.zip([0] * names.size ).to_h               # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
hash = {}; names.each{ |k| hash[k] = 0 }; hash  # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}

compare do 
  map_hash { Hash[names.map {|v| [v, 0]}] }
  zip_hash { Hash[names.zip([0] * names.size )] }
  to_h_hash { names.zip([0] * names.size ).to_h }
  hash_braces { hash = {}; names.each{ |k| hash[k] = 0 }; hash }
end

# >> Running each test 2048 times. Test will take about 1 second.
# >> hash_braces is faster than map_hash by 50.0% ± 10.0%
# >> map_hash is faster than to_h_hash by 19.999999999999996% ± 10.0%
# >> to_h_hash is faster than zip_hash by 10.000000000000009% ± 10.0%

Looking at the conditional in the loop to see how it affects the code:

require 'fruity'  # => true

NAMES = ['user1', 'user2', 'user3', 'user4']
STATS = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
STATS2 = STATS[0 .. -2]

def build_hash
  h = {}
  NAMES.each{ |k| h[k] = 0 }
  h
end

compare do 

  your_way {
    hash = build_hash()
    STATS.each do |item| # basic loop to count the records
      hash[item] += 1 if hash.has_key?(item)
    end
    hash
  }

  my_way {
    hash = build_hash()
    STATS2.each { |e| hash[e] += 1 }
    hash
  }
end

# >> Running each test 512 times. Test will take about 1 second.
# >> my_way is faster than your_way by 27.0% ± 1.0%
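
fruity is a third-party gem. If it isn't installed, a rough version of the same comparison can be sketched with only the core Process.clock_gettime call. The time_it lambda and the iteration count below are illustrative assumptions, and the timings it prints will vary by machine and Ruby version, so treat them as indicative only.

```ruby
names  = %w[user1 user2 user3 user4]
stats  = %w[user1 user1 user1 user2 user4 user2 xxx]
stats2 = stats[0..-2] # same data without the trailing 'xxx' marker

build_hash = -> { names.map { |name| [name, 0] }.to_h }

# Loop with the guard, as in the question.
with_conditional = lambda do
  hash = build_hash.call
  stats.each { |item| hash[item] += 1 if hash.key?(item) }
  hash
end

# Loop without the guard, run on the cleaned array.
without_conditional = lambda do
  hash = build_hash.call
  stats2.each { |item| hash[item] += 1 }
  hash
end

# Hypothetical helper: elapsed seconds for `iterations` calls of `work`.
time_it = lambda do |iterations, work|
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { work.call }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

conditional_secs   = time_it.call(100_000, with_conditional)
unconditional_secs = time_it.call(100_000, without_conditional)

puts format('with conditional:    %.4fs', conditional_secs)
puts format('without conditional: %.4fs', unconditional_secs)
```

Both lambdas return the same hash, so any difference in the timings comes from the guard and the one extra element.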

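Although several answers suggest using count, that code will slow down a lot as the list grows, because each count call rescans the stats array, whereas walking the stats array once is always linear. So stick with one of these iterating solutions.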

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

stats.each_with_object(Hash.new(0)) { |user,hash| 
  hash[user] += 1 if names.include?(user) }
#=> {"user1"=>3, "user2"=>2, "user4"=>1}

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

hash = Hash.new

names.each { |name| hash[name] = stats.count(name) }

puts hash

You can use map and Hash[].

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
hash = Hash[names.map { |name| [name, stats.count(name)] }]
puts hash
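
The count-based versions scan stats once per name. As a sketch of a single-pass alternative that gives the same hash, each_with_object can be seeded with the zeroed names, so users without stats keep a 0 and the check is a hash key lookup rather than a scan of names. The variable names here are illustrative.

```ruby
names = %w[user1 user2 user3 user4]
stats = %w[user1 user1 user1 user2 user4 user2 xxx]

# Start from a hash that already holds every known name with a zero count.
seed = names.map { |name| [name, 0] }.to_h

# One pass over stats; entries that are not known names are skipped.
counts = stats.each_with_object(seed) do |user, hash|
  hash[user] += 1 if hash.key?(user)
end
# => {"user1"=>3, "user2"=>2, "user3"=>0, "user4"=>1}

# Same result as the count-based version.
same = counts == Hash[names.map { |name| [name, stats.count(name)] }]
# => true

p counts
```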
