
Ruby: counting records in an array

I have two files: (1) the first contains the users of the system, which I read into an array; (2) the second contains statistics about those users.

My task is to produce a per-user count, something like:

{"user1" => 1, "user2" => 0, "user3" => 4}

This is how I solved the problem:

# Result wanted
# Given the names and stats array generate the results array
# result = {'user1' => 3, 'user2' => 2, 'user3' => 0, 'user4' => 1}

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

hash = Hash[names.map {|v| [v, 0]}] # to make sure every name gets a value

stats.each do |item| # basic loop to count the records
  hash[item] += 1 if hash.has_key?(item)
end

puts hash

# terminal outcome
# $ ruby example.rb 
# {"user1"=>3, "user2"=>2, "user3"=>0, "user4"=>1}

I'm just curious whether there is a better way than counting in a loop, especially since Ruby comes with magical powers and I come from a C background.

Basically, your code is about as fast as you can get for this, except for some minor issues.

If you have an unneeded entry marking the end of the array:

stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

I think you should pop it off before running the loop, since it can result in a weird entry, and its presence forces you to use the conditional test in your loop, slowing your code.

stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
stats.pop # => "xxx"
stats # => ["user1", "user1", "user1", "user2", "user4", "user2"]
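To see why the trailing 'xxx' matters, here is a minimal sketch. The helper names `count_strict` and `count_with_default` are made up for illustration and are not from the question. With a hash seeded only from the names, an unguarded increment on an unknown key fails because the lookup returns nil; with a `Hash.new(0)` default it silently adds a stray entry instead.

```ruby
# Hypothetical helpers, for illustration only.

# Seeds every name with 0, then increments without a guard.
def count_strict(names, stats)
  hash = {}
  names.each { |name| hash[name] = 0 }
  stats.each { |item| hash[item] += 1 } # nil + 1 raises for unknown keys
  hash
end

# Same loop, but unknown keys default to 0 instead of nil.
def count_with_default(names, stats)
  hash = Hash.new(0)
  names.each { |name| hash[name] = 0 }
  stats.each { |item| hash[item] += 1 }
  hash
end

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

begin
  count_strict(names, stats)
rescue NoMethodError => e
  puts "unguarded loop failed: #{e.class}"
end

count_with_default(names, stats)
# => {"user1"=>3, "user2"=>2, "user3"=>0, "user4"=>1, "xxx"=>1}

count_strict(names, stats[0...-1]) # sentinel dropped, no guard needed
# => {"user1"=>3, "user2"=>2, "user3"=>0, "user4"=>1}
```

Dropping the sentinel up front is what makes the unguarded version safe; if unknown names can appear anywhere in the data, the conditional is still needed.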

Built-in methods exist that reduce the amount of code to a single call, but they're slower than a loop:

stats.group_by{ |e| e } # => {"user1"=>["user1", "user1", "user1"], "user2"=>["user2", "user2"], "user4"=>["user4"]}

From there it's easy to map the resulting hash into summaries:

stats.group_by{ |e| e }.map{ |k, v| [k, v.size] } # => [["user1", 3], ["user2", 2], ["user4", 1]]

And then into a hash again:

stats.group_by{ |e| e }.map{ |k, v| [k, v.size] }.to_h # => {"user1"=>3, "user2"=>2, "user4"=>1}

or:

Hash[stats.group_by{ |e| e }.map{ |k, v| [k, v.size] }] # => {"user1"=>3, "user2"=>2, "user4"=>1}

Using the built-in methods is efficient, and very useful when you're dealing with very large lists, because there's very little redundant looping going on.
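On newer Rubies the grouping and counting can be done in one call with `Enumerable#tally`. The sketch below assumes Ruby 2.7 or later (the post does not state a version), and `count_users` is a made-up helper name. It also keeps the zero entries and drops names that are not in the list, which the `group_by` chain does not do on its own.

```ruby
# Assumes Ruby 2.7+ (Enumerable#tally; Array#to_h with a block needs 2.6+).
# count_users is a hypothetical name used only for this example.
def count_users(names, stats)
  zeros  = names.to_h { |name| [name, 0] } # every name starts at 0
  counts = stats.tally                     # occurrences of each stats entry
  zeros.merge(counts.slice(*names))        # keep counts for known names only
end

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

count_users(names, stats)
# => {"user1"=>3, "user2"=>2, "user3"=>0, "user4"=>1}
```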

Looping over the data like you did is also very fast, and usually faster than the built-in methods, when written correctly. Here are some benchmarks showing alternate ways of building the zero-filled hash:

require 'fruity'  # => true

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2']

Hash[names.map {|v| [v, 0]}]                    # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
Hash[names.zip([0] * names.size )]              # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
names.zip([0] * names.size ).to_h               # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
hash = {}; names.each{ |k| hash[k] = 0 }; hash  # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}

compare do 
  map_hash { Hash[names.map {|v| [v, 0]}] }
  zip_hash { Hash[names.zip([0] * names.size )] }
  to_h_hash { names.zip([0] * names.size ).to_h }
  hash_braces { hash = {}; names.each{ |k| hash[k] = 0 }; hash }
end

# >> Running each test 2048 times. Test will take about 1 second.
# >> hash_braces is faster than map_hash by 50.0% ± 10.0%
# >> map_hash is faster than to_h_hash by 19.999999999999996% ± 10.0%
# >> to_h_hash is faster than zip_hash by 10.000000000000009% ± 10.0%

Looking at the conditional in the loop to see how it affects the code:

require 'fruity'  # => true

NAMES = ['user1', 'user2', 'user3', 'user4']
STATS = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
STATS2 = STATS[0 .. -2]

def build_hash
  h = {}
  NAMES.each{ |k| h[k] = 0 }
  h
end

compare do 

  your_way {
    hash = build_hash()
    STATS.each do |item| # basic loop to count the records
      hash[item] += 1 if hash.has_key?(item)
    end
    hash
  }

  my_way {
    hash = build_hash()
    STATS2.each { |e| hash[e] += 1 }
    hash
  }
end

# >> Running each test 512 times. Test will take about 1 second.
# >> my_way is faster than your_way by 27.0% ± 1.0%

While several answers suggested using count, that code is going to slow down a lot as your lists increase in size, whereas walking the stats array once, as you are doing, will always be linear, so stick to one of these iterative solutions.
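The scaling difference can be checked with Ruby's standard-library Benchmark module. This is only a sketch: the helper names and the data sizes are arbitrary choices for illustration, and the timings will vary by machine. Calling `stats.count(name)` once per name rescans the whole stats array each time, while the single pass touches each stats entry once.

```ruby
require 'benchmark'

# One stats.count call per name: every call rescans the whole stats array.
def count_per_name(names, stats)
  names.each_with_object({}) { |name, hash| hash[name] = stats.count(name) }
end

# Single pass over stats with one hash lookup per item.
def count_single_pass(names, stats)
  hash = {}
  names.each { |name| hash[name] = 0 }
  stats.each { |item| hash[item] += 1 if hash.key?(item) }
  hash
end

# Arbitrary sizes, chosen only to make the difference visible.
names = (1..200).map { |i| "user#{i}" }
stats = Array.new(20_000) { |i| "user#{(i % 250) + 1}" } # user201..user250 are unknown

Benchmark.bm(12) do |bm|
  bm.report('count:')       { count_per_name(names, stats) }
  bm.report('single pass:') { count_single_pass(names, stats) }
end
```

Both helpers return the same hash; only the amount of work differs, and the gap widens as either list grows.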

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

stats.each_with_object(Hash.new(0)) { |user,hash| 
  hash[user] += 1 if names.include?(user) }
#=> {"user1"=>3, "user2"=>2, "user4"=>1}
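Note that `names.include?` scans the names array for every stats entry, and that the result above has no entry for user3. A possible variation, assuming the standard-library Set class for constant-time membership checks, is sketched below; `count_known` is a made-up name for illustration.

```ruby
require 'set'

# count_known is a hypothetical helper, not part of the answer above.
def count_known(names, stats)
  known = names.to_set # hash-based lookup instead of an array scan
  counts = stats.each_with_object(Hash.new(0)) do |user, hash|
    hash[user] += 1 if known.include?(user)
  end
  # Rebuild in names order so users with no stats still appear with 0.
  names.map { |name| [name, counts[name]] }.to_h
end

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

count_known(names, stats)
# => {"user1"=>3, "user2"=>2, "user3"=>0, "user4"=>1}
```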

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

hash = Hash.new

names.each { |name| hash[name] = stats.count(name) }

puts hash

You could use map with Hash[]:

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
hash = Hash[names.map { |name| [name, stats.count(name)] }]
puts hash

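Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.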

 