I have two files: (1) the first contains the users of the system, which I read into an array, and (2) the second contains statistics about those users.
My task is to produce a per-user count, something like
{"user1" => 1, "user2" => 0, "user3" => 4}
This is how I solved the problem:
# Result wanted
# Given the names and stats array generate the results array
# result = {'user1' => 3, 'user2' => 1, 'user3' => 0, 'user4' => 1}
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
hash = Hash[names.map { |v| [v, 0] }] # to make sure every name gets a value
stats.each do |item|                  # basic loop to count the records
  hash[item] += 1 if hash.has_key?(item)
end
puts hash
# terminal outcome
# $ ruby example.rb
# {"user1"=>3, "user2"=>2, "user3"=>0, "user4"=>1}
I'm only curious whether there is a better way than counting in a loop, especially since Ruby comes with magical powers and I come from a C background.
Basically, your code is about as fast as this can run, apart from a few minor issues.
If you have an unneeded entry marking the end of the array:
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
I think you should pop it off prior to processing, since it could otherwise produce a stray entry, and its existence forces the conditional test in your loop, slowing your code down.
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
stats.pop # => "xxx"
stats # => ["user1", "user1", "user1", "user2", "user4", "user2"]
Built-in methods exist, which reduce the amount of code to a single call, but they're slower than a loop:
stats.group_by{ |e| e } # => {"user1"=>["user1", "user1", "user1"], "user2"=>["user2", "user2"], "user4"=>["user4"], "xxx"=>["xxx"]}
From there it's easy to map the resulting hash into summaries:
stats.group_by{ |e| e }.map{ |k, v| [k, v.size] } # => [["user1", 3], ["user2", 2], ["user4", 1]]
And then into a hash again:
stats.group_by{ |e| e }.map{ |k, v| [k, v.size] }.to_h # => {"user1"=>3, "user2"=>2, "user4"=>1}
or:
Hash[stats.group_by{ |e| e }.map{ |k, v| [k, v.size] }] # => {"user1"=>3, "user2"=>2, "user4"=>1}
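If your Ruby is 2.7 or newer, the group-and-count step above collapses into a single built-in call, Enumerable#tally. This is a sketch assuming a newer Ruby than the code in this thread targets; `counts` and `result` are illustrative names.

```ruby
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

# tally counts each distinct element in one pass over stats.
counts = stats.tally # => {"user1"=>3, "user2"=>2, "user4"=>1, "xxx"=>1}

# Re-key by names so every user appears, defaulting missing ones to 0;
# to_h with a block needs Ruby 2.6+.
result = names.to_h { |n| [n, counts.fetch(n, 0)] }
result # => {"user1"=>3, "user2"=>2, "user3"=>0, "user4"=>1}
```

Keying off names at the end also filters out unwanted entries such as "xxx" without a conditional inside the counting loop.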
Using the built-in methods is convenient, and very useful when you're dealing with very large lists, because there's very little redundant looping going on.
Looping over the data like you did is also very fast, and usually faster than the built-in methods, when written correctly. Here are some benchmarks showing alternate ways of accomplishing this stuff:
require 'fruity' # => true
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2']
Hash[names.map {|v| [v, 0]}] # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
Hash[names.zip([0] * names.size )] # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
names.zip([0] * names.size ).to_h # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
hash = {}; names.each{ |k| hash[k] = 0 }; hash # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
compare do
  map_hash    { Hash[names.map { |v| [v, 0] }] }
  zip_hash    { Hash[names.zip([0] * names.size)] }
  to_h_hash   { names.zip([0] * names.size).to_h }
  hash_braces { hash = {}; names.each { |k| hash[k] = 0 }; hash }
end
# >> Running each test 2048 times. Test will take about 1 second.
# >> hash_braces is faster than map_hash by 50.0% ± 10.0%
# >> map_hash is faster than to_h_hash by 19.999999999999996% ± 10.0%
# >> to_h_hash is faster than zip_hash by 10.000000000000009% ± 10.0%
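One more way to seed the zeroed hash, not benchmarked above: if your Ruby is 2.6 or newer, to_h accepts a block, which skips the intermediate array of pairs that map builds. A small sketch under that version assumption:

```ruby
names = ['user1', 'user2', 'user3', 'user4']

# Block form of to_h (Ruby 2.6+): builds the hash directly from each name.
zeroed = names.to_h { |n| [n, 0] }
zeroed # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
```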
Looking at the conditional in the loop to see how it affects the code:
require 'fruity' # => true
NAMES = ['user1', 'user2', 'user3', 'user4']
STATS = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
STATS2 = STATS[0 .. -2]
def build_hash
h = {}
NAMES.each{ |k| h[k] = 0 }
h
end
compare do
  your_way {
    hash = build_hash()
    STATS.each do |item| # basic loop to count the records
      hash[item] += 1 if hash.has_key?(item)
    end
    hash
  }
  my_way {
    hash = build_hash()
    STATS2.each { |e| hash[e] += 1 }
    hash
  }
end
# >> Running each test 512 times. Test will take about 1 second.
# >> my_way is faster than your_way by 27.0% ± 1.0%
While several answers suggested using count, that rescans the stats array once per name, so its cost grows with names.size * stats.size and it will slow down a lot as your lists increase in size. Walking the stats array once, as you are doing, will always be linear, so stick to one of these iterative solutions.
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
stats.each_with_object(Hash.new(0)) { |user,hash|
hash[user] += 1 if names.include?(user) }
#=> {"user1"=>3, "user2"=>2, "user4"=>1}
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
hash = Hash.new
names.each { |name| hash[name] = stats.count(name) }
puts hash
You could also use map and Hash[]:
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
hash = Hash[names.map { |name| [name, stats.count(name)] }]
puts hash