
How do I efficiently extract all values with a certain key name from a hash of hashes?

I have this data:

members = {"total"=>3, "data"=>[
  {"email"=>"foo@example.org", "timestamp"=>"2013-03-16 01:11:01"},
  {"email"=>"bar@example.org", "timestamp"=>"2013-03-16 02:07:30"},
  {"email"=>"exx@example.org", "timestamp"=>"2013-03-16 03:06:24"}
]}

And want to generate an array like:

["foo@example.org", "bar@example.org", "exx@example.org"]

Currently I'm using:

members['data'].collect { |h| h['email'] }

  1. Is there a more efficient way to do this in terms of performance?
  2. Is there an even shorter way to write it?

I have Rails available.

In addition to the other answers: if you are able to construct the Hash with symbols as keys, you get a performance gain when collecting the values. For instance:

require 'benchmark'

members_without_sym = {"total"=>3, "data"=>[
  {"email"=>"foo@example.org", "timestamp"=>"2013-03-16 01:11:01"},
  {"email"=>"bar@example.org", "timestamp"=>"2013-03-16 02:07:30"},
  {"email"=>"exx@example.org", "timestamp"=>"2013-03-16 03:06:24"}
]}

members_with_sym = {:total=>3, :data=>[
  {:email=>"foo@example.org", :timestamp=>"2013-03-16 01:11:01"},
  {:email=>"bar@example.org", :timestamp=>"2013-03-16 02:07:30"},
  {:email=>"exx@example.org", :timestamp=>"2013-03-16 03:06:24"}
]}

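# Label width 14 fits the longest label, so the report columns line up.
Benchmark.bm(14) do |algo|
  # Look up string keys 2 million times.
  algo.report("Without symbol") do
    2_000_000.times do
      members_without_sym['data'].collect { |h| h['email'] }
    end
  end
  # Same work with symbol keys.
  algo.report("With symbol") do
    2_000_000.times do
      members_with_sym[:data].collect { |h| h[:email] }
    end
  end
end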

Results:

                     user     system      total        real
Without symbol   2.260000   0.000000   2.260000 (  2.254277)
With symbol      0.880000   0.000000   0.880000 (  0.878603)

Other than moving the h['email'] lookup into a native extension, I cannot see how you could make the above example more efficient. The gain from doing so would be tiny for a data set of the example's size, and I suspect much smaller than what you would get from optimising the I/O of fetching and parsing this data in the first place.

Depending on your data source, using symbols rather than strings as hash keys is a common Ruby idiom, and it is also more efficient in terms of memory use. This is potentially a larger efficiency gain, and might be worth it provided you don't have to put a lot of effort into converting the data (e.g. if you can change the data structure at the source, rather than converting the hash just to query it once).
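
As a rough sketch of getting symbol keys at the source: assuming the data arrives as JSON (an assumption, since the question does not say where it comes from), the standard library parser can produce symbol keys directly via symbolize_names: true, so no separate conversion pass is needed. The raw_json payload below is made up for illustration.

```ruby
require 'json'

# Hypothetical payload; the question does not say where the data comes from.
raw_json = <<~PAYLOAD
  {"total": 3, "data": [
    {"email": "foo@example.org", "timestamp": "2013-03-16 01:11:01"},
    {"email": "bar@example.org", "timestamp": "2013-03-16 02:07:30"},
    {"email": "exx@example.org", "timestamp": "2013-03-16 03:06:24"}
  ]}
PAYLOAD

# symbolize_names: true makes the parser emit symbol keys at every nesting level.
members = JSON.parse(raw_json, symbolize_names: true)

# The lookup then uses symbols throughout.
emails = members[:data].map { |h| h[:email] }
```

If the hash has already been built with string keys, calling transform_keys(&:to_sym) on each inner hash (Ruby 2.5+) would convert it, but that extra pass probably costs more than it saves for a single query.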

members = {"total"=>3, "data"=>[
  {"email"=>"foo@example.org", "timestamp"=>"2013-03-16 01:11:01"},
  {"email"=>"bar@example.org", "timestamp"=>"2013-03-16 02:07:30"},
  {"email"=>"exx@example.org", "timestamp"=>"2013-03-16 03:06:24"}
]}

temp = members["data"].map { |x| x["email"] }

gives you ["foo@example.org", "bar@example.org", "exx@example.org"]

See also: "Difference between map and collect in Ruby?" (collect and map are aliases, so this is the same call as in the question.)

--

Maybe Structs would improve performance:

Record = Struct.new(:email, :timestamp)
members = {"total"=>3, "data"=>[
  Record.new("foo@example.org","2013-03-16 01:11:01"),
  Record.new("bar@example.org","2013-03-16 02:07:30"),
  Record.new("exx@example.org","2013-03-16 03:06:24")
]}

temp = members["data"].map(&:email)

http://blog.rubybestpractices.com/posts/rklemme/017-Struct.html
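
Whether Structs actually help is easy to measure. The following is a minimal sketch that reuses the Benchmark approach from the earlier answer to compare string-keyed hashes, symbol-keyed hashes and Struct records. The iteration count n and the variable names are arbitrary choices, and timings vary by Ruby version and machine, so no results are shown.

```ruby
require 'benchmark'

# Same shape as the Record struct above, held in a local variable here.
record = Struct.new(:email, :timestamp)

rows = [
  ["foo@example.org", "2013-03-16 01:11:01"],
  ["bar@example.org", "2013-03-16 02:07:30"],
  ["exx@example.org", "2013-03-16 03:06:24"]
]

# Build the same three rows in each representation.
string_hashes = rows.map { |email, ts| { "email" => email, "timestamp" => ts } }
symbol_hashes = rows.map { |email, ts| { email: email, timestamp: ts } }
records       = rows.map { |email, ts| record.new(email, ts) }

n = 200_000 # arbitrary; raise it for more stable timings

Benchmark.bm(12) do |bm|
  bm.report("string keys") { n.times { string_hashes.map { |h| h["email"] } } }
  bm.report("symbol keys") { n.times { symbol_hashes.map { |h| h[:email] } } }
  bm.report("struct")      { n.times { records.map(&:email) } }
end
```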

