Most efficient way of comparing arrays with hashes inside them based on a unique hash key in Ruby

Question

I have two variables, hash_a and hash_b, which are actually arrays of hashes. Each hash has a unique key, :unique_key.

hash_a = [
{:unique_key => 1, :data => 'data for A1'},
{:unique_key => 2, :data => 'data for A2'},
{:unique_key => 3, :data => 'data for A3'}
]

hash_b = [
{:unique_key => 1, :data => 'data for B1'},
{:unique_key => 2, :data => 'data for B2'},
{:unique_key => 4, :data => 'data for B4'},
{:unique_key => 5, :data => 'data for B5'}
]

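Now I want to combine hash_a and hash_b into hash_c, which should contain every hash in hash_a plus the hashes from hash_b whose :unique_key does not appear in hash_a. Basically I want hash_a plus (hash_b - hash_a), where two hashes count as the same if they share a :unique_key.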

So hash_c should be:

[
{:unique_key => 1, :data => 'data for A1'},
{:unique_key => 2, :data => 'data for A2'},
{:unique_key => 3, :data => 'data for A3'},
{:unique_key => 4, :data => 'data for B4'},
{:unique_key => 5, :data => 'data for B5'}
]

I have tried something like this:

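hash_c = hash_a.dup  # dup so that pushing onto hash_c does not also modify hash_a
hash_b.each do |inner_bhash|
  found = false

  # linear scan of hash_a looking for the same :unique_key
  hash_a.each do |inner_ahash|
    if inner_ahash[:unique_key] == inner_bhash[:unique_key]
      found = true
      break
    end
  end

  # add the hash from hash_b only if its key was not found in hash_a
  hash_c.push(inner_bhash) unless found
end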

This does the trick, but I want a better way, maybe using a hash map or something similar; I'm not sure what.

In addition, I may want to see only the new entries, i.e.

[
  {:unique_key => 4, :data => 'data for B4'},
  {:unique_key => 5, :data => 'data for B5'}
]

I can do that in my code by starting hash_c as an empty array instead of a copy of hash_a:

hash_c = []

but how could I handle this requirement with a better approach?

Answer 1

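With Hashes you can use merge to do what you want, so by first turning each Array into a Hash keyed by :unique_key you can do the following. When both hashes share a key, merge keeps the value from its argument, so merging hash_a's groups into hash_b's keeps the hash_a entry for every duplicate key: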

hash_b.group_by { |e| e[:unique_key] }.
   merge(hash_a.group_by { |e| e[:unique_key] }).values.flatten
# => [{:unique_key=>1, :data=>"data for A1"}, 
#     {:unique_key=>2, :data=>"data for A2"}, 
#     {:unique_key=>4, :data=>"data for B4"}, 
#     {:unique_key=>5, :data=>"data for B5"}, 
#     {:unique_key=>3, :data=>"data for A3"}]
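Since each :unique_key occurs only once per array, the one-element arrays from group_by (and the final flatten) are not strictly needed. A minimal sketch of the same merge idea that keys each array directly, assuming Ruby 2.6+ (where Array#to_h accepts a block); the helper name index_by_key is hypothetical, for illustration only:

```ruby
hash_a = [
  {:unique_key => 1, :data => 'data for A1'},
  {:unique_key => 2, :data => 'data for A2'},
  {:unique_key => 3, :data => 'data for A3'}
]

hash_b = [
  {:unique_key => 1, :data => 'data for B1'},
  {:unique_key => 2, :data => 'data for B2'},
  {:unique_key => 4, :data => 'data for B4'},
  {:unique_key => 5, :data => 'data for B5'}
]

# Hypothetical helper: maps each :unique_key to its hash.
# Assumes keys are unique within the array (a later duplicate would overwrite an earlier one).
def index_by_key(array)
  array.to_h { |h| [h[:unique_key], h] }
end

# merge keeps hash_b's key order, but hash_a's values win on shared keys;
# keys only in hash_a (here 3) are appended at the end
hash_c = index_by_key(hash_b).merge(index_by_key(hash_a)).values
# hash_c holds A1, A2, B4, B5, A3, in that order
```

As with the group_by version, the result is ordered by hash_b's keys first, not by the order shown in the question.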

If you want only the entries of hash_b that do not have a key in hash_a, and you already have the result above, you can simply subtract hash_a from it:

hash_b.group_by { |e| e[:unique_key] }.
  merge(hash_a.group_by { |e| e[:unique_key] }).values.flatten - hash_a
# => [{:unique_key=>4, :data=>"data for B4"}, 
#     {:unique_key=>5, :data=>"data for B5"}]

Another, more straightforward way is to keep only the elements of hash_b that have no entry with the same key in hash_a:

hash_b.select { |x| hash_a.none? { |y| x[:unique_key] == y[:unique_key] } }
# => [{:unique_key=>4, :data=>"data for B4"}, 
#     {:unique_key=>5, :data=>"data for B5"}]
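The none? check rescans hash_a for every element of hash_b, so it does on the order of n·m comparisons. A sketch of the "hash map" idea from the question: collect hash_a's keys into a Set once, so each membership test is a hash lookup (constant time on average) and the whole thing is roughly O(n + m). The names existing_keys and new_entries are illustrative, not from the original answers:

```ruby
require 'set'

hash_a = [
  {:unique_key => 1, :data => 'data for A1'},
  {:unique_key => 2, :data => 'data for A2'},
  {:unique_key => 3, :data => 'data for A3'}
]

hash_b = [
  {:unique_key => 1, :data => 'data for B1'},
  {:unique_key => 2, :data => 'data for B2'},
  {:unique_key => 4, :data => 'data for B4'},
  {:unique_key => 5, :data => 'data for B5'}
]

# Build the key lookup once; Set#include? is backed by a Hash
existing_keys = hash_a.map { |h| h[:unique_key] }.to_set

# Only the hashes from hash_b whose key is not already in hash_a
new_entries = hash_b.reject { |h| existing_keys.include?(h[:unique_key]) }

# Everything from hash_a, followed by the new entries from hash_b
hash_c = hash_a + new_entries
```

Here new_entries covers the "only new entries" case, and hash_c keeps the order shown in the question (A1, A2, A3, B4, B5).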

Answer 2

You can use the form of Array#uniq that takes a block.

 (hash_a + hash_b).uniq { |h| h[:unique_key] }
  #=> [{:unique_key=>1, :data=>"data for A1"}, {:unique_key=>2, :data=>"data for A2"},
  #    {:unique_key=>3, :data=>"data for A3"}, {:unique_key=>4, :data=>"data for B4"},
  #    {:unique_key=>5, :data=>"data for B5"}]

To quote the doc, "self is traversed in order, and the first occurrence is kept."
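Because hash_a comes first and the first occurrence is kept, the leading hash_a.size elements of the result are exactly hash_a, assuming hash_a's own keys are unique as in the question. A sketch of the second requirement (only the new entries) under that assumption is to drop that prefix; subtracting hash_a, as in the first answer, also works:

```ruby
hash_a = [
  {:unique_key => 1, :data => 'data for A1'},
  {:unique_key => 2, :data => 'data for A2'},
  {:unique_key => 3, :data => 'data for A3'}
]

hash_b = [
  {:unique_key => 1, :data => 'data for B1'},
  {:unique_key => 2, :data => 'data for B2'},
  {:unique_key => 4, :data => 'data for B4'},
  {:unique_key => 5, :data => 'data for B5'}
]

# Union by key: hash_a's entries win because they appear first
combined = (hash_a + hash_b).uniq { |h| h[:unique_key] }

# Only the entries contributed by hash_b (assumes hash_a has no duplicate keys)
new_entries = combined.drop(hash_a.size)
```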

2 answers

solution1
3 ACCEPTED 2016-07-21 12:38:13

solution2
1 2016-07-21 18:36:33
