
Selective merge of array of hashes in Ruby

I have two arrays containing hashes:

a = [
  {:umc=>"11VE", :title=>"FOOBARS"},
  {:umc=>"1973", :title=>"ZOOBARS"},
  {:umc=>"1140", :title=>"TINYBAR"},
]

b = [
  {:umc=>"11VE", :code=>"23"},
  {:umc=>"10EE", :code=>"99"},
  {:umc=>"1140", :code=>"44"},
  {:umc=>"1973", :code=>"55"},
]

and would like to selectively merge them into another array with hashes as follows:

c = [
  {:umc=>"11VE", :title=>"FOOBARS", :code=>"23"},
  {:umc=>"1973", :title=>"ZOOBARS", :code=>"55"},
  {:umc=>"1140", :title=>"TINYBAR", :code=>"44"},
]

I am using the code

combo=(a+b).group_by{|h| h[:umc]}.map{|k,v| v.reduce(:merge)}

which merges the two arrays just fine, but I would like the result to only include items that appear in the first array.

As a second idea, it would be great to be able to have two results, one that consists of items combined from both initial arrays and a second containing elements that are in the first but not in the second.

b.reduce(a) do |memo, e| 
  (ae = memo.detect { |ae| ae[:umc] == e[:umc] }) && ae.merge!(e)
  memo
end

Cary proposed a “more readable” way to express the merge. Also, if we don't want to mutate the original array a, we should dup its elements:

b.reduce(a.map(&:dup)) do |memo, e| 
  ae = memo.detect { |ae| ae[:umc] == e[:umc] }
  ae.merge!(e) if ae
  memo
end

This may not be very elegant, but it should be quite efficient computationally, as it runs in O(n+m) time on average, where n and m are the sizes of a and b. I doubt you can get it faster than this. On the downside, it uses a fair amount of extra memory for the intermediate hash.

def selective_merge(a, b)
  merged = {}
  a.each {|v| merged[v[:umc]] = [v[:umc], v[:title]]}
  b.each {|v| merged[v[:umc]] = merged[v[:umc]] << v[:code] if merged[v[:umc]]}
  merged.map {|k,v| {:umc => v[0], :title => v[1], :code => v[2]}}
end
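
The same hash-index idea can be sketched without hard-coding :title and :code. This is only an illustration: merge_by_key is a hypothetical helper name, and it assumes that when b contains several hashes with the same key value, the last one should win. Unlike selective_merge above, it leaves unmatched elements of a as they are instead of adding :code=>nil.

```ruby
# Hypothetical helper (not part of the answer above): build a lookup
# hash over b once, then merge each element of a with its match, if any.
def merge_by_key(a, b, key)
  # Index b by the key's value; a later duplicate overwrites an earlier one.
  index = b.each_with_object({}) { |h, memo| memo[h[key]] = h }
  # Hash#merge returns a new hash, so the elements of a are not mutated.
  a.map { |h| index.key?(h[key]) ? h.merge(index[h[key]]) : h.dup }
end

a = [
  {:umc=>"11VE", :title=>"FOOBARS"},
  {:umc=>"1973", :title=>"ZOOBARS"},
  {:umc=>"1140", :title=>"TINYBAR"},
]

b = [
  {:umc=>"11VE", :code=>"23"},
  {:umc=>"10EE", :code=>"99"},
  {:umc=>"1140", :code=>"44"},
  {:umc=>"1973", :code=>"55"},
]

c = merge_by_key(a, b, :umc)
  #=> [{:umc=>"11VE", :title=>"FOOBARS", :code=>"23"},
  #    {:umc=>"1973", :title=>"ZOOBARS", :code=>"55"},
  #    {:umc=>"1140", :title=>"TINYBAR", :code=>"44"}]
```

The result keeps the order of a, and the "10EE" entry of b is ignored because no element of a has that :umc.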

You could modify your code as follows:

bb = b.select { |f| a.any? { |h| h[:umc] == f[:umc] } }
  #=> [{:umc=>"11VE", :code=>"23"},
  #    {:umc=>"1140", :code=>"44"},
  #    {:umc=>"1973", :code=>"55"}] 
(a + bb).group_by { |g| g[:umc] }.map { |_,v| v.reduce(:merge) }
  #=> [{:umc=>"11VE", :title=>"FOOBARS", :code=>"23"},
  #    {:umc=>"1973", :title=>"ZOOBARS", :code=>"55"},
  #    {:umc=>"1140", :title=>"TINYBAR", :code=>"44"}]

but it would be more efficient to calculate bb thusly:

require 'set'
umc_a_vals = a.map { |g| g[:umc] }.to_set
  #=> #<Set: {"11VE", "1973", "1140"}>
bb = b.select { |f| umc_a_vals.include?(f[:umc]) }
  #=> [{:umc=>"11VE", :code=>"23"},
  #    {:umc=>"1140", :code=>"44"},
  #    {:umc=>"1973", :code=>"55"}] 

Here's another way:

f = a.group_by { |g| g[:umc] }.map { |k,v| [k, v.reduce(:merge)] }.to_h
  #=> {"11VE"=>{:umc=>"11VE", :title=>"FOOBARS"},
  #    "1973"=>{:umc=>"1973", :title=>"ZOOBARS"},
  #    "1140"=>{:umc=>"1140", :title=>"TINYBAR"}}
b.each_with_object(f) do |g,h| 
  (h.update(g[:umc]=>g) { |_,o,n| o.merge(n) }) if h.key?(g[:umc])
end.values
  #=> [{:umc=>"11VE", :title=>"FOOBARS", :code=>"23"},
  #    {:umc=>"1973", :title=>"ZOOBARS", :code=>"55"},
  #    {:umc=>"1140", :title=>"TINYBAR", :code=>"44"}]

The calculation of f is so similar to your code that I don't think an explanation is needed.

For the first element of b, g = {:umc=>"11VE", :code=>"23"}, we now merge the hash:

{ g[:umc]=>g }
  #=> { "11VE"=>{:umc=>"11VE", :code=>"23"} }

into h if h has a key "11VE", which it does. For that we use the form of Hash#update (aka merge!) that employs the block:

{ |_,o,n| o.merge(n) }

to determine the values of keys that are present in both hashes being merged.

The block variables equal:

_ #=> "11VE"
o #=> {:umc=>"11VE", :title=>"FOOBARS"}
n #=> {:umc=>"11VE", :code=>"23"}

so the result of the block calculation is:

o.merge(n)
  #=> {:umc=>"11VE", :title=>"FOOBARS", :code=>"23"}

which is the updated value of h["11VE"].

Aside: I used the local variable _ for the value of the key to draw attention to the fact that it is not being used in the block calculation. The variables o and n are commonly used to represent the "old" and "new" values, respectively.

The merging into h of the hashes constructed from the remaining values of b is done similarly.

The final step is to extract the values of h.
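
To see why the h.key?(g[:umc]) guard is needed, here is a small standalone sketch (the variable names old_h, new_h and seen are made up for illustration). The block passed to Hash#merge is called only for keys present in both hashes; keys found only in the argument are simply added.

```ruby
old_h = { "11VE" => {:umc=>"11VE", :title=>"FOOBARS"} }
new_h = { "11VE" => {:umc=>"11VE", :code=>"23"},
          "10EE" => {:umc=>"10EE", :code=>"99"} }

seen = []  # records the keys for which the block is invoked
merged = old_h.merge(new_h) do |key, o, n|
  seen << key
  o.merge(n)
end

seen
  #=> ["11VE"]
merged["11VE"]
  #=> {:umc=>"11VE", :title=>"FOOBARS", :code=>"23"}
merged["10EE"]
  #=> {:umc=>"10EE", :code=>"99"}
```

Without the guard, "10EE" would therefore end up in the result even though it does not appear in a.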

As a second example, suppose:

a = [
  {:umc=>"11VE", :title=>"FOOBARS"},
  {:umc=>"11VE", :title=>"ZOOBARS", :author=>"Billy-Bob"},
  {:umc=>"1140", :title=>"TINYBAR"}
]

We obtain:

f = a.each_with_object({}) { |g,h| h.update(g[:umc]=>g) { |_,o,n| o.merge(n) } }  
  #=> {"11VE"=>{:umc=>"11VE", :title=>"ZOOBARS", :author=>"Billy-Bob"},
  #    "1140"=>{:umc=>"1140", :title=>"TINYBAR"}} 
b.each_with_object(f) do |g,h| 
  (h.update(g[:umc]=>g) { |_,o,n| o.merge(n) }) if h.key?(g[:umc])
end.values
  #=> [{:umc=>"11VE", :title=>"ZOOBARS", :author=>"Billy-Bob", :code=>"23"},
  #    {:umc=>"1140", :title=>"TINYBAR", :code=>"44"}] 

Try this

# @a - Pivot array (its elements will be in the result array)
# @b - Array which contains intersecting elements
# @group - attribute name which identifies similar elements
def c_join(a, b, group)
  result = []
  a.each do |a_i|
    similar = b.detect{ |b_i| b_i[group] == a_i[group] }
    # detect returns nil when b has no match; merge(nil) would raise TypeError
    result << (similar ? a_i.merge(similar) : a_i)
  end
  result
end

Testing

a = [
  {:umc=>"11VE", :title=>"FOOBARS"},
  {:umc=>"1973", :title=>"ZOOBARS"},
  {:umc=>"1140", :title=>"TINYBAR"},
]

b = [
  {:umc=>"11VE", :code=>"23"},
  {:umc=>"10EE", :code=>"99"},
  {:umc=>"1140", :code=>"44"},
  {:umc=>"1973", :code=>"55"},
]

c_join(a, b, :umc)

Result

[
  {:umc=>"11VE", :title=>"FOOBARS", :code=>"23"}, 
  {:umc=>"1973", :title=>"ZOOBARS", :code=>"55"}, 
  {:umc=>"1140", :title=>"TINYBAR", :code=>"44"}
]
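
The second idea in the question (one result with the combined items, another with the items that are in the first array but not in the second) could be sketched with Enumerable#partition. This is only one possible approach. The fourth element of a below is a hypothetical addition, since with the question's data every element of a has a match in b and the second result would be empty.

```ruby
a = [
  {:umc=>"11VE", :title=>"FOOBARS"},
  {:umc=>"1973", :title=>"ZOOBARS"},
  {:umc=>"1140", :title=>"TINYBAR"},
  {:umc=>"20ZZ", :title=>"LONEBAR"},  # hypothetical item with no match in b
]

b = [
  {:umc=>"11VE", :code=>"23"},
  {:umc=>"10EE", :code=>"99"},
  {:umc=>"1140", :code=>"44"},
  {:umc=>"1973", :code=>"55"},
]

# Index b by :umc so that each lookup is a hash access rather than a scan.
b_by_umc = b.each_with_object({}) { |h, index| index[h[:umc]] = h }

# Split a into the elements that have a match in b and those that do not.
matched, only_in_a = a.partition { |h| b_by_umc.key?(h[:umc]) }

combined = matched.map { |h| h.merge(b_by_umc[h[:umc]]) }
  #=> [{:umc=>"11VE", :title=>"FOOBARS", :code=>"23"},
  #    {:umc=>"1973", :title=>"ZOOBARS", :code=>"55"},
  #    {:umc=>"1140", :title=>"TINYBAR", :code=>"44"}]
only_in_a
  #=> [{:umc=>"20ZZ", :title=>"LONEBAR"}]
```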
