简体   繁体   中英

Merging hash values in an array of hashes based on key

I have an array of hashes similar to this:

[
  {"student": "a","scores": [{"subject": "math","quantity": 10},{"subject": "english", "quantity": 5}]},
  {"student": "b", "scores": [{"subject": "math","quantity": 1 }, {"subject": "english","quantity": 2 } ]},
  {"student": "a", "scores": [ { "subject": "math", "quantity": 2},{"subject": "science", "quantity": 5 } ] }
]

Is there a simpler way of getting the output similar to this except looping through the array and finding a duplicate and then combining them?

[
  {"student": "a","scores": [{"subject": "math","quantity": 12},{"subject": "english", "quantity": 5},{"subject": "science", "quantity": 5 } ]},
  {"student": "b", "scores": [{"subject": "math","quantity": 1 }, {"subject": "english","quantity": 2 } ]}
]

Rules for merging duplicate objects:

  • Students are merged on matching "value" (eg student "a", student "b")
  • Students scores on identical subjects are added (eg student a's math scores 2 and 10 become 12 when merged)

Is there a simpler way of getting the output similar to this except looping through the array and finding a duplicate and then combining them?

Not that I know of. IF you explain where this data is comeing form the answer may be different but just based on the Array of Hash objects I think you will haev to iterate and combine.

While it is not elegant you could use a solution like this

arr = [
      {"student"=> "a","scores"=> [{"subject"=> "math","quantity"=> 10},{"subject"=> "english", "quantity"=> 5}]},
      {"student"=> "b", "scores"=> [{"subject"=> "math","quantity"=> 1 }, {"subject"=> "english","quantity"=> 2 } ]},
      {"student"=> "a", "scores"=> [ { "subject"=> "math", "quantity"=> 2},{"subject"=> "science", "quantity"=> 5 } ] }
    ]
#Group the array by student
arr.group_by{|student| student["student"]}.map do |student_name,student_values|
  {"student" => student_name,
  #combine all the scores and group by subject
  "scores" => student_values.map{|student| student["scores"]}.flatten.group_by{|score| score["subject"]}.map do |subject,subject_values|
    {"subject" => subject,
    #combine all the quantities into an array and reduce using `+`
    "quantity" => subject_values.map{|h| h["quantity"]}.reduce(:+)
    }
  end
  }
end
#=> [
    {"student"=>"a", "scores"=>[
                        {"subject"=>"math", "quantity"=>12},  
                        {"subject"=>"english", "quantity"=>5}, 
                        {"subject"=>"science", "quantity"=>5}]}, 
    {"student"=>"b", "scores"=>[
                        {"subject"=>"math", "quantity"=>1}, 
                        {"subject"=>"english", "quantity"=>2}]}
    ]

I know that you specified your expected result but I wanted to point out that making the output simpler makes the code simpler.

 arr.map(&:dup).group_by{|a| a.delete("student")}.each_with_object({}) do |(student, scores),record|
   record[student] = scores.map(&:values).flatten.map(&:values).each_with_object(Hash.new(0)) do |(subject,score),obj|
     obj[subject] += score
     obj
  end
  record
 end
 #=>{"a"=>{"math"=>12, "english"=>5, "science"=>5}, "b"=>{"math"=>1, "english"=>2}}

With this structure getting the students is as easy as calling .keys and the scores would be equally as simple. I am thinking something like

above_result.each do |student,scores|
    puts student
    scores.each do |subject,score|
      puts "  #{subject.capitalize}: #{score}"
    end
  end
end

The console out put would be

a
  Math: 12
  English: 5
  Science: 5
b
  Math: 1
  English: 2

There are two common ways of aggregating values in such instances. The first is to employ the method Enumerable#group_by , as @engineersmnky has done in his answer. The second is to build a hash using the form of the method Hash#update (aka merge! ) that uses a block to resolve the values of keys which are present in both of the hashes being merged. My solution uses the latter approach, not because I prefer it to the group_by , but just to show you a different way it can be done. (Had engineersmnky used update , I would have gone with group_by .)

Your problem is complicated somewhat by the particular data structure you are using. I found that the solution could be simplfied and made easier to follow by first converting the data to a different structure, update the scores, then convert the result back to your data structure. You may want to consider changing the data structure (if that's an option for you). I've addressed that issue in the "Discussion" section.

Code

def combine_scores(arr)
  reconstruct(update_scores(simplify(arr)))
end

def simplify(arr)
  arr.map do |h|
    hash = Hash[h[:scores].map { |g| g.values }]
    hash.default = 0
    { h[:student]=> hash }
  end
end

def update_scores(arr)
  arr.each_with_object({}) do |g,h|
    h.update(g) do |_, h_scores, g_scores|
      g_scores.each { |subject,score| h_scores[subject] += score }
      h_scores
    end
  end
end

def reconstruct(h)
  h.map { |k,v| { student: k, scores: v.map { |subject, score|
    { subject: subject, score: score } } } }
end

Example

arr = [
  { student: "a", scores: [{ subject: "math",    quantity: 10 },
                           { subject: "english", quantity:  5 }] },
  { student: "b", scores: [{ subject: "math",    quantity:  1 },
                           { subject: "english", quantity:  2 } ] },
  { student: "a", scores: [{ subject: "math",    quantity:  2 },
                           { subject: "science", quantity:  5 } ] }]
combine_scores(arr)
  #=> [{ :student=>"a",
  #      :scores=>[{ :subject=>"math",    :score=>12 },
  #                { :subject=>"english", :score=> 5 },
  #                { :subject=>"science", :score=> 5 }] },
  #    { :student=>"b",
  #      :scores=>[{ :subject=>"math",    :score=> 1 },
  #                { :subject=>"english", :score=> 2 }] }] 

Explanation

First consider the two intermediate calculations:

a = simplify(arr)
  #=> [{ "a"=>{ "math"=>10, "english"=>5 } },
  #    { "b"=>{ "math"=> 1, "english"=>2 } },
  #    { "a"=>{ "math"=> 2, "science"=>5 } }]

h = update_scores(a)
  #=> {"a"=>{"math"=>12, "english"=>5, "science"=>5}
  #    "b"=>{"math"=> 1, "english"=>2}}

Then

reconstruct(h)

returns the result shown above.

+ simplify

arr.map do |h|
  hash = Hash[h[:scores].map { |g| g.values }]
  hash.default = 0
  { h[:student]=> hash }
end

This maps each hash into a simpler one. For example, the first element of arr :

h = { student: "a", scores: [{ subject: "math",    quantity: 10 },
                             { subject: "english", quantity:  5 }] }

is mapped to:

{ "a"=>Hash[[{ subject: "math",    quantity: 10 },
             { subject: "english", quantity:  5 }].map { |g| g.values }] }
#=> { "a"=>Hash[[["math", 10], ["english", 5]]] }
#=> { "a"=>{"math"=>10, "english"=>5}}

Setting the default value of each hash to zero simplifies the update step, which follows.

+ update_scores

For the array of hashes a that is returned by simplify , we compute:

a.each_with_object({}) do |g,h|
  h.update(g) do |_, h_scores, g_scores|
    g_scores.each { |subject,score| h_scores[subject] += score }
    h_scores
  end
end

Each element of a (a hash) is merged into an initially-empty hash, h . As update (same as merge! ) is used for the merge, h is modified. If both hashes share the same key (eg, "math"), the values are summed; else subject=>score is added to h .

Notice that if h_scores does not have the key subject , then:

h_scores[subject] += score
  #=> h_scores[subject] = h_scores[subject] + score
  #=> h_scores[subject] = 0 + score (because the default value is zero)
  #=> h_scores[subject] = score

That is, the key-value pair from g_scores is merely added to h_scores .

I've replaced the block variable representing the subject with a placeholder _ , to reduce the chance of errors and to inform the reader that it is not used in the block.

+ reconstruct

The final step is to convert the hash returned by update_scores back to the original data structure, which is straightforward.

Discussion

If you change the data structure, and it meets your requirements, you may wish to consider changing it to that produced by combine_scores :

h = { "a"=>{ math: 10, english: 5 }, "b"=>{ math:  1, english: 2 } }

Then to update the scores with:

g = { "a"=>{ math: 2, science: 5 }, "b"=>{ english: 3 }, "c"=>{ science: 4 } }

you would merely to the following:

h.merge(g) { |_,oh,nh| oh.merge(nh) { |_,ohv,nhv| ohv+nhv } }
  #=> { "a"=>{ :math=>12, :english=>5, :science=>5 },
  #     "b"=>{ :math=> 1, :english=>5 },
  #     "c"=>{ :science=>4 } }

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM