
Ruby: group an array of arrays and merge each group into a single array based on a shared key

I tried the approaches suggested in other Stack Overflow questions but couldn't get the result I was looking for.

I have an array of arrays like this:

Input:

[["ABC", "5A2", nil, "88474"],
 ["ABC", nil, "2", "88474"],
 ["ABC", nil, nil, "88474"],
 ["ABC", nil, nil, "88474"],
 ["Jack", "5A2", nil, "05195"],
 ["Jack", nil, "2", "05195"],
 ["Jack", nil, nil, "05195"],
 ["Jack", nil, nil, "05195"]]

The value at index 0 (ABC or Jack) should be used as the group_by key. I want the output below, in which all the ABC arrays are merged into one (and likewise for Jack), so that at each index position the nil is replaced by a value whenever any array in the group holds a value in that position:

Output:

[["ABC", "5A2", "2", "88474"],
["Jack", "5A2", "2", "05195"]]

The input won't always have this exact layout, where the first row holds the second value and the second row holds the third value; that can vary. However, for the same index 0 value, the second value will never be set to different values in multiple rows, and the same holds for the third value (and for a fourth or fifth value, if I add more).
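The guarantee above (within one group, no column gets two different non-nil values) is what makes a simple "take the non-nil value" merge safe. A minimal sketch of checking that assumption before merging; the helper name consistent_groups? and the conflicting "7B1" row are made up for illustration:

```ruby
# Hypothetical helper (not from the question): returns true when, within every
# group sharing the same first element, each column holds at most one
# distinct non-nil value, i.e. the guarantee stated above.
def consistent_groups?(rows)
  rows.group_by(&:first).values.all? do |group|
    # transpose turns the group's rows into columns (rows must be equal length)
    group.transpose.all? { |column| column.compact.uniq.size <= 1 }
  end
end

rows = [
  ["ABC", "5A2", nil, "88474"],
  ["ABC", nil, "2", "88474"],
  ["Jack", "5A2", nil, "05195"],
  ["Jack", "7B1", nil, "05195"] # made-up conflicting second value for "Jack"
]

p consistent_groups?(rows.first(3)) # => true
p consistent_groups?(rows)          # => false
```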

I have worked with arrays of hashes, but not with arrays of arrays, so I'm not sure how to do it.

I'm not certain that I understand the question, but I expect you may be looking for the following.

arr = [
  ["ABC", "5A2", nil, "88474"],
  ["ABC", nil, "2", "88474"],
  ["ABC", nil, nil, "88474"],
  ["ABC", nil, nil, "88474"],
  ["Jack", "5A2", nil, "05195"],
  ["Jack", nil, "2", "05195"],
  ["Jack", nil, nil, "05195"],
  ["Jack", nil, nil, "05195"]
]
arr.each_with_object({}) do |a, h|
  h.update(a.first=>a) { |_k, oa, na| oa.zip(na).map { |ov, nv| ov.nil? ? nv : ov } }
end.values 
  #=> [["ABC", "5A2", "2", "88474"], ["Jack", "5A2", "2", "05195"]] 
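If you need this in more than one place, the same expression can be wrapped in a method. This is a sketch only; the name merge_by_first is hypothetical, and the body is the answer's Hash#update approach with more descriptive variable names:

```ruby
# Hypothetical wrapper around the Hash#update approach above.
# Rows are merged by their first element; at each position the earlier
# non-nil value is kept, otherwise the later row's value is taken.
def merge_by_first(rows)
  rows.each_with_object({}) do |row, merged|
    merged.update(row.first => row) do |_key, old_row, new_row|
      old_row.zip(new_row).map { |old_v, new_v| old_v.nil? ? new_v : old_v }
    end
  end.values
end

input = [
  ["ABC", "5A2", nil, "88474"],
  ["ABC", nil, "2", "88474"],
  ["ABC", nil, nil, "88474"],
  ["ABC", nil, nil, "88474"],
  ["Jack", "5A2", nil, "05195"],
  ["Jack", nil, "2", "05195"],
  ["Jack", nil, nil, "05195"],
  ["Jack", nil, nil, "05195"]
]

p merge_by_first(input)
# => [["ABC", "5A2", "2", "88474"], ["Jack", "5A2", "2", "05195"]]
```

Because Ruby hashes preserve insertion order, the merged rows come out in the order in which their keys first appear in the input.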

This uses the form of Hash#update (aka merge!) that employs the block

{ |_k, oa, na| oa.zip(na).map { |ov, nv| ov.nil? ? nv : ov } }

to determine the values of keys that are present in both the hash being built (h) and the hash being merged ({ a.first=>a }). See the documentation for a description of the three block variables, _k, oa and na (see note 1).
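A tiny standalone illustration of that form of Hash#update, using made-up two-element arrays: keys found only in the argument are copied as-is, while for a key found in both hashes the block's return value becomes the stored value.

```ruby
h = { "ABC" => [1, nil] }

# "ABC" is in both hashes, so the block decides its value;
# "Jack" is only in the argument, so it is copied without calling the block.
h.update("ABC" => [nil, 2], "Jack" => [3, nil]) do |_k, oa, na|
  oa.zip(na).map { |ov, nv| ov.nil? ? nv : ov }
end

# update (merge!) modifies h in place.
p h == { "ABC" => [1, 2], "Jack" => [3, nil] } # => true
```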

I can best explain how the calculations proceed by salting the code with puts statements and running it with an abbreviated array arr.

arr = [
  ["ABC", "5A2", nil, "88474"],
  ["ABC", nil, "2", "88474"],
  ["Jack", "5A2", nil, "05195"],
  ["Jack", nil, "2", "05195"],
]
arr.each_with_object({}) do |a, h|
  puts "\na = #{a}"
  puts "h = #{h}"
  puts "a.first=>a = #{a.first}=>#{a}"
  h.update(a.first=>a) do |_k, oa, na|
    puts "_k = #{_k}"
    puts "oa = #{oa}"
    puts "na = #{na}"
    pairs = oa.zip(na)
    puts "oa.zip(na) = #{pairs}"
    pairs.map do |ov, nv|
      puts "  ov = #{ov}, nv = #{nv}"
      puts "  ov.nil? ? nv : ov = #{ov.nil? ? nv : ov}"
      ov.nil? ? nv : ov
    end
  end
end.tap { |h| puts "h = #{h}" }.values 
  #=> [["ABC", "5A2", "2", "88474"], ["Jack", "5A2", "2", "05195"]]

The following is displayed.

a = ["ABC", "5A2", nil, "88474"]
h = {}
a.first=>a = ABC=>["ABC", "5A2", nil, "88474"]
(The block is not called here because h does not have a key "ABC")
a = ["ABC", nil, "2", "88474"]
h = {"ABC"=>["ABC", "5A2", nil, "88474"]}
a.first=>a = ABC=>["ABC", nil, "2", "88474"]
_k = ABC
oa = ["ABC", "5A2", nil, "88474"]
na = ["ABC", nil, "2", "88474"]
oa.zip(na) = [["ABC", "ABC"], ["5A2", nil], [nil, "2"], ["88474", "88474"]]
  ov = ABC, nv = ABC
  ov.nil? ? nv : ov = ABC
  ov = 5A2, nv = 
  ov.nil? ? nv : ov = 5A2
  ov = , nv = 2
  ov.nil? ? nv : ov = 2
  ov = 88474, nv = 88474
  ov.nil? ? nv : ov = 88474
a = ["Jack", "5A2", nil, "05195"]
h = {"ABC"=>["ABC", "5A2", "2", "88474"]}
a.first=>a = Jack=>["Jack", "5A2", nil, "05195"]
(The block is not called here because h does not have a key "Jack")
a = ["Jack", nil, "2", "05195"]
h = {"ABC"=>["ABC", "5A2", "2", "88474"], "Jack"=>["Jack", "5A2", nil, "05195"]}
a.first=>a = Jack=>["Jack", nil, "2", "05195"]
_k = Jack
oa = ["Jack", "5A2", nil, "05195"]
na = ["Jack", nil, "2", "05195"]
oa.zip(na) = [["Jack", "Jack"], ["5A2", nil], [nil, "2"], ["05195", "05195"]]
  ov = Jack, nv = Jack
  ov.nil? ? nv : ov = Jack
  ov = 5A2, nv = 
  ov.nil? ? nv : ov = 5A2
  ov = , nv = 2
  ov.nil? ? nv : ov = 2
  ov = 05195, nv = 05195
  ov.nil? ? nv : ov = 05195
h = {"ABC"=>["ABC", "5A2", "2", "88474"], "Jack"=>["Jack", "5A2", "2", "05195"]}

1. As is common practice, I began the name of the common key, _k, with an underscore to signal to the reader that it is not used in the block calculation. Often you will see such a block variable represented by an underscore alone.

Another option is to group the arrays by their first element using Enumerable#group_by and Hash#values, then zip each group into columns, compact each column and take its first element (explained in more detail below).


This is the final one-liner:

arr.group_by(&:first).values.map { |first, *last| first.zip(*last).map { |a| a.compact.first } }

#=> [["ABC", "5A2", "2", "88474"], ["Jack", "5A2", "2", "05195"]]
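The walkthrough that follows covers the zip and compact part; the group_by step can be inspected on its own. A short sketch using an abbreviated four-row version of the input:

```ruby
arr = [
  ["ABC", "5A2", nil, "88474"],
  ["ABC", nil, "2", "88474"],
  ["Jack", "5A2", nil, "05195"],
  ["Jack", nil, "2", "05195"]
]

# group_by(&:first) maps each row's first element to the rows sharing it;
# values keeps just the groups, in the order their keys were first seen.
groups = arr.group_by(&:first).values
p groups.length # => 2
p groups.first  # => [["ABC", "5A2", nil, "88474"], ["ABC", nil, "2", "88474"]]

# Each group is then merged column by column, as in the one-liner.
merged = groups.map { |first, *last| first.zip(*last).map { |a| a.compact.first } }
p merged # => [["ABC", "5A2", "2", "88474"], ["Jack", "5A2", "2", "05195"]]
```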

How it works

Let's say you have these three arrays, which you can think of as the rows of a table:

a = [1, nil, nil]
b = [1, 2, nil]
c = [nil, 2, 3]

You can use Array#zip to get the columns:

a.zip(b, c) #=> [[1, 1, nil], [nil, 2, 2], [nil, nil, 3]]
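Array#transpose is a different built-in that gives the same columns when all rows have the same length, as they do in the question. The difference is in the edge cases: zip pads shorter arguments with nil, while transpose raises IndexError if the row lengths differ. A quick check:

```ruby
a = [1, nil, nil]
b = [1, 2, nil]
c = [nil, 2, 3]

# transpose treats [a, b, c] as rows and returns the columns.
columns = [a, b, c].transpose
p columns                # => [[1, 1, nil], [nil, 2, 2], [nil, nil, 3]]
p columns == a.zip(b, c) # => true
```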

Another way to get the columns is to wrap the arrays in an outer array:

tmp = [a, b, c]

This shows how to unpack it into its first row and the remaining rows:

tmp.then { |first, *last| p first; p(*last) }
#=> [1, nil, nil]
#=> [1, 2, nil]
#=> [nil, 2, 3]
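The unpacking works because Object#then yields the array as a single argument, and an ordinary block with more than one parameter automatically splats a single Array argument. A lambda is strict about arity and does not, which the sketch below contrasts (the names head, rest and strict are only for illustration):

```ruby
tmp = [[1, nil, nil], [1, 2, nil], [nil, 2, 3]]

# A block auto-splats the single Array argument: first gets the first row,
# *last collects the remaining rows.
head, rest = tmp.then { |first, *last| [first, last] }
p head # => [1, nil, nil]
p rest # => [[1, 2, nil], [nil, 2, 3]]

# A lambda with the same parameter list receives the whole array as first.
strict = ->(first, *last) { [first, last] }
p strict.call(tmp) # => [[[1, nil, nil], [1, 2, nil], [nil, 2, 3]], []]
```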

So, zipping returns:

tmp.then { |first, *last| first.zip(*last) }
#=> [[1, 1, nil], [nil, 2, 2], [nil, nil, 3]]

Once we have the columns, we compact (Array#compact) each one and take its first element.

For example:

[1, 1, nil].compact.first #=> 1
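Two edge cases are worth knowing: compact removes only nil, so a column that is nil in every row stays nil in the merged result, and false is kept as a real value. The ov.nil? ? nv : ov block in the first answer treats false the same way.

```ruby
# compact drops only nil, so first returns the first non-nil entry.
p [nil, 2, 2].compact.first     # => 2
# A column with no value in any row merges to nil.
p [nil, nil, nil].compact.first # => nil
# false is not removed by compact, so it is picked up as a value.
p [nil, false, 3].compact.first # => false
```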

Applied to the tmp array:

tmp.then { |first, *last| first.zip(*last).map { |column| column.compact.first } }
#=> [1, 2, 3]

Putting everything together you get the final one liner shown above.
