I have an array of hashes that represent compounds stored in boxes.
database = [{"Name"=>"Compound1", "Box"=>1},
{"Name"=>"Compound2", "Box"=>1},
{"Name"=>"Compound2", "Box"=>1},
{"Name"=>"Compound3", "Box"=>1},
{"Name"=>"Compound4", "Box"=>1},
{"Name"=>"Compound5", "Box"=>2},
{"Name"=>"Compound6", "Box"=>2},
{"Name"=>"Compound1", "Box"=>3},
{"Name"=>"Compound2", "Box"=>3},
{"Name"=>"Compound3", "Box"=>3},
{"Name"=>"Compound7", "Box"=>4}]
I would like to select a subset of the array that uses the fewest boxes while still covering the full inventory of compounds (i.e., Compound1 to Compound7). The result would be:
database = [{"Name"=>"Compound1", "Box"=>1},
{"Name"=>"Compound2", "Box"=>1},
{"Name"=>"Compound3", "Box"=>1},
{"Name"=>"Compound4", "Box"=>1},
{"Name"=>"Compound5", "Box"=>2},
{"Name"=>"Compound6", "Box"=>2},
{"Name"=>"Compound7", "Box"=>4}]
I can use the following to group compounds per box:
database.group_by{|x| x['Box']}
I have trouble reducing the result so that duplicate compound names are removed from the grouped operation.
With Ruby >= 2.4 we can use transform_values:
database.group_by { |hash| hash["Name"] }
        .transform_values { |v| v.min_by { |h| h["Box"] } }
        .values
Or if you have Ruby < 2.4 you can do:
database.group_by {|hash| hash["Name"] }.map { |_,v| v.min_by {|h| h["Box"]} }
Key methods: group_by, transform_values (Ruby >= 2.4) and min_by. See the Ruby docs for more info.
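To make the two steps easier to follow, here is a minimal sketch that runs them separately on the data from the question. The variable names by_name, lowest_box and boxes_used are chosen here for illustration and are not part of the original answer.

```ruby
# Sample data copied from the question (string keys).
database = [
  {"Name"=>"Compound1", "Box"=>1}, {"Name"=>"Compound2", "Box"=>1},
  {"Name"=>"Compound2", "Box"=>1}, {"Name"=>"Compound3", "Box"=>1},
  {"Name"=>"Compound4", "Box"=>1}, {"Name"=>"Compound5", "Box"=>2},
  {"Name"=>"Compound6", "Box"=>2}, {"Name"=>"Compound1", "Box"=>3},
  {"Name"=>"Compound2", "Box"=>3}, {"Name"=>"Compound3", "Box"=>3},
  {"Name"=>"Compound7", "Box"=>4}
]

# Step 1: group the records by compound name.
by_name = database.group_by { |h| h["Name"] }

# Step 2: within each group keep the record with the lowest box number.
# (map works on any Ruby version; transform_values + values is the 2.4+ form.)
lowest_box = by_name.map { |_name, records| records.min_by { |h| h["Box"] } }

# The boxes that the selected records come from.
boxes_used = lowest_box.map { |h| h["Box"] }.uniq
#=> [1, 2, 4]
```

Note that this picks the lowest-numbered box for each compound independently. That matches the expected result for this data, but it is not guaranteed to give the fewest boxes for every possible inventory.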
You could try Array#uniq:
database = [{name: "Compound1", box: 1}, {name: "Compound2", box: 1}, {name: "Compound2", box: 1}, {name: "Compound3", box: 1}, {name: "Compound4", box: 1}, {name: "Compound5", box: 2}, {name: "Compound6", box: 2}, {name: "Compound1", box: 3}, {name: "Compound2", box: 3}, {name: "Compound3", box: 3}, {name: "Compound7", box: 4}]
p database.uniq { |h| h[:name] }
# => [
# {:name=>"Compound1", :box=>1},
# {:name=>"Compound2", :box=>1},
# {:name=>"Compound3", :box=>1},
# {:name=>"Compound4", :box=>1},
# {:name=>"Compound5", :box=>2},
# {:name=>"Compound6", :box=>2},
# {:name=>"Compound7", :box=>4}
# ]
Or, to remove duplicate names within each box:
p database.group_by { |h| h[:box] }.each { |_box, v| v.uniq! { |h| h[:name] } }
# => {
# 1=>[
# {:name=>"Compound1", :box=>1},
# {:name=>"Compound2", :box=>1},
# {:name=>"Compound3", :box=>1},
# {:name=>"Compound4", :box=>1}
# ],
# 2=>[
# {:name=>"Compound5", :box=>2},
# {:name=>"Compound6", :box=>2}
# ],
# 3=>[
# {:name=>"Compound1", :box=>3},
# {:name=>"Compound2", :box=>3},
# {:name=>"Compound3", :box=>3}
# ],
# 4=>[
# {:name=>"Compound7", :box=>4}
# ]
# }
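One thing worth knowing about Array#uniq with a block is that it keeps the first element seen for each key, so the result depends on the order of the input. The sketch below uses a small hypothetical inventory (not the data from the question) in which a single box holds both compounds, to show how the order changes which boxes end up in the result.

```ruby
# Hypothetical data: box 3 alone holds both compounds.
sample = [
  {name: "A", box: 1},
  {name: "B", box: 2},
  {name: "A", box: 3},
  {name: "B", box: 3}
]

# uniq with a block keeps the first record seen for each name.
first_seen = sample.uniq { |h| h[:name] }
boxes_first_seen = first_seen.map { |h| h[:box] }.uniq
#=> [1, 2]

# Putting the box-3 records first changes which records survive.
reordered = sample.sort_by { |h| -h[:box] }.uniq { |h| h[:name] }
boxes_reordered = reordered.map { |h| h[:box] }.uniq
#=> [3]
```

So uniq removes duplicate names, but on its own it does not search for the smallest set of boxes; for the question's data the first-seen records happen to give the expected answer.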
The essence of the problem is to find a minimal-size combination of boxes that includes ("covers") all of a set of specified compounds. That combination of boxes is then used to select the desired elements of database, as shown below.
Code
def min_box(database, coverage)
  boxes_to_compounds = database.each_with_object(Hash.new { |h,k| h[k] = [] }) { |g,h|
    h[g["Box"]] << g["Name"] }
  boxes = boxes_to_compounds.keys
  # The range is inclusive so that the combination of all boxes is also tried.
  (1..boxes.size).each do |n|
    boxes.combination(n).each do |combo|
      return combo if (coverage - combo.flat_map { |box| boxes_to_compounds[box] }).empty?
    end
  end
  nil
end
coverage is a given array of the names of the required compounds (e.g., "Compound3").
Example
Suppose database is as given in the question and
coverage = ["Compound1", "Compound2", "Compound3", "Compound4",
"Compound5", "Compound6", "Compound7"]
An optimal combination of boxes is then found to be
combo = min_box(database, coverage)
#=> [1, 2, 4]
We may now compute the desired array of elements of database:
database.select { |h| combo.include?(h["Box"]) }.uniq
#=> [{"Name"=>"Compound1", "Box"=>1}, {"Name"=>"Compound2", "Box"=>1},
# {"Name"=>"Compound3", "Box"=>1}, {"Name"=>"Compound4", "Box"=>1},
# {"Name"=>"Compound5", "Box"=>2}, {"Name"=>"Compound6", "Box"=>2},
# {"Name"=>"Compound7", "Box"=>4}]
min_box explanation
Finding an optimal combination of boxes is the minimum set cover problem, which is NP-hard (its decision version is NP-complete). No polynomial-time exact algorithm is known, so some form of enumeration of combinations of boxes is used here. I begin by determining if a single box provides the required coverage of compounds. If one of the boxes does, an optimal solution has been found and the method returns an array containing that box. If no single box covers all compounds, I look at all combinations of two boxes. If one of those combinations provides the required coverage it is an optimal solution and an array of those boxes is returned; else combinations of three boxes are considered. Eventually an optimal combination is found or it is concluded that all boxes together do not provide the required coverage, in which case the method returns nil.
For the example above, the calculations are as follows.
boxes_to_compounds = database.each_with_object(Hash.new {|h,k| h[k]=[]}) { |g,h|
h[g["Box"]] << g["Name"] }
#=> {1=>["Compound1", "Compound2", "Compound2", "Compound3", "Compound4"],
# 2=>["Compound5", "Compound6"],
# 3=>["Compound1", "Compound2", "Compound3"],
# 4=>["Compound7"]}
boxes = boxes_to_compounds.keys
#=> [1, 2, 3, 4]
boxes.size
#=> 4
Each value of n in the range is passed to the outer each block. Consider n = 3.
n = 3
e = boxes.combination(n)
#=> #<Enumerator: [1, 2, 3, 4]:combination(3)>
We may see the objects that will be generated by this enumerator and passed to the inner each block by converting it to an array.
e.to_a
#=> [[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]]
The first element generated by e is passed to the block and the following is computed.
combo = e.next
#=> [1, 2, 3]
a = combo.flat_map { |box| boxes_to_compounds[box] }
#=> ["Compound1", "Compound2", "Compound2", "Compound3", "Compound4",
# "Compound5", "Compound6", "Compound1", "Compound2", "Compound3"]
b = coverage - a
#=> ["Compound7"]
b.empty?
#=> false
As that combination of boxes does not include "Compound7" we press on and pass the next element generated by e to the block.
combo = e.next
#=> [1, 2, 4]
a = combo.flat_map { |box| boxes_to_compounds[box] }
#=> ["Compound1", "Compound2", "Compound2", "Compound3", "Compound4",
# "Compound5", "Compound6", "Compound7"]
b = coverage - a
#=> []
b.empty?
#=> true
We therefore have found an optimal combination of boxes, [1, 2, 4], which is returned by the method.
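The search described above can be tried end to end with the sketch below. It is a restatement for experimentation rather than a different algorithm: min_cover is a name chosen here, and the range 1..boxes.size is inclusive so that the combination of all boxes is also tried before giving up. The two-box inventory all_needed is hypothetical and only exercises the edge cases.

```ruby
# min_cover is a hypothetical name; it follows the size-by-size search above.
def min_cover(database, coverage)
  # Map each box to the compounds it holds, e.g. {1=>["Compound1", ...], ...}.
  boxes_to_compounds = database.each_with_object(Hash.new { |h, k| h[k] = [] }) do |g, h|
    h[g["Box"]] << g["Name"]
  end
  boxes = boxes_to_compounds.keys

  # Try 1 box, then 2, and so on, up to and including all boxes.
  (1..boxes.size).each do |n|
    boxes.combination(n).each do |combo|
      covered = combo.flat_map { |box| boxes_to_compounds[box] }
      return combo if (coverage - covered).empty?
    end
  end
  nil # even all boxes together do not cover the required compounds
end

database = [
  {"Name"=>"Compound1", "Box"=>1}, {"Name"=>"Compound2", "Box"=>1},
  {"Name"=>"Compound2", "Box"=>1}, {"Name"=>"Compound3", "Box"=>1},
  {"Name"=>"Compound4", "Box"=>1}, {"Name"=>"Compound5", "Box"=>2},
  {"Name"=>"Compound6", "Box"=>2}, {"Name"=>"Compound1", "Box"=>3},
  {"Name"=>"Compound2", "Box"=>3}, {"Name"=>"Compound3", "Box"=>3},
  {"Name"=>"Compound7", "Box"=>4}
]
coverage = (1..7).map { |i| "Compound#{i}" }

# Hypothetical inventory in which every box is needed.
all_needed = [{"Name"=>"A", "Box"=>1}, {"Name"=>"B", "Box"=>2}]

min_cover(database, coverage)     #=> [1, 2, 4]
min_cover(all_needed, %w[A B])    #=> [1, 2]
min_cover(all_needed, %w[A B C])  #=> nil
```

Because smaller combinations are tried first, the first combination that covers everything is also one with the fewest boxes.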
I don't like that original data structure. Why not just start with a hash of {CompoundX => BoxY}, since the "Name" and "Box" keys are not really useful? But if you're married to that structure, here's how I would do it:
database = [{"Name"=>"Compound1", "Box"=>1},
{"Name"=>"Compound2", "Box"=>1},
{"Name"=>"Compound2", "Box"=>1},
{"Name"=>"Compound3", "Box"=>1},
{"Name"=>"Compound4", "Box"=>1},
{"Name"=>"Compound5", "Box"=>2},
{"Name"=>"Compound6", "Box"=>2},
{"Name"=>"Compound1", "Box"=>3},
{"Name"=>"Compound2", "Box"=>3},
{"Name"=>"Compound3", "Box"=>3},
{"Name"=>"Compound7", "Box"=>4}]
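# Flatten to [name, box, name, box, ...]. This avoids compact!, which returns nil when nothing is removed.
new_db_arr = database.flat_map { |h| [h["Name"], h["Box"]] }
# Build {compound name => [boxes that hold it]}.
new_db_hash = {}
new_db_arr.each_slice(2) do |a, b|
  new_db_hash[a] ||= []
  new_db_hash[a] << b
end
new_db_hash
boxes = new_db_hash.values
# Every way of picking one box for each compound.
combos = boxes[0].product(*boxes[1..-1])
# Prefer the combination that uses the fewest distinct boxes.
combos = combos.sort_by { |a| a.uniq.length }
winning_combo = combos[0].uniq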
The bulk of the work is just transforming the data structure into a hash of the form {compound => [box numbers]}. Then you generate every combination that picks one box per compound, sort by the number of unique boxes in each combination, and take the one with the fewest unique boxes as the answer. This may not scale well to very large datasets, because the number of combinations is the product of the number of boxes listed for each compound.
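As a rough illustration of the same idea, here is a shorter sketch that builds the hash with group_by and picks the winner with min_by instead of sorting. The names boxes_by_compound, box_lists and candidates are chosen here for illustration; duplicate box entries are dropped up front, which is an addition to the approach described above.

```ruby
database = [
  {"Name"=>"Compound1", "Box"=>1}, {"Name"=>"Compound2", "Box"=>1},
  {"Name"=>"Compound2", "Box"=>1}, {"Name"=>"Compound3", "Box"=>1},
  {"Name"=>"Compound4", "Box"=>1}, {"Name"=>"Compound5", "Box"=>2},
  {"Name"=>"Compound6", "Box"=>2}, {"Name"=>"Compound1", "Box"=>3},
  {"Name"=>"Compound2", "Box"=>3}, {"Name"=>"Compound3", "Box"=>3},
  {"Name"=>"Compound7", "Box"=>4}
]

# Build {compound => [boxes that hold it]}, dropping duplicate box entries.
boxes_by_compound = database
  .group_by { |h| h["Name"] }
  .map { |name, records| [name, records.map { |h| h["Box"] }.uniq] }
  .to_h

box_lists = boxes_by_compound.values

# One candidate per way of choosing a box for every compound.
candidates = box_lists.first.product(*box_lists.drop(1))
candidate_count = candidates.size
#=> 8 (three compounds with two possible boxes each: 2 * 2 * 2)

# Keep the candidate that uses the fewest distinct boxes.
winning_combo = candidates.min_by { |c| c.uniq.size }.uniq.sort
#=> [1, 2, 4]
```

The candidate count doubles for every additional compound that is stored in two boxes, which is why this approach is only practical for small inventories.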