I have an array of hashes, e.g.:
{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"}
{"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil}
{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"}
{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil}
I want to give them all ID values based on a given search criterion. For example, searching by Size should return:
{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"}
{"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil, "ID"=>"1"}
{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"}
{"Breed"=>"Poodle", "Size"=>"Medium", "Colour"=>nil, "ID"=>"0"}
I have written the following code, which compares each hash in the sequence against the hashes that follow it.
for i in 0..data.length-2
  data[i].store("ID", i)
  for j in i+1..data.length-1
    output = (data[i].keys & data[j].keys).select { |k| data[i][k] == data[j][k] }
    if output.include? searchTerm
      puts "Match!"
      puts "---"
      data[j].store("ID", data[i]["ID"])
    else
      puts "No match :("
      puts "---"
    end
  end
  puts "---Finished checking row---"
end
puts data
The issue is twofold:
A. Nil values count as a match, e.g. when searching by Colour:
{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"}
{"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil, "ID"=>"1"}
{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"}
{"Breed"=>"Poodle", "Size"=>"Medium", "Colour"=>nil, "ID"=>"1"}
B. Matches seem to only work for the last pair found, e.g. when searching by Size:
{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"}
{"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil, "ID"=>"1"}
{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"2"}
{"Breed"=>"Poodle", "Size"=>"Medium", "Colour"=>"White", "ID"=>"2"}
In summary, I want to ignore nil values so they don't count as matches and for all instances of the same key:value pair to have the same value for the ID key.
Don't use for loops for array iteration. Ruby has plenty of nice methods for doing that.
Here is my solution:
hashes = [{"Breed" => "Beagle", "Size" => "Medium", "Colour" => "Brown"},
          {"Breed" => "Pug", "Size" => "Small", "Colour" => nil},
          {"Breed" => "Beagle", "Size" => "Medium", "Colour" => "Brown"},
          {"Breed" => "Beagle", "Size" => "Large", "Colour" => nil}]

def search_id(elements, search_key)
  # Get rid of elements whose value for search_key is nil.
  target_elements = elements.reject {|e| e[search_key].nil?}
  result = []
  id = 0
  # Iterate through target_elements
  target_elements.each do |currentElement|
    # Check whether an element with the same value already exists in the result array
    match = result.find {|previousElement| currentElement[search_key] == previousElement[search_key]}
    if match
      # Use previous id
      currentElement[:id] = match[:id]
    else
      # Assign a new id
      currentElement[:id] = id
      id += 1
    end
    # Add element to result
    result << currentElement
  end
  result
end

puts search_id(hashes, 'Size')
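Note that reject removes the rows whose search value is nil from the returned array altogether. If you would rather keep every row and simply leave the nil rows without an id, something along these lines might work. This is only a sketch: the method name assign_ids and the ids lookup hash are made up for illustration, it stores the id under the string key "ID" as in the question rather than :id, and it swaps the find scan for a hash lookup.

```ruby
# Hypothetical variant of search_id: keeps all rows, skips nil values.
def assign_ids(elements, search_key)
  ids = {} # value => id, filled in order of first appearance
  elements.each do |element|
    value = element[search_key]
    next if value.nil? # nil never counts as a match

    # Reuse the id of a value seen before, otherwise hand out the next one.
    ids[value] ||= ids.size
    element["ID"] = ids[value]
  end
  elements
end

dogs = [
  { "Breed" => "Beagle", "Size" => "Medium", "Colour" => "Brown" },
  { "Breed" => "Pug",    "Size" => "Small",  "Colour" => nil },
  { "Breed" => "Beagle", "Size" => "Medium", "Colour" => "Brown" },
  { "Breed" => "Beagle", "Size" => "Large",  "Colour" => nil }
]

assign_ids(dogs, "Colour").each { |dog| p dog }
```

With the sample data, the two brown rows end up sharing id 0 and the two rows with a nil Colour get no "ID" key at all.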
I think it might be worth grabbing all of the unique values from the searched entries first and then assigning each of them an id inside a hash (with the available terms as the keys), so that you can simply loop through the hashes and grab the correct id from the pre-generated hash, e.g. something like:
hashes = [
  {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"},
  {"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil},
  {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"},
  {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil}
]

def append_id(items, search_key)
  id_map = items.
    map { |hash| hash[search_key] }.
    uniq.
    reject { |value| value.nil? }.
    map.with_index { |value, index| [value, index] }.
    to_h

  items.each do |hash|
    value = hash[search_key]
    hash['ID'] = id_map[value] unless value.nil?
  end

  items
end

append_id(hashes, 'Size')
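To see how this treats nil values, here is a small usage sketch that searches by 'Colour' instead. The method is repeated verbatim so the snippet runs on its own; the sample and result names are only for illustration.

```ruby
# append_id as defined above, repeated so the snippet is self-contained.
def append_id(items, search_key)
  id_map = items.
    map { |hash| hash[search_key] }.
    uniq.
    reject { |value| value.nil? }.
    map.with_index { |value, index| [value, index] }.
    to_h

  items.each do |hash|
    value = hash[search_key]
    hash['ID'] = id_map[value] unless value.nil?
  end

  items
end

sample = [
  {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"},
  {"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil},
  {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"},
  {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil}
]

result = append_id(sample, 'Colour')
result.each { |row| p row }
```

The two brown rows receive 'ID' 0, while the rows with a nil Colour are left without an 'ID' key. Because each mutates the hashes in place, result is the same array object as sample.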
Here is one way to do that.
def doit(arr, key)
  val_to_id = arr.map { |h| h[key] }
                 .uniq
                 .each_with_index.with_object({}) { |(key,i),h| h[key] = i.to_s }
  arr.map { |h| h.merge("ID"=>val_to_id[h[key]]) }
end

arr = [{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"},
       {"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil},
       {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"},
       {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil}]
See Array#map, Array#uniq, Enumerable#each_with_index and Enumerator#with_object. The block variables (|(key,i),h|) are written to make use of Array decomposition.
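As a rough illustration of that decomposition (the names pairs, lookup, value, index and memo are chosen here just for the example): each_with_index yields two-element arrays, and wrapping the first block parameter in parentheses splits each of them into its parts.

```ruby
# each_with_index pairs every element with its position.
pairs = %w[Medium Small].each_with_index.to_a
p pairs

# with_object passes [element, index] plus the memo object to the block;
# the parentheses in |(value, index), memo| unpack the inner array.
lookup = %w[Medium Small].each_with_index.with_object({}) do |(value, index), memo|
  memo[value] = index.to_s
end
p lookup
```

Here pairs holds "Medium" with index 0 and "Small" with index 1, and lookup maps "Medium" to "0" and "Small" to "1".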
doit(arr,"Breed")
#=> [{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
# {"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil, "ID"=>"1"},
# {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
# {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil, "ID"=>"0"}]
doit(arr,"Size")
#=> [{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
# {"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil, "ID"=>"1"},
# {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
# {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil, "ID"=>"0"}]
doit(arr, "Colour")
#=> [{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
# {"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil, "ID"=>"1"},
# {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
# {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil, "ID"=>"1"}]
The steps are as follows.
key = "Size"
a = arr.map { |h| h[key] }
#=> ["Medium", "Small", "Medium", "Medium"]
b = a.uniq
#=> ["Medium", "Small"]
val_to_id = b.each_with_index.with_object({}) { |(key,i),h| h[key] = i.to_s }
#=> {"Medium"=>"0", "Small"=>"1"}
arr.map { |h| h.merge("ID"=>val_to_id[h[key]]) }
#=> [{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
# {"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil, "ID"=>"1"},
# {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
# {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil, "ID"=>"0"}]
Let's consider the last calculation in more detail. The first value of arr is passed to the block, the block variable h is assigned its value and the block calculations are performed with the result returned to map.
h = arr[0]
#=> {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"}
c = h[key]
#=> "Medium"
d = val_to_id[c]
#=> "0"
h.merge("ID"=>d)
#=> {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"}
The remaining elements of arr are processed similarly.
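As the "Colour" example shows, doit treats nil like any other value and gives it an id of its own. If nil should be ignored instead, as the question asks, one possible tweak is to drop nil before the ids are assigned. This is a sketch only; the name doit_skipping_nil is hypothetical, and rows with a nil value end up with "ID"=>nil.

```ruby
# Hypothetical variant of doit: nil values never receive an id.
def doit_skipping_nil(arr, key)
  val_to_id = arr.map { |h| h[key] }
                 .compact # drop nil before ids are handed out
                 .uniq
                 .each_with_index
                 .map { |value, i| [value, i.to_s] }
                 .to_h
  # Hash#merge returns a new hash, so the rows in arr are not modified.
  arr.map { |h| h.merge("ID" => val_to_id[h[key]]) }
end

arr = [{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"},
       {"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil},
       {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"},
       {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil}]

tagged = doit_skipping_nil(arr, "Colour")
tagged.each { |row| p row }
```

The brown rows get "ID"=>"0" and the nil rows get "ID"=>nil, while arr itself is left without any "ID" keys.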
As mentioned in the comments on this and other related posts coming from students of the same instructor, this is simply a bad exercise. Appending values to a Hash or Struct isn't hard. The X/Y problem you have is that you're trying to simulate things like joins, indexed keys, and intermediate tables that are built-in features of a database. Reinventing the wheel is generally a bad idea.
That said, if you can make some basic guarantees about your data, such as the fact that :breed will always have a value, and that you can in fact merge values from different keys without clobbering important data or invalidating the whole notion of null values in database fields, then you can come pretty close to what you want. There are still a lot of edge cases because the data is idiosyncratic and not normalized, and the notion of giving similar-but-different records the same :id (whether or not you've merged them) seems problematic.
Despite all those caveats, here's a working solution that should address both your actual and underlying questions, as well as the related ones one of your colleagues posted. I'm not going to explain every line of code, but I documented it fairly thoroughly so you could experiment with it.
#!/usr/bin/env ruby

require 'logger'

Dog = Struct.new :breed, :size, :color, :id

class ReinventingTheDatabase
  # Here's a semi-randomized "database" of sample data, which is exercised by the self-tests at the bottom of the code.
  #
  # @note No attempt is made to handle the use case of a missing breed, which is used elsewhere to sort and merge in a
  #   sensible way.
  # @return [Array<Struct>] populated by Dog objects for use in data lookups and merging
  DOG_DATABASE = [
    ['terrier', 'small', 'brown'],
    ['beagle', 'medium', 'brown'],
    ['pug', 'small', nil],
    ['beagle', nil, 'brown'],
    ['beagle', 'medium', 'brown'],
    ['beagle', 'medium', nil],
    ['poodle', 'medium', nil],
    ['schnauzer', 'small', 'grey'],
    ['poodle', nil, 'white'],
    ['schnauzer', 'small', nil],
  ].map { Dog.new *_1 }.freeze

  # @attr_reader dogs [Array<Array>] carries a copy of the DOG_DATABASE constant converted to a Dog struct for data lookups
  # @attr_reader last_query [Hash] holds the current database query for the instance
  # @attr_reader last_query_result [Array<Struct>] the complete results of the data lookup
  # @attr_reader last_merge [Array<Struct>] routinely updated as the @last_query_result is merged and manipulated
  attr_reader *%i[dogs last_query last_query_result last_merge logger]

  # @note You can get debugging output by adding +LOG_LEVEL='debug'+ to your environment, or tweak ENV in your REPL.
  # @param database [Array<Struct>, #members] values to populate your pseudo-database with; must
  #   +:respond_to?(:members)+ or stuff will break
  # @param logfile [String, StringIO] destination compatible with Ruby's standard library Logger
  # @return [void] instantiate your faux database class
  def initialize database=DOG_DATABASE, logfile: $stderr
    @dogs = database.dup
    @logger = Logger.new logfile, level: ENV.fetch('LOG_LEVEL', :info)
  end

  # Public interface for the class.
  #
  # @param kwargs [Hash, Array<Hash>] one or more keyword arguments that are members of the +Dog < Struct+ class
  #   to use as search terms
  # @see Dog
  # @return [Array<Struct>] merged and tagged results of the data lookup and munging
  def query **kwargs
    @last_query = kwargs
    @logger.debug "Querying database for: #{@last_query}"
    query_result = db_query
    @logger.debug "Final result after merging results: #{@last_merge}"
    printf "Query: %{kwargs}\nResult: %{result}\n\n", {kwargs: kwargs, result: db_query.map(&:to_h)} unless
      @logger.debug?
    query_result
  end

  private

  # @return [Array<Struct>] matching elements that have been merged and tagged with an :id attribute
  def db_query
    rows = []
    @dogs.map { |row| @last_query.map { rows.push(row) if row[_1] == _2 } }
    @last_merge = (@last_query_result = rows).dup
    merge_struct_members
    insert_ids_into_last_merge
  end

  # @return [Array<Struct>] +@last_merge+ including :id attributes
  def insert_ids_into_last_merge
    idx = -1
    @last_merge.uniq!
    @last_merge.map! { _1.id = idx+=1; _1 }
  end

  # Merge Struct members that are nil when possible.
  #
  # @param m [Symbol] Dog#member
  # @return [Array<Struct>] modified +@last_query_result+
  def merge_struct_members
    Dog.members[0..-2].map do |m|
      @last_merge.uniq.sort_by { _1.breed }.each_cons(2).map do |row1, row2|
        cons_rows = [row1[m], row2[m]].compact
        row1[m], row2[m] = cons_rows * 2 if cons_rows.one? && row1.breed == row2.breed
      end
    end
  end
end

if __FILE__ == $0
  db = ReinventingTheDatabase.new
  db.query size: 'small'
  db.query size: 'medium'
  db.query size: 'small', color: 'brown'
  db.query breed: 'poodle'
end

# vim: ft=ruby et sw=2 tw=120
When run from the shell with ruby dogs.rb (or ./dogs.rb if it's executable) you should get the following output:
Query: {:size=>"small"}
Result: [{:breed=>"terrier", :size=>"small", :color=>"brown", :id=>0}, {:breed=>"pug", :size=>"small", :color=>nil, :id=>1}, {:breed=>"schnauzer", :size=>"small", :color=>"grey", :id=>2}, {:breed=>"schnauzer", :size=>"small", :color=>"grey", :id=>3}]
Query: {:size=>"medium"}
Result: [{:breed=>"beagle", :size=>"medium", :color=>"brown", :id=>0}, {:breed=>"beagle", :size=>"medium", :color=>"brown", :id=>1}, {:breed=>"poodle", :size=>"medium", :color=>nil, :id=>2}]
Query: {:size=>"small", :color=>"brown"}
Result: [{:breed=>"terrier", :size=>"small", :color=>"brown", :id=>0}, {:breed=>"beagle", :size=>"medium", :color=>"brown", :id=>1}, {:breed=>"pug", :size=>"small", :color=>nil, :id=>2}, {:breed=>"beagle", :size=>"medium", :color=>"brown", :id=>3}, {:breed=>"beagle", :size=>"medium", :color=>"brown", :id=>4}, {:breed=>"beagle", :size=>"medium", :color=>"brown", :id=>5}, {:breed=>"schnauzer", :size=>"small", :color=>"grey", :id=>6}, {:breed=>"schnauzer", :size=>"small", :color=>"grey", :id=>7}]
Query: {:breed=>"poodle"}
Result: [{:breed=>"poodle", :size=>"medium", :color=>"white", :id=>0}, {:breed=>"poodle", :size=>"medium", :color=>"white", :id=>1}]
If you run it with debug logging enabled, you'll get more visibility. For example, when invoked as LOG_LEVEL='debug' ruby dogs.rb you should see:
D, [2022-05-29T23:24:09.055928 #43664] DEBUG -- : Querying database for: {:size=>"small"}
D, [2022-05-29T23:24:09.056087 #43664] DEBUG -- : Final result after merging results: [#<struct Dog breed="terrier", size="small", color="brown", id=0>, #<struct Dog breed="pug", size="small", color=nil, id=1>, #<struct Dog breed="schnauzer", size="small", color="grey", id=2>]
D, [2022-05-29T23:24:09.056120 #43664] DEBUG -- : Querying database for: {:size=>"medium"}
D, [2022-05-29T23:24:09.056178 #43664] DEBUG -- : Final result after merging results: [#<struct Dog breed="beagle", size="medium", color="brown", id=0>, #<struct Dog breed="poodle", size="medium", color=nil, id=1>]
D, [2022-05-29T23:24:09.056204 #43664] DEBUG -- : Querying database for: {:size=>"small", :color=>"brown"}
D, [2022-05-29T23:24:09.056285 #43664] DEBUG -- : Final result after merging results: [#<struct Dog breed="terrier", size="small", color="brown", id=0>, #<struct Dog breed="beagle", size="medium", color="brown", id=1>, #<struct Dog breed="pug", size="small", color=nil, id=2>, #<struct Dog breed="beagle", size="medium", color="brown", id=3>, #<struct Dog breed="schnauzer", size="small", color="grey", id=4>, #<struct Dog breed="schnauzer", size="small", color="grey", id=5>]
D, [2022-05-29T23:24:09.056311 #43664] DEBUG -- : Querying database for: {:breed=>"poodle"}
D, [2022-05-29T23:24:09.056357 #43664] DEBUG -- : Final result after merging results: [#<struct Dog breed="poodle", size="medium", color="white", id=0>, #<struct Dog breed="poodle", size="medium", color="white", id=1>]
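Since the code isn't explained line by line, here is a stripped-down sketch of the idea behind merge_struct_members: sort the rows by breed so that same-breed rows sit next to each other, walk neighbouring pairs with each_cons(2), and copy a value across when exactly one side of the pair has it. SketchDog and the sample rows are hypothetical stand-ins, not part of the script above.

```ruby
# SketchDog is a hypothetical three-member stand-in for the Dog struct.
SketchDog = Struct.new(:breed, :size, :color)

rows = [
  SketchDog.new('beagle', nil, 'brown'),
  SketchDog.new('pug', 'small', nil),
  SketchDog.new('beagle', 'medium', 'brown')
]

# Sorting by breed puts rows of the same breed side by side, so
# each_cons(2) can compare them as neighbouring pairs.
rows.sort_by(&:breed).each_cons(2) do |left, right|
  next unless left.breed == right.breed

  %i[size color].each do |member|
    known = [left[member], right[member]].compact
    # Exactly one known value means the other side is nil: copy it across.
    left[member] = right[member] = known.first if known.one?
  end
end

rows.each { |row| p row.to_h }
```

After the loop, the beagle that had a nil size has picked up 'medium' from its neighbour, while the pug keeps its nil color because no other pug row is available to fill it in.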