
How to append ID key:value pairs when iterating over an array of hashes in Ruby?

I have an array of hashes, eg:

{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"}
{"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil}
{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"}
{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil}

I want to give them all ID values based on a given search criterion. For example, searching by Size should return:

{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"}
{"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil, "ID"=>"1"}
{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"}
{"Breed"=>"Poodle", "Size"=>"Medium", "Colour"=>nil, "ID"=>"0"}

I have written the following code, which compares each hash with every hash that follows it in the array.

for i in 0..data.length-2
  data[i].store("ID", i)

  for j in i+1..data.length-1
    output = (data[i].keys & data[j].keys).select { |k| data[i][k] == data[j][k] }

    if output.include? searchTerm
      puts "Match!"
      puts "---"
      data[j].store("ID", data[i]["ID"])
    else
      puts "No match :("
      puts "---"
    end
  end

  puts "---Finished checking row---"    
end

puts data

The issue is twofold:

A. Nil values count as a match, eg when searching by Colour:

{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"}
{"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil, "ID"=>"1"}
{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"}
{"Breed"=>"Poodle", "Size"=>"Medium", "Colour"=>nil, "ID"=>"1"}

B. Matches seem to only work for the last pair found, eg when searching by Size:

{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"}
{"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil, "ID"=>"1"}
{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"2"}
{"Breed"=>"Poodle", "Size"=>"Medium", "Colour"=>"White", "ID"=>"2"}

In summary, I want to ignore nil values so they don't count as matches and for all instances of the same key:value pair to have the same value for the ID key.

Don't use for loops for array iteration. Ruby has plenty of nice methods for doing that.

Here is my solution:

hashes = [{"Breed" => "Beagle", "Size" => "Medium", "Colour" => "Brown"},
          {"Breed" => "Pug", "Size" => "Small", "Colour" => nil},
          {"Breed" => "Beagle", "Size" => "Medium", "Colour" => "Brown"},
          {"Breed" => "Beagle", "Size" => "Large", "Colour" => nil}]

def search_id(elements, search_key)
  # Get rid of elements whose value for search_key is nil.
  target_elements = elements.reject { |e| e[search_key].nil? }

  result = []
  id = 0

  # Iterate through target_elements
  target_elements.each do |current_element|

    # Check whether an element with the same value already exists in the result array
    match = result.find { |previous_element| current_element[search_key] == previous_element[search_key] }

    if match
      # Use previous id
      current_element[:id] = match[:id]
    else
      # Assign a new id
      current_element[:id] = id
      id += 1
    end

    # Add element to result
    result << current_element
  end

  result
end

puts search_id(hashes, 'Size')
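
To illustrate the point about iteration methods: the index-based nested loops in the question visit every pair of rows once, which Array#combination expresses directly. This is only a minimal sketch on made-up sample rows (the names rows and same_size_pairs are illustrative, not from the question), and it skips nil values explicitly:

```ruby
rows = [
  { "Breed" => "Beagle", "Size" => "Medium" },
  { "Breed" => "Pug",    "Size" => "Small"  },
  { "Breed" => "Poodle", "Size" => "Medium" }
]

# combination(2) yields each unordered pair exactly once, like
# `for i ... for j in i+1 ...` but without any index bookkeeping.
same_size_pairs = rows.combination(2).select do |a, b|
  !a["Size"].nil? && a["Size"] == b["Size"]
end

same_size_pairs.each do |a, b|
  puts "#{a['Breed']} and #{b['Breed']} share a size"
end
```

Pairwise comparison is still quadratic in the number of rows, so it is mainly useful for seeing which rows match rather than for assigning ids.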

I think it might be worth first collecting all of the unique values for the searched key and assigning each of them an id inside a hash (with the values as keys), so that you can simply loop through the hashes and look up the correct id in the pre-generated hash. For example:

hashes = [
    {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"},
    {"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil},
    {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"},
    {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil}
]


def append_id(items, search_key)
    id_map = items.
        map { |hash| hash[search_key] }.
        uniq.
        reject { |value| value.nil? }.
        map.with_index { |value, index| [value, index] }.
        to_h
    
    items.each do |hash|
        value = hash[search_key]
        hash['ID'] = id_map[value] unless value.nil?
    end
    
    items
end

append_id(hashes, 'Size')
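
The same lookup-table idea can also be sketched in a single pass with a Hash default block, which hands out the next id the first time a value is seen. The helper name append_id_lazily is hypothetical, and like append_id it mutates the hashes it is given and leaves rows with a nil value untouched:

```ruby
# Hypothetical helper; a single-pass variant of the lookup-table approach.
def append_id_lazily(items, search_key)
  # The default block runs only for values not seen before and
  # assigns them 0, 1, 2, ... in order of first appearance.
  ids = Hash.new { |seen, value| seen[value] = seen.size }

  items.each do |item|
    value = item[search_key]
    item["ID"] = ids[value] unless value.nil?
  end
end

dogs = [
  { "Breed" => "Beagle", "Size" => "Medium", "Colour" => "Brown" },
  { "Breed" => "Pug",    "Size" => "Small",  "Colour" => nil },
  { "Breed" => "Beagle", "Size" => "Medium", "Colour" => "Brown" },
  { "Breed" => "Beagle", "Size" => "Medium", "Colour" => nil }
]

append_id_lazily(dogs, "Colour")
puts dogs
```

Rows whose Colour is nil end up without an "ID" key at all; whether that or an explicit nil id is preferable depends on how the result is consumed.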

Here is one way to do that.

def doit(arr, key)
  val_to_id = arr.map { |h| h[key] }
                 .uniq
                 .each_with_index.with_object({}) { |(key,i),h| h[key] = i.to_s }
  arr.map { |h| h.merge("ID"=>val_to_id[h[key]]) }
end

arr = [{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"},
       {"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil},
       {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"},
       {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil}]

See Array#map, Array#uniq, Enumerable#each_with_index and Enumerator#with_object. The block variables (|(key,i),h|) are written to make use of array decomposition.
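
As a small, self-contained illustration of array decomposition in block variables (the variable names here are made up for the example):

```ruby
pairs = ["Medium", "Small"].each_with_index.to_a
# pairs is [["Medium", 0], ["Small", 1]]

# each_with_object yields a [value, index] pair plus the memo object;
# the parentheses unpack the pair into two block variables.
lookup = pairs.each_with_object({}) { |(value, index), memo| memo[value] = index.to_s }

# Without parentheses the first block variable receives the whole pair.
whole = pairs.each_with_object([]) { |pair, memo| memo << pair.class }

p lookup
p whole
```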


doit(arr,"Breed")
  #=> [{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
  #    {"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil, "ID"=>"1"},
  #    {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
  #    {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil, "ID"=>"0"}]
doit(arr,"Size")
  #=> [{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
  #    {"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil, "ID"=>"1"},
  #    {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
  #    {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil, "ID"=>"0"}]
doit(arr, "Colour")
  #=> [{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
  #    {"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil, "ID"=>"1"},
  #    {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
  #    {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil, "ID"=>"1"}]

The steps are as follows.

key = "Size"
a = arr.map { |h| h[key] }
  #=> ["Medium", "Small", "Medium", "Medium"]
b = a.uniq
  #=> ["Medium", "Small"]
val_to_id = b.each_with_index.with_object({}) { |(key,i),h| h[key] = i.to_s }
  #=> {"Medium"=>"0", "Small"=>"1"}
arr.map { |h| h.merge("ID"=>val_to_id[h[key]]) }
  #=> [{"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
  #    {"Breed"=>"Pug", "Size"=>"Small", "Colour"=>nil, "ID"=>"1"},
  #    {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"},
  #    {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>nil, "ID"=>"0"}]

Let's consider the last calculation in more detail. The first value of arr is passed to the block, the block variable h is assigned its value and the block calculations are performed with the result returned to map.

h = arr[0]
  #=> {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown"}
c = h[key]
  #=> "Medium"
d = val_to_id[c]
  #=> "0"
h.merge("ID"=>d)
  #=> {"Breed"=>"Beagle", "Size"=>"Medium", "Colour"=>"Brown", "ID"=>"0"}

The remaining elements of arr are processed similarly.
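
Note that doit gives nil its own id (see the "Colour" result above). If nil values should be ignored instead, as the question asks, one possible variation is sketched below. The method name doit_skipping_nil and the choice to leave "ID" off rows with a nil value are assumptions, not part of the answer above:

```ruby
# Hypothetical variant of doit that never assigns an id to nil values.
def doit_skipping_nil(arr, key)
  val_to_id = arr.map { |h| h[key] }
                 .compact # drop nil before ids are handed out
                 .uniq
                 .each_with_index
                 .with_object({}) { |(val, i), ids| ids[val] = i.to_s }

  # Rows without a usable value are copied unchanged (no "ID" key).
  arr.map do |h|
    val_to_id.key?(h[key]) ? h.merge("ID" => val_to_id[h[key]]) : h.dup
  end
end

dogs = [
  { "Breed" => "Beagle", "Size" => "Medium", "Colour" => "Brown" },
  { "Breed" => "Pug",    "Size" => "Small",  "Colour" => nil },
  { "Breed" => "Beagle", "Size" => "Medium", "Colour" => "Brown" },
  { "Breed" => "Beagle", "Size" => "Medium", "Colour" => nil }
]

by_colour = doit_skipping_nil(dogs, "Colour")
puts by_colour
```

Like doit, this returns new hashes and leaves the input array untouched.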

Preface

As mentioned in the comments on this and other related posts coming from students of the same instructor, this is simply a bad exercise. Appending values to a Hash or Struct isn't hard. The X/Y problem you have is that you're trying to simulate things like joins, indexed keys, and intermediate tables that are built-in features of a database. Reinventing the wheel is generally a bad idea.

That said, if you can make some basic guarantees about your data such as the fact that :breed will always have a value, and that you can in fact merge values from different keys without clobbering important data or invalidating the whole notion of null values in database fields, then you can come pretty close to what you want. There are still a lot of edge cases because the data is idiosyncratic, not normalized, and the notion of giving similar-but-different records the same :id (whether or not you've merged them) seems problematic.

Despite all those caveats, here's a working solution that should address both your actual and underlying questions, as well as the related ones one of your colleagues posted. I'm not going to explain every line of code, but I documented it fairly thoroughly so you could experiment with it.

A Working Dog "Database"

Application Code

#!/usr/bin/env ruby

require 'logger'

Dog = Struct.new :breed, :size, :color, :id

class ReinventingTheDatabase
  # Here's a semi-randomized "database" of sample data, which is exercised by the self-tests at the bottom of the code.
  #
  # @note No attempt is made to handle the use case of a missing breed, which is used elsewhere to sort and merge in a
  #   sensible way.
  # @return [Array<Struct>] populated by Dog objects for use in data lookups and merging
  DOG_DATABASE = [
      ['terrier', 'small', 'brown'],
      ['beagle', 'medium', 'brown'],
      ['pug', 'small', nil],
      ['beagle', nil, 'brown'],
      ['beagle', 'medium', 'brown'],
      ['beagle', 'medium', nil],
      ['poodle', 'medium', nil],
      ['schnauzer', 'small', 'grey'],
      ['poodle', nil, 'white'],
      ['schnauzer', 'small', nil],
  ].map { Dog.new *_1 }.freeze

  # @attr_reader dogs [Array<Struct>] carries a copy of the DOG_DATABASE constant (an array of Dog structs) for data lookups
  # @attr_reader last_query [Hash] holds the current database query for the instance
  # @attr_reader last_query_result [Array<Struct>] the complete results of the data lookup
  # @attr_reader last_merge [Array<Struct>] routinely updated as the @last_query_result is merged and manipulated
  attr_reader *%i[dogs last_query last_query_result last_merge logger]

  # @note You can get debugging output by adding +LOG_LEVEL='debug'+ to your environment, or tweak ENV in your REPL.
  # @param database [Array<Struct>, #members] values to populate your pseudo-database with; must
  #   +:respond_to?(:members)+ or stuff will break
  # @param logfile [String, StringIO] destination compatible with Ruby's standard library Logger
  # @return [void] instantiate your faux database class
  def initialize database=DOG_DATABASE, logfile: $stderr
    @dogs = database.dup
    @logger = Logger.new logfile, level: ENV.fetch('LOG_LEVEL', :info)
  end

  # Public interface for the class.
  #
  # @param kwargs [Hash, Array<Hash>] one or more keyword arguments that are members of the +Dog < Struct+ class
  #   to use as search terms
  # @see Dog
  # @return [Array<Struct>] merged and tagged results of the data lookup and munging
  def query **kwargs
    @last_query = kwargs
    @logger.debug "Querying database for: #{@last_query}"
    query_result = db_query
    @logger.debug "Final result after merging results: #{@last_merge}"
    printf "Query: %{kwargs}\nResult: %{result}\n\n", {kwargs: kwargs, result: db_query.map(&:to_h)} unless
      @logger.debug?
    query_result
  end

  private

  # @return [Array<Struct>] matching elements that have been merged and tagged with an :id attribute
  def db_query
    rows = []
    @dogs.map { |row| @last_query.map { rows.push(row) if row[_1] == _2 } }
    @last_merge = (@last_query_result = rows).dup
    merge_struct_members
    insert_ids_into_last_merge
  end

  # @return [Array<Struct>] +@last_merge+ including :id attributes
  def insert_ids_into_last_merge
    idx = -1
    @last_merge.uniq!
    @last_merge.map! { _1.id = idx+=1; _1 }
  end

  # Merge Struct members that are nil when possible.
  #
  # @param m [Symbol] Dog#member
  # @return [Array<Struct>] modified +@last_query_result+
  def merge_struct_members
    Dog.members[0..-2].map do |m|
      @last_merge.uniq.sort_by { _1.breed }.each_cons(2).map do |row1, row2|
        cons_rows = [row1[m], row2[m]].compact
        row1[m], row2[m] = cons_rows * 2 if cons_rows.one? && row1.breed == row2.breed
      end
    end
  end
end

if __FILE__ == $0
  db = ReinventingTheDatabase.new
  db.query size: 'small'
  db.query size: 'medium'
  db.query size: 'small', color: 'brown'
  db.query breed: 'poodle'
end

# vim: ft=ruby et sw=2 tw=120

Console Output

When run from the shell with ruby dogs.rb (or ./dogs.rb if it's executable) you should get the following output:

Query: {:size=>"small"}
Result: [{:breed=>"terrier", :size=>"small", :color=>"brown", :id=>0}, {:breed=>"pug", :size=>"small", :color=>nil, :id=>1}, {:breed=>"schnauzer", :size=>"small", :color=>"grey", :id=>2}, {:breed=>"schnauzer", :size=>"small", :color=>"grey", :id=>3}]

Query: {:size=>"medium"}
Result: [{:breed=>"beagle", :size=>"medium", :color=>"brown", :id=>0}, {:breed=>"beagle", :size=>"medium", :color=>"brown", :id=>1}, {:breed=>"poodle", :size=>"medium", :color=>nil, :id=>2}]

Query: {:size=>"small", :color=>"brown"}
Result: [{:breed=>"terrier", :size=>"small", :color=>"brown", :id=>0}, {:breed=>"beagle", :size=>"medium", :color=>"brown", :id=>1}, {:breed=>"pug", :size=>"small", :color=>nil, :id=>2}, {:breed=>"beagle", :size=>"medium", :color=>"brown", :id=>3}, {:breed=>"beagle", :size=>"medium", :color=>"brown", :id=>4}, {:breed=>"beagle", :size=>"medium", :color=>"brown", :id=>5}, {:breed=>"schnauzer", :size=>"small", :color=>"grey", :id=>6}, {:breed=>"schnauzer", :size=>"small", :color=>"grey", :id=>7}]

Query: {:breed=>"poodle"}
Result: [{:breed=>"poodle", :size=>"medium", :color=>"white", :id=>0}, {:breed=>"poodle", :size=>"medium", :color=>"white", :id=>1}]

If you run it with debug logging enabled, you'll get more visibility. For example, when invoked as LOG_LEVEL='debug' ruby dogs.rb you should see:

D, [2022-05-29T23:24:09.055928 #43664] DEBUG -- : Querying database for: {:size=>"small"}
D, [2022-05-29T23:24:09.056087 #43664] DEBUG -- : Final result after merging results: [#<struct Dog breed="terrier", size="small", color="brown", id=0>, #<struct Dog breed="pug", size="small", color=nil, id=1>, #<struct Dog breed="schnauzer", size="small", color="grey", id=2>]
D, [2022-05-29T23:24:09.056120 #43664] DEBUG -- : Querying database for: {:size=>"medium"}
D, [2022-05-29T23:24:09.056178 #43664] DEBUG -- : Final result after merging results: [#<struct Dog breed="beagle", size="medium", color="brown", id=0>, #<struct Dog breed="poodle", size="medium", color=nil, id=1>]
D, [2022-05-29T23:24:09.056204 #43664] DEBUG -- : Querying database for: {:size=>"small", :color=>"brown"}
D, [2022-05-29T23:24:09.056285 #43664] DEBUG -- : Final result after merging results: [#<struct Dog breed="terrier", size="small", color="brown", id=0>, #<struct Dog breed="beagle", size="medium", color="brown", id=1>, #<struct Dog breed="pug", size="small", color=nil, id=2>, #<struct Dog breed="beagle", size="medium", color="brown", id=3>, #<struct Dog breed="schnauzer", size="small", color="grey", id=4>, #<struct Dog breed="schnauzer", size="small", color="grey", id=5>]
D, [2022-05-29T23:24:09.056311 #43664] DEBUG -- : Querying database for: {:breed=>"poodle"}
D, [2022-05-29T23:24:09.056357 #43664] DEBUG -- : Final result after merging results: [#<struct Dog breed="poodle", size="medium", color="white", id=0>, #<struct Dog breed="poodle", size="medium", color="white", id=1>]
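
The LOG_LEVEL switch relies on Ruby's standard Logger dropping messages below its configured level. A minimal sketch of that behaviour in isolation follows; build_logger is a hypothetical helper mirroring the constructor above, and it takes the environment as a parameter so the example does not depend on the real ENV:

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: same level lookup as the initialize method above,
# but with an injectable environment hash.
def build_logger(io, env = ENV)
  Logger.new(io, level: env.fetch('LOG_LEVEL', :info))
end

quiet_io = StringIO.new
quiet = build_logger(quiet_io, {})
quiet.debug('Querying database')    # dropped: level falls back to :info

verbose_io = StringIO.new
verbose = build_logger(verbose_io, { 'LOG_LEVEL' => 'debug' })
verbose.debug('Querying database')  # written: DEBUG is enabled

puts "quiet logged anything?   #{!quiet_io.string.empty?}"
puts "verbose logged anything? #{!verbose_io.string.empty?}"
```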
