Ruby - sort array of hashes values (string) based on array order

I have an array of hashes in the format shown below, and I am trying to sort it by each hash's :book value, following the order of a separate array. The order is not alphabetical, and for my use case it cannot be alphabetical.

I need to sort based on the following array:

array = ['Matthew', 'Mark', 'Acts', '1John']

Note that I've seen several solutions that leverage Array#index (such as "Sorting an Array of hashes based on an Array of sorted values") to perform a custom sort, but that will not work with strings.

I've tried various combinations of sorting with Array#sort and Array#sort_by but they don't seem to accept a custom order. What am I missing? Thank you in advance for your help!

Array of Hashes

[{:book=>"Matthew",
  :chapter=>"4",
  :section=>"new_testament"},
 {:book=>"Matthew",
  :chapter=>"22",
  :section=>"new_testament"},
 {:book=>"Mark",
  :chapter=>"6",
  :section=>"new_testament"},
 {:book=>"1John",
  :chapter=>"1",
  :section=>"new_testament"},
 {:book=>"1John",
  :chapter=>"1",
  :section=>"new_testament"},
 {:book=>"Acts",
  :chapter=>"9",
  :section=>"new_testament"},
 {:book=>"Acts",
  :chapter=>"17",
  :section=>"new_testament"}]

You can use sort_by with index:

arr = [{a: 1}, {a: 3}, {a: 2}] 

order = [2,1,3]  

arr.sort_by { |elem| order.index(elem[:a]) }                                           
# => [{:a=>2}, {:a=>1}, {:a=>3}]  

You can make it slightly faster by indexing the list of elements you want to order by:

# Build a lookup hash from each element to its position in order
order_with_index = order.each_with_index.with_object({}) do |(elem, idx), memo|
  memo[elem] = idx
end

Then, instead of order.index(<val>), use order_with_index[<val>].

As can be seen from the documentation, Array#index does indeed work for strings (it's even the provided example), so this would work:

books.sort_by { |b| array.index(b[:book]) }

But instead of repeatedly searching through array, you can just determine the order once and then look it up:

order = array.each.with_index.to_h
#=> { "Matthew" => 0, "Mark" => 1, "Acts" => 2, "1John" => 3 }
books.sort_by { |b| order[b[:book]] }
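
One caveat with the lookup approach: if a hash has a :book that is not in the order list, order[...] returns nil and sort_by raises when it compares nil with an integer. The following is a minimal sketch, not part of the original answer, that sends unknown books to the end. The helper name sort_by_book_order, the constant SAMPLE_BOOKS and the "Luke" entry are made up for illustration.

```ruby
# Hypothetical helper: sort hashes by a custom book order, placing
# books that are absent from the order list at the end.
def sort_by_book_order(books, order_list)
  order = order_list.each_with_index.to_h   # {"Matthew"=>0, "Mark"=>1, ...}
  fallback = order_list.size                # rank used for unknown books
  # The original position i is a tie-breaker, so entries for the same
  # book keep their relative order.
  books.each_with_index
       .sort_by { |b, i| [order.fetch(b[:book], fallback), i] }
       .map(&:first)
end

SAMPLE_BOOKS = [
  { book: "Acts",    chapter: "9" },
  { book: "Luke",    chapter: "2" },   # assumed example: not in the order list
  { book: "Matthew", chapter: "4" },
  { book: "Mark",    chapter: "6" }
].freeze

sorted = sort_by_book_order(SAMPLE_BOOKS, ['Matthew', 'Mark', 'Acts', '1John'])
puts sorted.map { |b| b[:book] }.inspect
# prints ["Matthew", "Mark", "Acts", "Luke"]
```

Whether unknown books should go last, go first, or raise an error depends on the use case; Hash#fetch with a default is just one way to make that choice explicit.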

Since you know the desired order, there's no need to sort the array. Here's one way you could do that. (I've called your array of hashes bible.)

bible.group_by { |h| h[:book] }.values_at(*array).flatten
  #=> [{:book=>"Matthew", :chapter=>"4", :section=>"new_testament"},
  #    {:book=>"Matthew", :chapter=>"22", :section=>"new_testament"},
  #    {:book=>"Mark", :chapter=>"6", :section=>"new_testament"},
  #    {:book=>"Acts", :chapter=>"9", :section=>"new_testament"},
  #    {:book=>"Acts", :chapter=>"17", :section=>"new_testament"},
  #    {:book=>"1John", :chapter=>"1", :section=>"new_testament"},
  #    {:book=>"1John", :chapter=>"1", :section=>"new_testament"}] 

Since Enumerable#group_by and Array#flatten each make a single pass over the elements of bible, and Hash#values_at does one lookup per entry of array, this may be faster than sorting when bible is large.

Here are the steps.

h = bible.group_by { |h| h[:book] }
  #=> {"Matthew"=>[{:book=>"Matthew", :chapter=>"4", :section=>"new_testament"},
  #                {:book=>"Matthew", :chapter=>"22", :section=>"new_testament"}],
  #    "Mark"   =>[{:book=>"Mark", :chapter=>"6", :section=>"new_testament"}],
  #    "1John"  =>[{:book=>"1John", :chapter=>"1", :section=>"new_testament"},
  #                {:book=>"1John", :chapter=>"1", :section=>"new_testament"}],
  #    "Acts"   =>[{:book=>"Acts", :chapter=>"9", :section=>"new_testament"}, 
  #                {:book=>"Acts", :chapter=>"17", :section=>"new_testament"}]
  #   } 

a = h.values_at(*array)
  #=> h.values_at('Matthew', 'Mark', 'Acts', '1John')
  #=> [[{:book=>"Matthew", :chapter=>"4", :section=>"new_testament"},
  #     {:book=>"Matthew", :chapter=>"22", :section=>"new_testament"}],
  #    [{:book=>"Mark", :chapter=>"6", :section=>"new_testament"}],
  #    [{:book=>"Acts", :chapter=>"9", :section=>"new_testament"},
  #     {:book=>"Acts", :chapter=>"17", :section=>"new_testament"}],
  #    [{:book=>"1John", :chapter=>"1", :section=>"new_testament"},
  #     {:book=>"1John", :chapter=>"1", :section=>"new_testament"}]] 

Lastly, a.flatten returns the array shown earlier.
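
The steps above assume every book in array appears in bible. If one does not, Hash#values_at returns nil for it and flatten keeps that nil in the result. Here is a small sketch of one way to guard against that; the method name place_by_book and the constant PARTIAL_BIBLE are assumptions for illustration only.

```ruby
# Hypothetical variant of the group_by / values_at approach that
# tolerates books listed in the order but missing from the data.
def place_by_book(bible, order_list)
  groups = bible.group_by { |h| h[:book] }
  # values_at yields nil for a book with no entries; compact drops those
  # before the groups are flattened by one level.
  groups.values_at(*order_list).compact.flatten(1)
end

PARTIAL_BIBLE = [
  { book: "Acts",    chapter: "9" },
  { book: "Matthew", chapter: "4" },
  { book: "Acts",    chapter: "17" }
].freeze

placed = place_by_book(PARTIAL_BIBLE, ['Matthew', 'Mark', 'Acts', '1John'])
puts placed.map { |h| "#{h[:book]} #{h[:chapter]}" }.inspect
# prints ["Matthew 4", "Acts 9", "Acts 17"]
```

Note that the reverse case behaves differently: a hash whose :book is not in the order list is silently left out of the result, because values_at only picks the listed keys.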

Let's do a benchmark.

require 'fruity'

@bible = [
  {:book=>"Matthew",
   :chapter=>"4",
   :section=>"new_testament"},
  {:book=>"Matthew",
   :chapter=>"22",
   :section=>"new_testament"},
  {:book=>"Mark",
   :chapter=>"6",
   :section=>"new_testament"},
  {:book=>"1John",
   :chapter=>"1",
   :section=>"new_testament"},
  {:book=>"1John",
   :chapter=>"1",
   :section=>"new_testament"},
  {:book=>"Acts",
   :chapter=>"9",
   :section=>"new_testament"},
  {:book=>"Acts",
   :chapter=>"17",
   :section=>"new_testament"}]

@order = ['Matthew', 'Mark', 'Acts', '1John']

def bench_em(n)
  arr = (@bible*((n/@bible.size.to_f).ceil))[0,n].shuffle
  puts "arr contains #{n} elements"
  compare do 
    _sort       { arr.sort { |h1,h2| @order.index(h1[:book]) <=>
                  @order.index(h2[:book]) }.size }
    _sort_by    { arr.sort_by { |h| @order.find_index(h[:book]) }.size }
    _sort_by_with_hash {ord=@order.each.with_index.to_h;
                        arr.sort_by {|b| ord[b[:book]]}.size}    
    _values_at  { arr.group_by { |h| h[:book] }.values_at(*@order).flatten.size }
  end
end

@maxpleaner, @ChaitanyaKale and @Michael Kohl contributed _sort, _sort_by, and _sort_by_with_hash, respectively.

bench_em    100
arr contains 100 elements
Running each test 128 times. Test will take about 1 second.
_sort_by is similar to _sort_by_with_hash
_sort_by_with_hash is similar to _values_at
_values_at is faster than _sort by 2x ± 1.0

bench_em  1_000
arr contains 1000 elements
Running each test 16 times. Test will take about 1 second.
_sort_by_with_hash is similar to _values_at
_values_at is similar to _sort_by
_sort_by is faster than _sort by 2x ± 0.1

bench_em 10_000
arr contains 10000 elements
Running each test once. Test will take about 1 second.
_values_at is faster than _sort_by_with_hash by 10.000000000000009% ± 10.0%
_sort_by_with_hash is faster than _sort_by by 10.000000000000009% ± 10.0%
_sort_by is faster than _sort by 2x ± 0.1

bench_em 100_000
arr contains 100000 elements
Running each test once. Test will take about 3 seconds.
_values_at is similar to _sort_by_with_hash
_sort_by_with_hash is similar to _sort_by
_sort_by is faster than _sort by 2x ± 0.1

Here's a second run.

bench_em    100
arr contains 100 elements
Running each test 128 times. Test will take about 1 second.
_sort_by_with_hash is similar to _values_at
_values_at is similar to _sort_by
_sort_by is faster than _sort by 2x ± 0.1

bench_em  1_000
arr contains 1000 elements
Running each test 8 times. Test will take about 1 second.
_values_at is faster than _sort_by_with_hash by 10.000000000000009% ± 10.0%
_sort_by_with_hash is similar to _sort_by
_sort_by is faster than _sort by 2.2x ± 0.1

bench_em 10_000
arr contains 10000 elements
Running each test once. Test will take about 1 second.
_values_at is similar to _sort_by_with_hash
_sort_by_with_hash is similar to _sort_by
_sort_by is faster than _sort by 2x ± 1.0

bench_em 100_000
arr contains 100000 elements
Running each test once. Test will take about 3 seconds.
_sort_by_with_hash is similar to _values_at
_values_at is similar to _sort_by
_sort_by is faster than _sort by 2x ± 0.1
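
Before reading too much into the timings, it can be worth checking that the four candidates agree. The sketch below is an addition, not part of the benchmark above: it compares only the sequence of book names, because Ruby does not guarantee that sort and sort_by are stable, so entries for the same book may come out in a different relative order. The names BENCH_ORDER, BENCH_SAMPLE and book_sequences are made up for this example.

```ruby
BENCH_ORDER = ['Matthew', 'Mark', 'Acts', '1John'].freeze

# Three entries per book, shuffled (assumed sample data).
BENCH_SAMPLE = BENCH_ORDER.flat_map { |b|
  Array.new(3) { |i| { book: b, chapter: (i + 1).to_s } }
}.shuffle.freeze

# Runs the four benchmarked approaches and returns, for each one,
# the resulting sequence of book names.
def book_sequences(arr, order)
  ord = order.each_with_index.to_h
  {
    sort:              arr.sort { |h1, h2| order.index(h1[:book]) <=> order.index(h2[:book]) },
    sort_by:           arr.sort_by { |h| order.find_index(h[:book]) },
    sort_by_with_hash: arr.sort_by { |h| ord[h[:book]] },
    values_at:         arr.group_by { |h| h[:book] }.values_at(*order).flatten
  }.transform_values { |result| result.map { |h| h[:book] } }
end

sequences = book_sequences(BENCH_SAMPLE, BENCH_ORDER)
puts sequences.values.uniq.size == 1
# prints true
```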

Enumerable#sort_by, which arrays inherit, accepts a block that returns a sort key for each element. (It is Array#sort whose block receives two elements a and b and should return -1, 0, or +1 depending on their comparison.) You can use find_index on the array to compute that key.

array_of_hashes.sort_by {|a| array.find_index(a[:book]) } should do the trick.
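
To make the two block contracts concrete, here is a short side-by-side sketch. It is an illustration added for comparison, and the names CUSTOM_ORDER, books_via_sort, books_via_sort_by and DISTINCT_BOOKS are assumptions, not from the answer.

```ruby
CUSTOM_ORDER = ['Matthew', 'Mark', 'Acts', '1John'].freeze

# sort: the block receives two elements and returns -1, 0 or +1.
def books_via_sort(hashes)
  hashes.sort do |a, b|
    CUSTOM_ORDER.find_index(a[:book]) <=> CUSTOM_ORDER.find_index(b[:book])
  end
end

# sort_by: the block receives one element and returns its sort key.
def books_via_sort_by(hashes)
  hashes.sort_by { |a| CUSTOM_ORDER.find_index(a[:book]) }
end

DISTINCT_BOOKS = [
  { book: "1John",   chapter: "1" },
  { book: "Acts",    chapter: "9" },
  { book: "Matthew", chapter: "4" },
  { book: "Mark",    chapter: "6" }
].freeze

puts books_via_sort(DISTINCT_BOOKS) == books_via_sort_by(DISTINCT_BOOKS)
# prints true
```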

Your mistake is to think that you are sorting. In fact you are not: you already have the order, and you just need to place the elements. I'm not proposing a compact or optimal solution, just a simple one. First convert your large array into a hash indexed by the :book key (which should have been your first data structure), and then just use map:

array = ['Matthew', 'Mark', 'Acts', '1John']
elements = [{:book=>"Matthew",
  :chapter=>"4",
  :section=>"new_testament"},
 {:book=>"Matthew",
  :chapter=>"22",
  :section=>"new_testament"},
 {:book=>"Mark",
  :chapter=>"6",
  :section=>"new_testament"},
 {:book=>"1John",
  :chapter=>"1",
  :section=>"new_testament"},
 {:book=>"1John",
  :chapter=>"1",
  :section=>"new_testament"},
 {:book=>"Acts",
  :chapter=>"9",
  :section=>"new_testament"},
 {:book=>"Acts",
  :chapter=>"17",
  :section=>"new_testament"}]
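# Each book maps to a list, so repeated books are not overwritten
by_name = Hash.new { |hash, key| hash[key] = [] }
for e in elements
  by_name[e[:book]] << e
end
# Look up each book's list in the desired order, then flatten the lists
final = array.map { |x| by_name[x] }.flatten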
