Ruby 2.7: How to merge a hash of arrays of hashes and eliminate duplicates based on one key:value pair

Question

I'm trying to complete a project-based assessment for a job interview, and they only offer it in Ruby on Rails, which I know little to nothing about. I'm trying to take one hash that contains two or more arrays of hashes and combine the arrays into one array of hashes, while eliminating duplicate hashes based on an "id":value pair.

So I'm trying to take this:

h = {
  'first' =>
      [
        { 'authorId' => 12, 'id' => 2, 'likes' => 469 },
        { 'authorId' => 5, 'id' => 8, 'likes' => 735 },
        { 'authorId' => 8, 'id' => 10, 'likes' => 853 }
      ],
  'second' =>
      [
        { 'authorId' => 9, 'id' => 1, 'likes' => 960 },
        { 'authorId' => 12, 'id' => 2, 'likes' => 469 },
        { 'authorId' => 8, 'id' => 4, 'likes' => 728 }
      ]
}

And turn it into this:

[
  { 'authorId' => 12, 'id' => 2, 'likes' => 469 },
  { 'authorId' => 5, 'id' => 8, 'likes' => 735 },
  { 'authorId' => 8, 'id' => 10, 'likes' => 853 },
  { 'authorId' => 9, 'id' => 1, 'likes' => 960 },
  { 'authorId' => 8, 'id' => 4, 'likes' => 728 }
]

Answer 1

Ruby has many ways to achieve this.

My first instinct is to group them by id and pick only the first item from each group.

h.values.flatten.group_by{|x| x["id"]}.map{|k,v| v[0]}

A much cleaner approach is to pick the distinct items based on id after flattening the array of hashes, which is what Cary Swoveland suggested in the comments:

h.values.flatten.uniq { |h| h['id'] }
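To check that the two one-liners agree, here is a minimal self-contained sketch using the data from the question. The variable names `posts`, `by_group` and `by_uniq` are only illustrative.

```ruby
# Sample data copied from the question.
h = {
  'first' => [
    { 'authorId' => 12, 'id' => 2, 'likes' => 469 },
    { 'authorId' => 5, 'id' => 8, 'likes' => 735 },
    { 'authorId' => 8, 'id' => 10, 'likes' => 853 }
  ],
  'second' => [
    { 'authorId' => 9, 'id' => 1, 'likes' => 960 },
    { 'authorId' => 12, 'id' => 2, 'likes' => 469 },
    { 'authorId' => 8, 'id' => 4, 'likes' => 728 }
  ]
}

# One flat array holding every hash from every value of h.
posts = h.values.flatten

# Group by id, then keep the first hash seen for each id.
by_group = posts.group_by { |post| post['id'] }.map { |_id, group| group.first }

# uniq with a block keeps the first element for each distinct block value.
by_uniq = posts.uniq { |post| post['id'] }

puts by_group == by_uniq
#=> true
p by_uniq.map { |post| post['id'] }
#=> [2, 8, 10, 1, 4]
```

Both versions keep the first hash they encounter for a given id, so the order of the result follows the order of the flattened input.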

Answer 2

TL;DR

The simplest solution to the problem that fits the data you posted is h.values.flatten.uniq. You can stop reading here unless you want to understand why you don't need to care about duplicate IDs with this particular data set, or when you might need to care and why that's often less straightforward than it seems.

Near the end I also mention some features of Rails that address edge cases that you don't need for this specific data. However, they might help with other use cases.

Skip ID-Specific Deduplication; Focus on Removing Duplicate Hashes Instead

First of all, you have no duplicate id keys that aren't also part of duplicate Hash objects. Despite the fact that Ruby implementations preserve entry order of Hash objects, a Hash is conceptually unordered. Pragmatically, that means two Hash objects with the same keys and values (even if they are in a different insertion order) are still considered equal. So, perhaps unintuitively:

{'authorId' => 12, 'id' => 2, 'likes' => 469} ==
  {'id' => 2, 'likes' => 469, 'authorId' => 12}
#=> true
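This matters for deduplication because Array#uniq without a block compares elements via hash and eql?, and both ignore insertion order for Hash objects. A small sketch (the variable names `a` and `b` are just for illustration):

```ruby
a = { 'authorId' => 12, 'id' => 2, 'likes' => 469 }
b = { 'id' => 2, 'likes' => 469, 'authorId' => 12 }

# Insertion order is preserved, so the key lists differ...
puts a.keys == b.keys
#=> false

# ...but equality and hash codes depend only on the contents.
puts a.eql?(b)
#=> true
puts a.hash == b.hash
#=> true

# So uniq treats the two as the same element and keeps only the first.
p [a, b].uniq.size
#=> 1
```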

Given your example input, you don't actually have to worry about unique IDs for this exercise. You just need to eliminate duplicate Hash objects from your merged Array, and you have only one of those.

duplicate_ids =
  h.values.flatten.group_by { _1['id'] }
    .reject { _2.one? }.keys
#=> [2]

unique_hashes_with_duplicate_ids =
  h.values.flatten.group_by { _1['id'] }
    .reject { _2.uniq.one? }.count
#=> 0

As you can see, 'id' => 2 is the only ID found in both Hash values, albeit in identical Hash objects. Since you have only one duplicate Hash, the problem has been reduced to flattening the Array of Hash values stored in h so that you can remove any duplicate Hash elements (not duplicate IDs) from the combined Array.

Solution to the Posted Problem

There might be use cases where you need to handle the uniqueness of Hash keys, but this is not one of them. Unless you want to sort your result by some key, all you really need is:
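If you would rather not rely on eyeballing the data, the two checks above can be folded into a small guard. The sketch below is only one way to do it: `merge_unique` and its `key:` option are hypothetical names of my own, not anything provided by Ruby or Rails.

```ruby
# Hypothetical helper; the name and the `key:` option are illustrative only.
def merge_unique(hash_of_arrays, key: 'id')
  # Flatten all arrays into one and drop hashes that are equal as a whole.
  merged = hash_of_arrays.values.flatten.uniq

  # Any key value still shared by more than one hash is a genuine conflict.
  conflicts = merged.group_by { |item| item[key] }
                    .select { |_value, items| items.size > 1 }
                    .keys
  raise ArgumentError, "conflicting records for #{key}: #{conflicts.inspect}" unless conflicts.empty?

  merged
end

# Sample data copied from the question.
h = {
  'first' => [
    { 'authorId' => 12, 'id' => 2, 'likes' => 469 },
    { 'authorId' => 5, 'id' => 8, 'likes' => 735 },
    { 'authorId' => 8, 'id' => 10, 'likes' => 853 }
  ],
  'second' => [
    { 'authorId' => 9, 'id' => 1, 'likes' => 960 },
    { 'authorId' => 12, 'id' => 2, 'likes' => 469 },
    { 'authorId' => 8, 'id' => 4, 'likes' => 728 }
  ]
}

result = merge_unique(h)
p result.map { |post| post['id'] }
#=> [2, 8, 10, 1, 4]
```

With the posted data the guard never fires. It only raises when two different hashes share an id, which is exactly the situation where you would have to decide which record to keep.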

h.values.flatten.uniq

Since you aren't being asked to sort the Hash objects in your consolidated Array, you can avoid the need for another method call that (in this case, anyway) is a no-op.

"Uniqueness" Can Be Tricky Absent Additional Context

The only reason to look at your id keys at all would be if you had duplicate IDs in multiple unique Hash objects, and if that were the case you'd then have to worry about which Hash was the correct one to keep. For example, given:

[ {'id' => 1, 'authorId' => 9, 'likes' => 1_920},
  {'id' => 1, 'authorId' => 9, 'likes' => 960} ]

Which one of these records is the "duplicate" one? Without other data such as a timestamp, simply chaining uniq { _1['id'] } or merging the Hash objects will net you either the first or the last record, respectively. Consider:

[
  {'id' => 1, 'authorId' => 9, 'likes' => 1_920},
  {'id' => 1, 'authorId' => 9, 'likes' => 960}
].uniq { _1['id'] }
#=> [{"id"=>1, "authorId"=>9, "likes"=>1920}]

[
  {'id' => 1, 'authorId' => 9, 'likes' => 1_920},
  {'id' => 1, 'authorId' => 9, 'likes' => 960}
].reduce({}, :merge)
#=> {"id"=>1, "authorId"=>9, "likes"=>960}

Leveraging Context Like Rails-Specific Timestamp Features

While the uniqueness problem described above may seem out of scope for the question you're currently being asked, understanding the limitations of any kind of data transformation is useful. In addition, knowing that Ruby on Rails supports ActiveRecord::Timestamp and the creation and management of timestamp-related columns within database migrations may be highly relevant in a broader sense.

You don't need to know these things to answer the original question. However, knowing when a given solution fits a specific use case and when it doesn't is important too.
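As a rough illustration of how a timestamp settles the "which record wins" question, here is a plain-Ruby sketch that needs no Rails. It assumes each record carries an 'updated_at' key, named after the column that Rails timestamps maintain; that key and the dates below are invented for the example and are not part of the data in the question.

```ruby
# Hypothetical records: the 'updated_at' key and its values are made up.
records = [
  { 'id' => 1, 'authorId' => 9, 'likes' => 1_920, 'updated_at' => Time.utc(2022, 5, 27) },
  { 'id' => 1, 'authorId' => 9, 'likes' => 960,   'updated_at' => Time.utc(2022, 5, 28) }
]

# For each id, keep the version with the most recent timestamp.
latest = records
  .group_by { |record| record['id'] }
  .map { |_id, versions| versions.max_by { |record| record['updated_at'] } }

p latest.map { |record| record['likes'] }
#=> [960]
```

Here the newer record wins regardless of where it sits in the array, whereas uniq { _1['id'] } would have kept whichever one came first.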
