How to select unique records by column with ActiveRecord and PostgreSQL

Question

Given the following records (the first row is the column names):

name              platform           other_columns     date
Eric              Ruby               something         somedate
Eric              Objective-C        something         somedate
Joe               Ruby               something         somedate

How do I retrieve records with all columns such that the name column is always unique in the result set? In this example I would like the query to return the first Eric (with Ruby) record.

The closest I've gotten is "select distinct on (name) * ...", but that requires me to order by name first, when I actually want to order the records by the date column.

Order records by date
If there are multiple records with the same name, select one (it does not matter which)
Select all columns

How do I achieve this in Rails on PostgreSQL?
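For context on the DISTINCT ON attempt: PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions, but that ordering only decides which row survives in each name group. The survivors can then be re-sorted by date in an outer query, e.g. select * from (select distinct on (name) * from your_table order by name, date) as t order by date (your_table is a placeholder name). The Ruby sketch below only mimics those semantics in memory so they are easy to check; the Row struct and sample data are hypothetical and no database is involved.

```ruby
require "date"

# Hypothetical in-memory stand-in for the table rows; field names follow the question.
Row = Struct.new(:id, :name, :platform, :date)

rows = [
  Row.new(1, "Eric", "Ruby",        Date.new(2011, 7, 1)),
  Row.new(2, "Eric", "Objective-C", Date.new(2011, 7, 3)),
  Row.new(3, "Joe",  "Ruby",        Date.new(2011, 7, 2))
]

# Inner query: DISTINCT ON (name) ... ORDER BY name, date.
# Sorting by [name, date] and keeping the first row per name mirrors DISTINCT ON.
one_per_name = rows.sort_by { |r| [r.name, r.date] }.uniq(&:name)

# Outer query: ORDER BY date, i.e. re-sort the surviving rows by date.
result = one_per_name.sort_by(&:date)

result.each { |r| puts "#{r.name} #{r.platform} #{r.date}" }
```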

Answer 1

You can't do a simple .group(:name), because that produces a GROUP BY name in your SQL while you're selecting ungrouped and unaggregated columns. That leaves it ambiguous which row to pick, and PostgreSQL (rightly, IMHO) complains:

When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.

If you start adding more columns to your grouping with something like this:

T.group(T.columns.collect(&:name))

then you'll be grouping by things you don't want to group by, and you'll end up pulling out the whole table, which is not what you want. If you try aggregating to avoid the grouping problem, you'll end up mixing different rows (i.e. one column will come from one row while another column comes from some other row), and that's not what you want either.
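Both failure modes are easy to see on a few in-memory rows. In this sketch (a hypothetical Row struct stands in for the table, and the dates are made up), grouping by every column leaves one group per row, and taking the minimum of each column separately produces a platform/date combination that no actual row contains.

```ruby
require "date"

# Hypothetical rows standing in for the table.
Row = Struct.new(:id, :name, :platform, :thedate)

rows = [
  Row.new(1, "Eric", "Ruby",        Date.new(2011, 7, 1)),
  Row.new(2, "Eric", "Objective-C", Date.new(2011, 7, 3)),
  Row.new(3, "Joe",  "Ruby",        Date.new(2011, 7, 2))
]

# Grouping by every column (like grouping by all of T.columns): each row is its own group.
by_all_columns = rows.group_by(&:to_a)
puts by_all_columns.size

# Aggregating each column independently, like min(platform) and min(thedate) per name.
eric = rows.select { |r| r.name == "Eric" }
mixed = [eric.map(&:platform).min, eric.map(&:thedate).min]

# "Objective-C" comes from row 2 but the earliest date comes from row 1,
# so no real row has this combination.
row_exists = eric.any? { |r| [r.platform, r.thedate] == mixed }
puts row_exists
```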

ActiveRecord really isn't built for this sort of thing, but you can bend it to your will with some effort.

You're using AR, so you presumably have an id column. If you have PostgreSQL 8.4 or higher, you could use window functions as a sort of localized GROUP BY. You'll need to window twice: once to figure out the name/thedate pairs, and again to pick just one id (in case several rows with the same name have a thedate matching the earliest thedate), and hence get a unique row:

select your_table.*
from your_table
where id in (
    -- You don't need DISTINCT here as the IN will take care of collapsing duplicates.
    select min(yt.id) over (partition by yt.name)
    from (
        select distinct name, min(thedate) over (partition by name) as thedate
        from your_table
    ) as dt
    join your_table as yt
      on yt.name = dt.name and yt.thedate = dt.thedate
)

Then wrap that in a find_by_sql call and you have your objects.
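To make the two windows concrete, here is a pure-Ruby sketch that computes the same selection over in-memory rows. It illustrates the query's logic under assumed data (a hypothetical Row struct with the id, name and thedate columns the SQL uses); it does not replace running the query. The extra Eric row ties on the earliest date to show why the second window on id is needed.

```ruby
require "date"

# Hypothetical in-memory rows; id, name and thedate mirror the columns in the SQL.
Row = Struct.new(:id, :name, :platform, :thedate)

rows = [
  Row.new(1, "Eric", "Ruby",        Date.new(2011, 7, 1)),
  Row.new(2, "Eric", "Objective-C", Date.new(2011, 7, 3)),
  Row.new(4, "Eric", "Python",      Date.new(2011, 7, 1)), # ties with id 1 on the earliest date
  Row.new(3, "Joe",  "Ruby",        Date.new(2011, 7, 2))
]

# Inner window: earliest thedate per name.
earliest = rows.group_by(&:name).transform_values { |rs| rs.map(&:thedate).min }

# Join: rows whose thedate equals their name's earliest thedate (ties included).
candidates = rows.select { |r| r.thedate == earliest[r.name] }

# Outer window: the smallest id per name breaks those ties.
keep_ids = candidates.group_by(&:name).map { |_name, rs| rs.map(&:id).min }

# Final "where id in (...)".
result = rows.select { |r| keep_ids.include?(r.id) }
result.each { |r| puts "#{r.id} #{r.name} #{r.platform}" }
```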

If you're using Heroku with a shared database (or some other environment without 8.4 or higher), you're stuck with PostgreSQL 8.3 and won't have window functions. In that case, you'd probably want to filter out the duplicates in Ruby-land:

with_dups = YourTable.find_by_sql(%Q{
    select yt.*
    from your_table yt
    join (select name, min(thedate) as thedate from your_table group by name) as dt
      on yt.name = dt.name and yt.thedate = dt.thedate
});

# Clear out the duplicates, sorting by id ensures consistent results
unique_matches = with_dups.sort_by(&:id).group_by(&:name).map { |x| x.last.first }
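The group_by/map step can likely be shortened: Array#uniq with a block keeps the first element for each distinct block value, so after sort_by(&:id) it picks the same lowest-id record per name. A minimal sketch, with a hypothetical Struct standing in for the objects find_by_sql returns:

```ruby
require "date"

# Hypothetical stand-ins for the AR objects returned by find_by_sql.
Row = Struct.new(:id, :name, :thedate)

with_dups = [
  Row.new(4, "Eric", Date.new(2011, 7, 1)),
  Row.new(1, "Eric", Date.new(2011, 7, 1)),
  Row.new(3, "Joe",  Date.new(2011, 7, 2))
]

# Approach from the answer: group by name, take the first (lowest id) record per group.
via_group_by = with_dups.sort_by(&:id).group_by(&:name).map { |_name, recs| recs.first }

# Array#uniq with a block keeps the first element for each distinct name.
via_uniq = with_dups.sort_by(&:id).uniq(&:name)

puts via_group_by.map(&:id).inspect
puts via_uniq.map(&:id).inspect
```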

If you're pretty sure there won't be duplicate name/min(thedate) pairs, then the 8.3-compatible solution might be your best bet. But if there will be a lot of duplicates, you want the database to do as much of the work as possible, to avoid creating thousands of AR objects that you're just going to throw away.

Maybe someone else with stronger PostgreSQL-Fu than me will come along and offer something nicer.

Answer 2

If you don't care which row is retrieved when a name appears more than once (for all columns), and the table has that structure, you can simply do a query like

SELECT * FROM table_name GROUP BY `name` ORDER BY `date`

or, in Rails:

TableClass.group(:name).order(:date)

Answer 3

I know this question is 8 years old. The current Ruby version is 2.5.3, and 2.6.1 has been released. The stable Rails version is 5.2.2, and 6.0.0 beta2 has been released.

Let's call your model Person.

Person.all.order(:date).group_by(&:name).map{|p| p.last.last}

Person.all.order(:date).group_by(&:name).collect {|key, value| value.last}

Explanation: first get all records in the Person table, then sort them by date (ascending with order(:date); use order(date: :desc) for descending), then group them by name (records with the same name end up in the same group).

Person.all.order(:date).group_by(&:name)

This returns a hash:

{"Eric" => [#<Person id: 1, name: "Eric", other_fields: "">, #<Person id: 2, name: "Eric", other_fields: "">], "Joe" => [#<Person id: 3, name: "Joe", other_fields: "">]}

Solution 1: the .map method.

Person.all.order(:date).group_by(&:name).map{|p| p.last.last}

We get a hash, and map iterates over it as [key, value] pairs, so for each pair p, p.last is that name's array of records. Collected over the whole hash, those arrays are:

[[#<Person id: 1, name: "Eric", other_fields: "">, #<Person id: 2, name: "Eric", other_fields: "">],[#<Person id: 3, name: "Joe", other_fields: "">]]

Then take the first or last record of each nested array with p.last.first or p.last.last.

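Solution 2: the .collect method (an alias of .map), with the block taking key and value separately. (.each would return the hash itself rather than the selected records.)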

Person.all.order(:date).group_by(&:name).collect {|key, value| value.last}
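Outside a Rails app, the same chain can be tried on plain objects. This sketch assumes a hypothetical Person struct and replaces Person.all.order(:date) with sort_by(&:date). It also shows value.first, which returns the earliest-dated record per name (the first Eric, with Ruby, that the question asks for) instead of the latest.

```ruby
require "date"

# Hypothetical stand-in for Person records; sort_by replaces Person.all.order(:date).
Person = Struct.new(:id, :name, :platform, :date)

people = [
  Person.new(1, "Eric", "Ruby",        Date.new(2011, 7, 1)),
  Person.new(2, "Eric", "Objective-C", Date.new(2011, 7, 3)),
  Person.new(3, "Joe",  "Ruby",        Date.new(2011, 7, 2))
]

# Maps "Eric" to the records with ids 1 and 2 (in date order) and "Joe" to id 3.
grouped = people.sort_by(&:date).group_by(&:name)

latest   = grouped.map { |pair| pair.last.last }             # Solution 1: last record per name
latest2  = grouped.collect { |_name, value| value.last }     # Solution 2: same result
earliest = grouped.map { |_name, value| value.first }        # earliest-dated record per name

puts latest.map(&:id).inspect
puts earliest.map(&:id).inspect
```

Since this loads every row into Ruby before discarding most of them, it suits small tables; Answer 1's point about building throwaway AR objects applies here as well.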

Answer 4

Get a list of names and minimum dates, and join that back to the original table to get the rowset you're looking for.

select
    b.*
from
    (select name, min(date) as mindate from your_table group by name) a
    inner join your_table b
        on  a.name = b.name and a.mindate = b.date
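One caveat: this join returns a single row per name only if each name's minimum date is unique. If two rows with the same name share it, both come back, which is the case Answer 1 handles with a second window on id. A small in-memory sketch of the join, with hypothetical rows:

```ruby
require "date"

# Hypothetical rows; the two Eric rows share the earliest date.
Row = Struct.new(:id, :name, :date)

rows = [
  Row.new(1, "Eric", Date.new(2011, 7, 1)),
  Row.new(2, "Eric", Date.new(2011, 7, 1)),
  Row.new(3, "Joe",  Date.new(2011, 7, 2))
]

# Subquery a: name and min(date) per name.
mindates = rows.group_by(&:name).transform_values { |rs| rs.map(&:date).min }

# Join b on name and date = mindate.
joined = rows.select { |b| b.date == mindates[b.name] }

puts joined.map(&:name).inspect
```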

Answers (votes, posted):

Answer 1: 7, 2011-07-27 19:07:16
Answer 2: 2, 2011-07-27 13:12:51
Answer 3: 0, 2019-02-26 04:01:23
Answer 4: 0, 2011-07-27 13:07:33
