How do I select unique records by column with ActiveRecord and Postgresql

Given the following records (the first row being the column names):

name              platform           other_columns     date
Eric              Ruby               something         somedate
Eric              Objective-C        something         somedate
Joe               Ruby               something         somedate

How do I retrieve a single record per name, with all columns, such that the name column is always unique in the result set? I would like the query in this example to return the first Eric (w/ Ruby) record.

I think the closest I've gotten is to use "select distinct on (name) *...", but that requires me to order by name first, when I actually want to order the records by the date column.

  • Order records by date
  • If there are multiple records with the same name, select one (it does not matter which)
  • Select all columns

How do I achieve this in Rails on PostgreSQL?

You can't do a simple .group(:name) because that produces a GROUP BY name in your SQL while you're still selecting ungrouped, unaggregated columns. That leaves it ambiguous which row to pick, and PostgreSQL (rightly, IMHO) complains:

When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.

If you start adding more columns to your grouping with something like this:

T.group(T.columns.collect(&:name))

then you'll be grouping by things you don't want to group by, and you'll end up pulling out the whole table, which is not what you want. If you try aggregating to avoid the grouping problem, you'll end up mixing different rows (i.e. one column will come from one row while another column will come from some other row), and that's not what you want either.
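To see the row-mixing problem concretely, here is a small plain-Ruby sketch with no database involved. The Row struct, the aggregate_per_column helper, and the sample values are all invented for illustration; the helper imitates what per-column aggregates such as min(platform) and min(thedate) would do inside each name group.

```ruby
require "date"

# Hypothetical stand-in for a table row; names and values are invented.
Row = Struct.new(:name, :platform, :thedate)

# Aggregate every column independently within each name group, the way
# "select name, min(platform), min(thedate) ... group by name" would.
def aggregate_per_column(rows)
  rows.group_by(&:name).map do |name, group|
    Row.new(name, group.map(&:platform).min, group.map(&:thedate).min)
  end
end

rows = [
  Row.new("Eric", "Ruby",        Date.new(2011, 1, 1)),
  Row.new("Eric", "Objective-C", Date.new(2011, 3, 1)),
  Row.new("Joe",  "Ruby",        Date.new(2011, 2, 1)),
]

# Results that match no real row: for Eric, the platform comes from one row
# and the date from another.
phantoms = aggregate_per_column(rows).reject { |r| rows.include?(r) }
phantoms.each { |r| puts r.to_a.inspect }
```

The single "phantom" result pairs Eric's Objective-C platform with the date of his Ruby row, a combination that exists nowhere in the data.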

ActiveRecord really isn't built for this sort of thing but you can bend it to your will with some effort.

You're using AR so you presumably have an id column. If you have PostgreSQL 8.4 or higher, then you could use window functions as a sort of localized GROUP BY. You'll need to window twice: once to figure out the name / thedate pairs, and again to pick just one id (in case multiple rows with the same name share the earliest thedate) and hence get a unique row:

select your_table.*
from your_table
where id in (
    -- You don't need DISTINCT here as the IN will take care of collapsing duplicates.
    select min(yt.id) over (partition by yt.name)
    from (
        select distinct name, min(thedate) over (partition by name) as thedate
        from your_table
    ) as dt
    join your_table as yt
      on yt.name = dt.name and yt.thedate = dt.thedate
)

Then wrap that in a find_by_sql and you have your objects.
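A minimal sketch of that wrapping step might look like the following. The constant name EARLIEST_PER_NAME_SQL and the helper earliest_per_name are hypothetical, and the model argument is assumed to be an ActiveRecord model class mapped to your_table; find_by_sql itself is the standard ActiveRecord class method.

```ruby
# The window-function query from above, kept in a heredoc.
EARLIEST_PER_NAME_SQL = <<~SQL
  select your_table.*
  from your_table
  where id in (
      select min(yt.id) over (partition by yt.name)
      from (
          select distinct name, min(thedate) over (partition by name) as thedate
          from your_table
      ) as dt
      join your_table as yt
        on yt.name = dt.name and yt.thedate = dt.thedate
  )
SQL

# Hypothetical helper: `model` is assumed to be an ActiveRecord model class
# backed by your_table (for example YourTable).
def earliest_per_name(model)
  model.find_by_sql(EARLIEST_PER_NAME_SQL)
end
```

Under those assumptions, earliest_per_name(YourTable) would return an array of YourTable instances, one per name.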

If you're using Heroku with a shared database (or some other environment without 8.4 or higher), then you're stuck with PostgreSQL 8.3 and you won't have window functions. In that case, you'd probably want to filter out the duplicates in Ruby-land:

with_dups = YourTable.find_by_sql(%Q{
    select yt.*
    from your_table yt
    join (select name, min(thedate) as thedate from your_table group by name) as dt
      on yt.name = dt.name and yt.thedate = dt.thedate
});

# Clear out the duplicates, sorting by id ensures consistent results
unique_matches = with_dups.sort_by(&:id).group_by(&:name).map { |x| x.last.first }
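To make the de-duplication step concrete, here is a plain-Ruby sketch that runs the same sort_by / group_by / map chain on simple structs instead of AR objects. The Record struct, the unique_by_name helper, and the sample data are invented for illustration.

```ruby
# Hypothetical stand-in for the AR objects returned by find_by_sql.
Record = Struct.new(:id, :name, :platform)

# Same chain as above: sort by id so the choice is deterministic, group by
# name, then keep the first record of each group.
def unique_by_name(records)
  records.sort_by(&:id).group_by(&:name).map { |_name, group| group.first }
end

with_dups = [
  Record.new(3, "Joe",  "Ruby"),
  Record.new(2, "Eric", "Objective-C"),
  Record.new(1, "Eric", "Ruby"),
]

# Prints "1 Eric Ruby" and then "3 Joe Ruby".
unique_by_name(with_dups).each { |r| puts "#{r.id} #{r.name} #{r.platform}" }
```

Destructuring the block arguments as |_name, group| does the same job as x.last.first in the original one-liner, but makes it clearer that each element is a [name, records] pair.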

If you're pretty sure that there won't be duplicate name / min(thedate) pairs then the 8.3-compatible solution might be your best bet; but, if there will be a lot of duplicates, then you want the database to do as much work as possible to avoid creating thousands of AR objects that you're just going to throw away.

Maybe someone else with stronger PostgreSQL-Fu than me will come along and offer something nicer.

If you don't care which row is retrieved when several rows share a name (this applies to all columns) and the table has that structure, you can simply do a query like the one below. Note that the backtick quoting and the ungrouped SELECT * are MySQL-style; PostgreSQL raises the GROUP BY error quoted earlier.

SELECT * FROM table_name GROUP BY `name` ORDER BY `date`

or in Rails

TableClass.group(:name).order(:date)

I know this question is 8 years old. The current Ruby version is 2.5.3 (2.6.1 has been released). The stable Rails version is 5.2.2 (6.0.0 beta2 has been released).

Let's call your model Person.

Person.all.order(:date).group_by(&:name).map{|p| p.last.last}

Person.all.order(:date).group_by(&:name).collect {|key, value| value.last}

Explanation: First get all records in the person table, then sort them by date (ascending or descending), then group them by name (records with the same name end up in one group).

Person.all.order(:date).group_by(&:name)

This returns a hash.

{"Eric" => [#<Person id: 1, name: "Eric", other_fields: "">, #<Person id: 2, name: "Eric", other_fields: "">], "Joe" => [#<Person id: 3, name: "Joe", other_fields: "">]}

Solution 1: .map method.

Person.all.order(:date).group_by(&:name).map{|p| p.last.last}

We got a hash, and we loop over it as an array of [name, records] pairs. Collected over the whole loop, p.last gives

[[#<Person id: 1, name: "Eric", other_fields: "">, #<Person id: 2, name: "Eric", other_fields: "">],[#<Person id: 3, name: "Joe", other_fields: "">]]

Get the first or last record of each nested array using p.last.first or p.last.last.

Solution 2: .collect or .each method.

Person.all.order(:date).group_by(&:name).collect {|key, value| value.last}
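Whether to take the first or the last element of each group depends on the sort direction. The plain-Ruby sketch below mirrors the chain above using structs; the Person struct here is only a stand-in for the model, grouped_by_name is a hypothetical helper, and the dates are invented. With an ascending sort, first picks the earliest row per name (what the question asks for) and last picks the latest.

```ruby
require "date"

# Hypothetical stand-in for Person records; the dates are invented.
Person = Struct.new(:id, :name, :date)

# Plain-Ruby equivalent of Person.all.order(:date).group_by(&:name):
# sort ascending by date, then group by name.
def grouped_by_name(people)
  people.sort_by(&:date).group_by(&:name)
end

people = [
  Person.new(1, "Eric", Date.new(2011, 1, 1)),
  Person.new(2, "Eric", Date.new(2011, 3, 1)),
  Person.new(3, "Joe",  Date.new(2011, 2, 1)),
]

groups   = grouped_by_name(people)
earliest = groups.map { |_name, records| records.first } # oldest row per name
latest   = groups.map { |_name, records| records.last }  # newest row per name

puts earliest.map(&:id).inspect
puts latest.map(&:id).inspect
```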

Get a list of names and minimum dates, and join that back to the original table to get the rowset you're looking for.

select
    b.*
from
    (select name, min(date) as mindate from table group by name) a
    inner join table b
        on  a.name = b.name and a.mindate = b.date
