简体   繁体   中英

How do I select unique records by column with ActiveRecord and Postgresql

Given the following records (the first row being the column names):

name              platform           other_columns     date
Eric              Ruby               something         somedate
Eric              Objective-C        something         somedate
Joe               Ruby               something         somedate

How do I retrieve a singular record with all columns , such that the name column is always unique in the results set? I would like the query in this example to return the first Eric (w/ Ruby) record.

I think the closest I've gotten is to use "select distinct on (name) *...", but that requires me to order by name first, when I actually want to order the records by the date column.

  • Order records by date
  • If there are multiple records with the same name, select one (which does not matter)
  • Select all columns

How do I achieve this in Rails on PostgreSQL?

You can't do a simple .group(:name) because that produces a GROUP BY name in your SQL when you'll be selecting ungrouped and unaggregated columns, that leaves ambiguity as to which row to pick and PostgreSQL (rightly IMHO) complains :

When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.

If you start adding more columns to your grouping with something like this:

T.group(T.columns.collect(&:name))

then you'll be grouping by things you don't want to and you'll end up pulling out the whole table and that's not what you want. If you try aggregating to avoid the grouping problem, you'll end up mixing different rows (ie one column will come from one row while another column will come from some other row) and that's not what you want either.

ActiveRecord really isn't built for this sort of thing but you can bend it to your will with some effort.

You're using AR so you presumably have an id column. If you have PostgreSQL 8.4 or higher, then you could use window functions as a sort of localized GROUP BY; you'll need to window twice: once to figure out the name / thedate pairs and again to pick just one id (just in case you have multiple rows with the same name and thedate which match the earliest thedate ) and hence get a unique row:

select your_table.*
from your_table
where id in (
    -- You don't need DISTINCT here as the IN will take care of collapsing duplicates.
    select min(yt.id) over (partition by yt.name)
    from (
        select distinct name, min(thedate) over (partition by name) as thedate
        from your_table
    ) as dt
    join your_table as yt
      on yt.name = dt.name and yt.thedate = dt.thedate
)

Then wrap that in a find_by_sql and you have your objects.

If you're using Heroku with a shared database (or some other environment without 8.4 or higher), then you're stuck with PostgreSQL 8.3 and you won't have window functions. In that case, you'd probably want to filter out the duplicates in Ruby-land:

with_dups = YourTable.find_by_sql(%Q{
    select yt.*
    from your_table yt
    join (select name, min(thedate) as thedate from your_table group by name) as dt
      on yt.name = dt.name and yt.thedate = dt.thedate
});

# Clear out the duplicates, sorting by id ensures consistent results
unique_matches = with_dups.sort_by(&:id).group_by(&:name).map { |x| x.last.first }

If you're pretty sure that there won't be duplicate name / min(thedate) pairs then the 8.3-compatible solution might be your best bet; but, if there will be a lot of duplicates, then you want the database to do as much work as possible to avoid creating thousands of AR objects that you're just going to throw away.

Maybe someone else with stronger PostgreSQL-Fu than me will come along and offer something nicer.

I you don't care for which row is retrieved when multiple names are there (this will be true for all columns) and the table has that structure you can simply do a query like

SELECT * FROM table_name GROUP BY `name` ORDER BY `date`

or in Rails

TableClass.group(:name).order(:date)

I know this question is 8 years old. Current ruby version is 2.5.3 . 2.6.1 is released. Rails stable version is 5.2.2 . 6.0.0 beta2 is released.

Lets name your table Person .

Person.all.order(:date).group_by(&:name).map{|p| p.last.last}

Person.all.order(:date).group_by(&:name).collect {|key, value| value.last}

Explanation : First get all records in person table. Then sorted by date (descending or ascending) and then group by name (record with duplicate name will be grouped).

Person.all.order(:date).group_by(&:name)

This returns hash.

{"Eric" => [#<Person id: 1, name: "Eric", other_fields: "">, #<Person id: 2, name: "Eric", other_fields: "">], "Joe" => [#<Person id: 3, name: "Joe", other_fields: "">]}

Solution 1: .map method.

Person.all.order(:date).group_by(&:name).map{|p| p.last.last}

We got hash. We loop that as array. p.last will give

[[#<Person id: 1, name: "Eric", other_fields: "">, #<Person id: 2, name: "Eric", other_fields: "">],[#<Person id: 3, name: "Joe", other_fields: "">]]

Get first or last record of nested array using p.last.first or p.last.last .

Solution 2: .collect or .each method.

Person.all.order(:date).group_by(&:name).collect {|key, value| value.last}

Get a list of names and minimum dates, and join that back to the original table to get the rowset you're looking for.

select
    b.*
from
    (select name, min(date) as mindate from table group by name) a
    inner join table b
        on  a.name = b.name and a.mindate = b.date

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM