
How to query the results of a query in Rails (query the results of a 'DISTINCT ON' with Rails & Postgres)

Short version: I'd like to query the result of another query, in order to select a more limited result set. However, adding a where clause rewrites the first query rather than working on its results, so I don't get the answers I need.

The detail: I have two models, checks and ticks. Checks has_many ticks.

The first query uses DISTINCT ON and gathers all of the 'checks' and all of the related ticks but only returns the most recent tick. I have that working as a scope in the model.

In my controller,

  def checklist
    # Filter the results by scope, or return all checks with their latest tick
    case params[:filter]
    when "duebylastresult"
      @checks = Check.mostrecenttickonly.duebylastresult
    when "duebydate"
      @checks = Check.mostrecenttickonly.duebydate
    else
      @checks = Check.mostrecenttickonly
    end
  end

In the model, the first scope (working):

  scope :mostrecenttickonly, -> {
    includes(:ticks)
      .order("checks.id, ticks.created_at DESC")
      .select("DISTINCT ON (checks.id) *")
      .references(:ticks)
  }

Generates the following SQL:

  Parameters: {"filter"=>""}
  SQL (1.0ms)  SELECT DISTINCT ON (checks.id) *, 
"checks"."id" AS t0_r0, 
"checks"."area" AS t0_r1, "checks"."frequency" AS t0_r2, 
"checks"."showinadvance" AS t0_r3, "checks"."category" AS t0_r4, 
"checks"."title" AS t0_r5, "checks"."description" AS t0_r6, 
"checks"."created_at" AS t0_r7, "checks"."updated_at" AS t0_r8, 
"ticks"."id" AS t1_r0, "ticks"."result" AS t1_r1, 
"ticks"."comments" AS t1_r2, "ticks"."created_at" AS t1_r3, 
"ticks"."updated_at" AS t1_r4, "ticks"."check_id" AS t1_r5 
FROM "checks" LEFT OUTER JOIN "ticks" 
ON "ticks"."check_id" = "checks"."id"  
ORDER BY checks.id, ticks.created_at DESC

Having got that result, I want to show only the ticks that have a result equal to or greater than 3, so the scope:

   scope :duebylastresult, -> { where("ticks.result >= 3") }

Generates the SQL

  Parameters: {"filter"=>"duebylastresult"}
  SQL (1.0ms)  SELECT DISTINCT ON (checks.id) *, 
"checks"."id" AS t0_r0, 
"checks"."area" AS t0_r1, "checks"."frequency" AS t0_r2,
"checks"."showinadvance" AS t0_r3, "checks"."category" AS t0_r4, 
"checks"."title" AS t0_r5, "checks"."description" AS t0_r6, 
"checks"."created_at" AS t0_r7, "checks"."updated_at" AS t0_r8, 
"ticks"."id" AS t1_r0, "ticks"."result" AS t1_r1, 
"ticks"."comments" AS t1_r2, "ticks"."created_at" AS t1_r3, 
"ticks"."updated_at" AS t1_r4, "ticks"."check_id" AS t1_r5 
FROM "checks" LEFT OUTER JOIN "ticks" 
ON "ticks"."check_id" = "checks"."id" 
WHERE (ticks.result >= 3)  
ORDER BY checks.id, ticks.created_at DESC

As best I can tell, the WHERE clause is applied before DISTINCT ON, so I now get 'the latest tick whose result is >= 3', whereas I'm looking for 'the latest tick, and THEN only if its result is >= 3'.

Hope that makes sense & Thanks in advance!

Edit - Example of what I get and what I need:

The Data:
Table Checks:
ID: 98 Title: Eire
ID: 99 Title: Land

Table Ticks:
ID: 1 CheckID: 98 Result:1 Date: Jan12
ID: 2 CheckID: 98 Result:5 Date: Feb12
ID: 3 CheckID: 98 Result:1 Date: Mar12
ID: 4 CheckID: 99 Result:4 Date: Apr12

First query returns the most recent result, like;
Check.ID: 98  Tick.ID: 3  Tick.Result: 1 Tick.Date: Mar12
Check.ID: 99  Tick.ID: 4  Tick.Result: 4 Tick.Date: Apr12

Second query currently returns the most recent result where the result is >= 3, like:
Check.ID: 98  Tick.ID: 2  Tick.Result: 5 Tick.Date: Feb12
Check.ID: 99  Tick.ID: 4  Tick.Result: 4 Tick.Date: Apr12

When I really want:
Check.ID: 99  Tick.ID: 4  Tick.Result: 4 Tick.Date: Apr12

(ID 98 doesn't show as the last Tick.Result is 1).
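
To make the two orderings concrete, here is a minimal plain-Ruby sketch that replays the example data without a database. The `Tick` struct, the `latest_per_check` helper, and the ISO-style date strings standing in for Jan12 to Apr12 are assumptions made for illustration only; they are not part of the real models.

```ruby
# Example rows from the question as plain structs (no database involved).
# The date strings are stand-ins that only preserve the original ordering.
Tick = Struct.new(:id, :check_id, :result, :date)

TICKS = [
  Tick.new(1, 98, 1, "2012-01"),
  Tick.new(2, 98, 5, "2012-02"),
  Tick.new(3, 98, 1, "2012-03"),
  Tick.new(4, 99, 4, "2012-04")
].freeze

# Emulates DISTINCT ON (check_id) ... ORDER BY check_id, date DESC:
# keep only the newest tick for each check.
def latest_per_check(ticks)
  ticks.group_by(&:check_id).map { |_check_id, group| group.max_by(&:date) }
end

# What the chained scopes do: the WHERE filter runs first, then DISTINCT ON.
filter_then_latest = latest_per_check(TICKS.select { |t| t.result >= 3 })

# What is wanted: DISTINCT ON first, then the filter on its result.
latest_then_filter = latest_per_check(TICKS).select { |t| t.result >= 3 }

puts filter_then_latest.map(&:id).inspect # prints [2, 4]
puts latest_then_filter.map(&:id).inspect # prints [4]
```

The first line of output matches what the chained scopes currently return (ticks 2 and 4); the second is the desired result (tick 4 only).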

Could you try the following to see if it starts you in the right direction:

    scope :just_a_test, -> {
      includes(:ticks)
        .references(:ticks) # forces the JOIN so the string conditions can reference ticks
        .order("checks.id")
        .where("ticks.created_at = (SELECT MAX(ticks.created_at) FROM ticks WHERE ticks.check_id = checks.id)")
        .where("ticks.result >= 3")
    }

I'm not sure I really understand the point of the :mostrecenttickonly scope since you're just loading the checks.

That being said, if you want to get only those checks whose most recent tick has a result of 3 or more, I think the best way to do that would be a window function:

check.rb

...
  scope :duebylastresult, -> {
    # ticks has no "date" column in the SQL shown above, so rank by created_at
    find_by_sql(
      'SELECT *
       FROM (SELECT checks.*,
                    ticks.id AS tick_id,
                    ticks.created_at AS tick_date,
                    ticks.result AS tick_result,
                    dense_rank() OVER (
                      PARTITION BY checks.id
                      ORDER BY ticks.created_at DESC
                    ) AS tick_rank
             FROM checks
             LEFT OUTER JOIN ticks ON checks.id = ticks.check_id) AS ranked_ticks
       WHERE tick_rank = 1 AND tick_result >= 3;'
    )
  }
...

Basically, we're just joining everything in the checks and ticks tables, then adding another attribute called tick_rank that is ranking each row in the result set according to its date versus the other rows with the same checks.id value.

SQL evaluates the predicates in a WHERE clause before it computes the SELECT list, which is where window functions run, so we can't write tick_rank = 1 in the WHERE clause of the same query that defines tick_rank.

So we have to go the extra step of wrapping the results (which we alias as ranked_ticks) and then just select everything and apply the predicates we want to this outer select statement. The tick_rank has to be 1, meaning it's the most recent tick, and the result has to be >= 3.
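
One detail worth checking is how dense_rank() treats ties: ticks that share the newest timestamp all get rank 1, so a check can come back more than once. The plain-Ruby sketch below emulates the ranking and the outer filter to show this. The `Row` struct, the helper names, and the sample rows (including the deliberate tie on check 98) are hypothetical and not taken from the question's data.

```ruby
# Hypothetical rows: check 98 has two ticks sharing the newest timestamp.
Row = Struct.new(:tick_id, :check_id, :result, :created_at)

ROWS = [
  Row.new(1, 98, 1, "2012-01-01"),
  Row.new(2, 98, 5, "2012-03-01"),
  Row.new(3, 98, 4, "2012-03-01"),
  Row.new(4, 99, 4, "2012-04-01")
].freeze

# Emulates dense_rank() OVER (PARTITION BY check_id ORDER BY created_at DESC):
# rows with equal timestamps share a rank, and ranks have no gaps.
def dense_rank_by_check(rows)
  rows.group_by(&:check_id).flat_map do |_check_id, group|
    newest_first = group.map(&:created_at).uniq.sort.reverse
    group.map { |row| [row, newest_first.index(row.created_at) + 1] }
  end
end

# Emulates the outer query: WHERE tick_rank = 1 AND tick_result >= 3.
def due_by_last_result(rows)
  dense_rank_by_check(rows)
    .select { |row, rank| rank == 1 && row.result >= 3 }
    .map { |row, _rank| row.tick_id }
end

puts due_by_last_result(ROWS).inspect # prints [2, 3, 4]
```

If exactly one row per check is required, row_number() with a tiebreaker in the ORDER BY (for example the tick id) is the usual alternative to dense_rank().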


edit: I was using that article I linked as a refresher since I often forget SQL syntax, but after looking at it, I think this would be somewhat more performant (basically just wait to join checks until after the partitioning is done, that way I believe it will do fewer full scans):

  scope :duebylastresult, -> {
    # Rank the ticks on their own first, then join the ranked rows to checks
    find_by_sql(
      'SELECT *
       FROM checks
       LEFT OUTER JOIN
            (SELECT id AS tick_id,
                    check_id AS check_id,
                    created_at AS tick_date,
                    result AS tick_result,
                    dense_rank() OVER (
                      PARTITION BY ticks.check_id
                      ORDER BY ticks.created_at DESC
                    ) AS tick_rank
             FROM ticks) AS ranked_ticks ON checks.id = ranked_ticks.check_id
       WHERE tick_rank = 1 AND tick_result >= 3;'
    )
  }
