Limit query result based on field value

I have a table account with the following structure:

| agg_type  | agg_id  | sequence | payload | is_snapshot | timestamp |
| "account" | "agg_1" | 1        | "..."   | false       | ...       |
| "account" | "agg_1" | 2        | "..."   | true        | ...       |
| "account" | "agg_1" | 3        | "..."   | false       | ...       |
| "account" | "agg_1" | 4        | "..."   | false       | ...       |
| "account" | "agg_1" | 5        | "..."   | false       | ...       |
| "account" | "agg_1" | 6        | "..."   | false       | ...       |
| "account" | "agg_1" | 7        | "..."   | true        | ...       |
| "account" | "agg_1" | 8        | "..."   | false       | ...       |

I need to write a query that retrieves all rows of a specific aggregate from the latest snapshot onward. For instance, for the table above the query would return the last two rows (sequences 7 and 8).

I think that the query would go something like

SELECT * FROM account 
WHERE
  agg_type='account'
  AND agg_id='agg_1'
ORDER BY sequence ASC
LIMIT (???);

It's the (???) part that I'm not quite sure how to implement.

Obs:

  • I'm using Postgres if it is of any help.
  • The (agg_type, agg_id, sequence) combination is a primary key.
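
For anyone who wants to reproduce this, a minimal setup sketch follows. The column types and defaults are assumptions (the table above only shows values), and the payload is kept as the literal placeholder '...':

-- Assumed types; only the primary key is stated in the question.
CREATE TABLE account (
  agg_type    text        NOT NULL,
  agg_id      text        NOT NULL,
  sequence    bigint      NOT NULL,
  payload     text        NOT NULL,
  is_snapshot boolean     NOT NULL DEFAULT false,
  "timestamp" timestamptz NOT NULL DEFAULT now(),  -- assumed default
  PRIMARY KEY (agg_type, agg_id, sequence)
);

-- Sample rows mirroring the table above.
INSERT INTO account (agg_type, agg_id, sequence, payload, is_snapshot) VALUES
  ('account', 'agg_1', 1, '...', false),
  ('account', 'agg_1', 2, '...', true),
  ('account', 'agg_1', 3, '...', false),
  ('account', 'agg_1', 4, '...', false),
  ('account', 'agg_1', 5, '...', false),
  ('account', 'agg_1', 6, '...', false),
  ('account', 'agg_1', 7, '...', true),
  ('account', 'agg_1', 8, '...', false);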

Simplistically, we can just retrieve all rows whose sequence is greater than or equal to the highest sequence number that is a snapshot:

SELECT * FROM account a
WHERE
  a.agg_type='account'
  AND a.agg_id='agg_1' 
  AND a.sequence >= 
    (SELECT MAX(b.sequence)
     FROM account b
     WHERE a.agg_type = b.agg_type
       AND a.agg_id = b.agg_id
       AND b.is_snapshot = true)
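
One edge case worth noting: if the aggregate has no snapshot yet, MAX returns NULL, the >= comparison evaluates to NULL, and the query returns no rows. A possible variant is sketched below, assuming that in that case you want every row of the aggregate and that sequence numbers are never negative:

SELECT * FROM account a
WHERE
  a.agg_type='account'
  AND a.agg_id='agg_1'
  AND a.sequence >= COALESCE(
    (SELECT MAX(b.sequence)
     FROM account b
     WHERE b.agg_type = a.agg_type
       AND b.agg_id = a.agg_id
       AND b.is_snapshot = true),
    0)  -- assumed fallback: no snapshot means "start from the beginning"
ORDER BY a.sequence ASC;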

If you wanted to do this for all aggregates at once, it might be clearer to write it as a join:

SELECT a.*
FROM
  account a
  INNER JOIN
  (
    -- latest snapshot sequence per aggregate
    SELECT
      agg_type,
      agg_id,
      MAX(sequence) AS maxseq
    FROM account b
    WHERE is_snapshot = true
    GROUP BY agg_type, agg_id
  ) maxes
  ON
    a.agg_type = maxes.agg_type AND
    a.agg_id = maxes.agg_id AND
    a.sequence >= maxes.maxseq

That's not to say we couldn't do either task with either form (and internally Postgres will probably execute them the same way anyway), but I've always felt that using a join as a restriction, as in "here are 10000 rows, and I want only the 2000 rows that meet a criterion laid down by these 1000 rows", is most clearly thought of in terms of blocks of data that are joined together.

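-- Keeps the two most recent rows per aggregate. This matches the sample data
-- only because the latest snapshot there is the second-to-last row.
WITH ranked AS (
  SELECT *,
         row_number() OVER (PARTITION BY a.agg_type, a.agg_id
                            ORDER BY a.sequence DESC) AS rnk
  FROM account a
)
SELECT * FROM ranked WHERE rnk <= 2;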

A window function can pull this for all (agg_type, agg_id) combinations with only one sort:

with mark as (
  select *, 
         bool_or(is_snapshot) over w as trail_true
    from account
  window w as (partition by agg_type, agg_id 
                   order by sequence
            rows between 1 following
                     and unbounded following)
)
select *
  from mark
 where not coalesce(trail_true, false)
 order by agg_type, agg_id, sequence
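
To see why the filter works, it can help to look at the intermediate flag on its own. The sketch below runs the same window over an inline VALUES list that mirrors the question's sample rows (the sample CTE is only a stand-in for the real table):

with sample(agg_type, agg_id, sequence, is_snapshot) as (
  values
    ('account', 'agg_1', 1, false),
    ('account', 'agg_1', 2, true),
    ('account', 'agg_1', 3, false),
    ('account', 'agg_1', 4, false),
    ('account', 'agg_1', 5, false),
    ('account', 'agg_1', 6, false),
    ('account', 'agg_1', 7, true),
    ('account', 'agg_1', 8, false)
)
select sequence,
       is_snapshot,
       -- true when any LATER row of the same aggregate is a snapshot
       bool_or(is_snapshot) over (partition by agg_type, agg_id
                                      order by sequence
                                  rows between 1 following
                                           and unbounded following) as trail_true
  from sample
 order by sequence;

For sequences 1 to 6 trail_true is true (a later snapshot exists), for sequence 7 it is false, and for sequence 8 it is NULL because the frame is empty, which is why the outer query needs the coalesce. Only rows 7 and 8 pass the "not coalesce(trail_true, false)" filter. An aggregate with no snapshot at all would have every row pass.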
