
Query to get records with closest timestamp values for unique combination of two columns

+-------+----------------------+----------+------------------+
| isbn  | book_container_id    | shelf_id |   update_time    |
+-------+----------------------+----------+------------------+
|   555 |                    6 | shelf100 | 11/15/2015 19:10 |
|   123 |                    1 | shelf1   | 11/28/2015 8:00  |
|   555 |                    4 | shelf5   | 11/28/2015 9:10  |
|   212 |                    2 | shelf2   | 11/29/2015 8:10  |
|   555 |                    6 | shelf9   | 11/30/2015 22:10 |
|   321 |                    8 | shelf7   | 11/30/2015 8:10  |
|   555 |                    4 | shelf33  | 12/1/2015 7:00   |
+-------+----------------------+----------+------------------+

Let's say I have a PostgreSQL table like the one above, called bookshelf_configuration. Given an ISBN and a timestamp, I want to find the closest record (at or before the timestamp, never after) for each unique combination of isbn and book_container_id.

So if I'm looking at isbn '555', with a timestamp of '12/1/2015 7:00', I should get back:

+-------+----------------------+----------+------------------+
| isbn  | book_container_id    | shelf_id |   update_time    |
+-------+----------------------+----------+------------------+
|   555 |                    6 | shelf9   | 11/30/2015 22:10 |
|   555 |                    4 | shelf33  | 12/1/2015 7:00   |
+-------+----------------------+----------+------------------+

My knowledge of SQL is extremely basic. I've got a query that would work if I only had to factor in isbn, but I need some help understanding how to do this for the combination (isbn, book_container_id).
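To try the queries on this page, the sample table can be recreated with something like the script below. The column types and NOT NULL constraints are assumptions, since the post does not show the table definition; the rows are the ones from the table above, with the timestamps rewritten in ISO 8601 form.

```sql
-- Assumed schema: the original post does not give the column types.
CREATE TABLE bookshelf_configuration (
    isbn              integer   NOT NULL,
    book_container_id integer   NOT NULL,
    shelf_id          text      NOT NULL,
    update_time       timestamp NOT NULL
);

-- Sample rows copied from the question.
INSERT INTO bookshelf_configuration (isbn, book_container_id, shelf_id, update_time)
VALUES
    (555, 6, 'shelf100', '2015-11-15 19:10'),
    (123, 1, 'shelf1',   '2015-11-28 08:00'),
    (555, 4, 'shelf5',   '2015-11-28 09:10'),
    (212, 2, 'shelf2',   '2015-11-29 08:10'),
    (555, 6, 'shelf9',   '2015-11-30 22:10'),
    (321, 8, 'shelf7',   '2015-11-30 08:10'),
    (555, 4, 'shelf33',  '2015-12-01 07:00');
```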

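The row_number() window function can help you here. It numbers the rows within each (isbn, book_container_id) group, newest first, so the row numbered 1 in each group is the most recent one at or before the given timestamp.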

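SELECT *
FROM (
    -- Number the rows of each (isbn, book_container_id) group, newest first.
    SELECT *,
           row_number() OVER (PARTITION BY isbn, book_container_id
                              ORDER BY update_time DESC) AS rn
    FROM   bookshelf_configuration
    WHERE  isbn = 555 AND update_time <= '12/1/2015 7:00'
) q
WHERE q.rn = 1;  -- keep only the newest row per group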
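To see why keeping only the rows numbered 1 works, it can help to run the inner query on its own. The result shown in the comments was worked out by hand from the sample rows in the question rather than copied from an actual run, and the exact timestamp display depends on your datestyle setting.

```sql
SELECT book_container_id, shelf_id, update_time,
       row_number() OVER (PARTITION BY isbn, book_container_id
                          ORDER BY update_time DESC) AS rn
FROM   bookshelf_configuration
WHERE  isbn = 555
AND    update_time <= '2015-12-01 07:00'
ORDER  BY book_container_id, rn;

-- book_container_id | shelf_id | update_time         | rn
--                 4 | shelf33  | 2015-12-01 07:00:00 |  1
--                 4 | shelf5   | 2015-11-28 09:10:00 |  2
--                 6 | shelf9   | 2015-11-30 22:10:00 |  1
--                 6 | shelf100 | 2015-11-15 19:10:00 |  2
```

The two rows numbered 1 are exactly the two rows the question asks for.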

A typical use case for DISTINCT ON:

SELECT DISTINCT ON (book_container_id)
       isbn, book_container_id, shelf_id, update_time 
FROM   bookshelf_configuration
WHERE  isbn = 555
AND    update_time <= '2015-12-01 07:00'  -- ISO 8601 format
ORDER  BY book_container_id, update_time DESC;

This assumes update_time is defined NOT NULL; otherwise you have to add NULLS LAST to the ORDER BY clause.
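As a sketch of where NULLS LAST comes into play: PostgreSQL sorts NULL values first in descending order by default, so a row with a NULL update_time would otherwise be picked as the "latest" one. In the query above, the predicate on update_time already discards NULL rows, so the clause matters mostly for variants without that filter, such as this one, which returns the latest record per container with no upper bound on the time.

```sql
-- Variant without the timestamp filter: NULLS LAST keeps rows with a
-- NULL update_time from being chosen ahead of rows with real timestamps.
SELECT DISTINCT ON (book_container_id)
       isbn, book_container_id, shelf_id, update_time
FROM   bookshelf_configuration
WHERE  isbn = 555
ORDER  BY book_container_id, update_time DESC NULLS LAST;
```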

Depending on cardinalities and value frequencies, there may be even faster query styles.

Either way, a multicolumn index on (isbn, book_container_id, update_time DESC) is the key to making this fast for tables of non-trivial size. The sort order should match the query (or be its complete inversion). If you add NULLS LAST to the query, add it to the index as well.
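Such an index could be created roughly as follows. The index names are placeholders made up for this example; only the column list and sort order come from the answer.

```sql
-- Matches ORDER BY book_container_id, update_time DESC for a given isbn.
CREATE INDEX bookshelf_configuration_isbn_container_time_idx
    ON bookshelf_configuration (isbn, book_container_id, update_time DESC);

-- If the query sorts with NULLS LAST, declare the index the same way instead:
-- CREATE INDEX bookshelf_configuration_isbn_container_time_nl_idx
--     ON bookshelf_configuration (isbn, book_container_id, update_time DESC NULLS LAST);
```

Running EXPLAIN on the query afterwards is a reasonable way to check whether the planner actually uses the index on your data.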

Aside: It's better to use ISO 8601 format for all date / time constants, since that is unambiguous with any locale or datestyle setting.
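A small illustration of the ambiguity, assuming a session where you are free to change the datestyle setting: the same slash-separated literal is read as a different date depending on the setting, while the ISO 8601 form is not affected.

```sql
-- DateStyle controls how ambiguous day/month input is read.
SET datestyle = 'ISO, MDY';
SELECT '12/1/2015 7:00'::timestamp;   -- read as 1 December 2015

SET datestyle = 'ISO, DMY';
SELECT '12/1/2015 7:00'::timestamp;   -- read as 12 January 2015

-- ISO 8601 input is interpreted the same way under either setting.
SELECT timestamp '2015-12-01 07:00';
```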
