PostgreSQL - Get count of latest and specific records

Question

I have a table:

    select sid, type, status, timestamp from contact_history limit 10;
           sid   | type | status |           timestamp
        ---------+------+--------+-------------------------------
         6291179 |    0 |   1025 | 2015-08-24 13:05:22.501025+02
         68737   |    0 |      5 | 2015-08-24 13:05:32.500005+02
         4987391 |    0 |     65 | 2015-08-24 13:05:35.500065+02
         1189551 |    1 |     65 | 2015-08-24 13:06:05.510065+02
         3374714 |    1 |      5 | 2015-07-27 13:25:25.510005+02
         2297221 |    0 |      5 | 2015-07-27 13:25:48.500005+02
         5503230 |    2 |     65 | 2015-07-27 13:25:50.520065+02
         596992  |    1 |     65 | 2015-07-27 13:26:51.510065+02
         5215455 |    0 |   1025 | 2015-07-27 13:27:21.501025+02
         3011248 |    0 |      5 | 2015-07-27 13:27:46.500005+02
        (10 rows)


    \d contact_history
                                      Table "contact_history"
        Column     |           Type           |                          Modifiers
    ---------------+--------------------------+----------------------------------------------------------
     sid           | character varying(32)    | not null
     type          | integer                  | not null
     status        | integer                  | not null
     timestamp     | timestamp with time zone | not null
     id            | bigint                   | not null default nextval('contact_history_id_seq'::regcla
    Indexes:
        "contact_history_pk" PRIMARY KEY, btree (id)
        "contact_history_sid_timestamp_idx" btree (sid, "timestamp")

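A row is recorded each time a sid reaches a certain type and status at the given timestamp. There are no unique rows. Each sid can get an arbitrary type and status at any time. The table has 20 million rows. The PostgreSQL version is 9.3.13.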

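Now I would like to know how many sids currently have (type='0' or type='1') and status='5' at their max(timestamp). In other words, for each sid find the last timestamp with its corresponding type and status, then count the sids whose latest row satisfies (type='0' or type='1') and status='5'. So I expect a single number as output. Any other, more efficient approach with the same result is welcome. Thanks.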
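To make the requirement concrete, here is a minimal sketch on made-up rows. The sids 'a', 'b', 'c', their timestamps, and the names sample, ts and rn are hypothetical and not taken from the real table. Only the newest row per sid is considered, so only 'b' should be counted:

```sql
-- Hypothetical sample rows; not data from the real contact_history table.
WITH sample(sid, type, status, ts) AS (
    VALUES
        ('a', 0,  5, timestamptz '2015-08-24 13:00:00+02'),
        ('a', 1, 65, timestamptz '2015-08-24 13:05:00+02'),  -- latest for 'a': status 65, not counted
        ('b', 2, 65, timestamptz '2015-08-24 13:00:00+02'),
        ('b', 1,  5, timestamptz '2015-08-24 13:05:00+02'),  -- latest for 'b': type 1, status 5, counted
        ('c', 2,  5, timestamptz '2015-08-24 13:05:00+02')   -- latest for 'c': type 2, not counted
),
ranked AS (
    -- Number the rows of each sid from newest (rn = 1) to oldest.
    SELECT sid, type, status,
           row_number() OVER (PARTITION BY sid ORDER BY ts DESC) AS rn
    FROM sample
)
SELECT count(*)
FROM ranked
WHERE rn = 1                -- keep only the latest row per sid
  AND type IN (0, 1)
  AND status = 5;
-- returns 1
```

The earlier row for 'a' does match the type and status condition, but it must not be counted because it is not the latest row for that sid.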

Answer 1

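Thanks to a_horse_with_no_name, I followed the greatest-n-per-group path. Unfortunately, my case is a little different. After some monkey engineering, I have so far come up with the following queries, each with a different cost: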

EXPLAIN SELECT count(*) FROM contact_history t1 LEFT OUTER JOIN contact_history t2 ON (t1.sid = t2.sid AND t1.timestamp < t2.timestamp) WHERE t2.sid IS NULL and (t1.type=0 OR t1.type=1) and t1.status = '5';
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Aggregate  (cost=158816.96..158816.97 rows=1 width=0)
   ->  Hash Anti Join  (cost=66228.91..158003.37 rows=325435 width=0)
         Hash Cond: ((t1.sid)::text = (t2.sid)::text)
         Join Filter: (t1."timestamp" < t2."timestamp")
         ->  Seq Scan on contact_history t1  (cost=0.00..50771.93 rows=488152 width=15)
               Filter: ((status = 5) AND ((type = 0) OR (type = 1)))
         ->  Hash  (cost=39041.96..39041.96 rows=1563996 width=15)
               ->  Seq Scan on contact_history t2  (cost=0.00..39041.96 rows=1563996 width=15)
(8 rows)

EXPLAIN SELECT count(*) from contact_history as ch, (select sid, max(timestamp) as max_t from contact_history group by sid) as sub where ch.sid=sub.sid and ch.timestamp=sub.max_t and (type='0' or type='1') and status = '5';
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Aggregate  (cost=393277.11..393277.12 rows=1 width=0)
   ->  Merge Join  (cost=366994.07..393277.10 rows=2 width=0)
         Merge Cond: ((contact_history.sid)::text = (ch.sid)::text)
         Join Filter: (ch."timestamp" = (max(contact_history."timestamp")))
         ->  GroupAggregate  (cost=253411.17..267270.04 rows=212890 width=15)
               ->  Sort  (cost=253411.17..257321.16 rows=1563996 width=15)
                     Sort Key: contact_history.sid
                     ->  Seq Scan on contact_history  (cost=0.00..39041.96 rows=1563996 width=15)
         ->  Materialize  (cost=113582.90..116023.66 rows=488152 width=15)
               ->  Sort  (cost=113582.90..114803.28 rows=488152 width=15)
                     Sort Key: ch.sid
                     ->  Seq Scan on contact_history ch  (cost=0.00..50771.93 rows=488152 width=15)
                           Filter: ((status = 5) AND ((type = 0) OR (type = 1)))
(13 rows)

EXPLAIN SELECT count(*) FROM contact_history as ch1 WHERE timestamp = (SELECT MAX(timestamp) FROM contact_history AS ch2 WHERE ch1.sid = ch2.sid) and (ch1.type='0' or ch1.type='1') and ch1.status = '5';
                                                                       QUERY PLAN

-----------------------------------------------------------------------------------------------------
---------------------------------------------------
 Aggregate  (cost=7919844.02..7919844.03 rows=1 width=0)
   ->  Seq Scan on contact_history ch1  (cost=0.00..7919837.92 rows=2441 width=0)
         Filter: ((status = 5) AND ((type = 0) OR (type = 1)) AND ("timestamp" = (SubPlan 2)))
         SubPlan 2
           ->  Result  (cost=5.02..5.03 rows=1 width=0)
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.43..5.02 rows=1 width=8)
                         ->  Index Only Scan Backward using contact_history_sid_timestamp_idx on contact_history ch2  (cost=0.43..32.57 rows=7 width=8)
                               Index Cond: ((sid = (ch1.sid)::text) AND ("timestamp" IS NOT NULL))
(9 rows)

Any improvements, additions, comments or explanations are welcome. Thanks.
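One more variant that may be worth comparing is PostgreSQL's DISTINCT ON, which is also available in 9.3. This is only a sketch: I have not run it against this table, so no cost or timing is claimed, and the subquery alias latest is just an illustrative name.

```sql
SELECT count(*)
FROM (
    -- DISTINCT ON (sid) keeps the first row of each sid in ORDER BY order,
    -- which is the row with the newest "timestamp".
    SELECT DISTINCT ON (sid) type, status
    FROM contact_history
    ORDER BY sid, "timestamp" DESC
) AS latest
WHERE type IN (0, 1)
  AND status = 5;
```

The type and status filter has to stay outside the subquery, otherwise it would select the latest matching row instead of testing the latest row. One difference from the queries above: if a sid has several rows sharing the same maximum timestamp, DISTINCT ON keeps an arbitrary one of them, whereas the join-based queries can count each tied row that matches. Whether that matters depends on whether such ties can occur in the data.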
