PostgreSQL - Getting count of latest and specific records
I have a table:
select sid, type, status, timestamp from contact_history limit 10;
sid | type | status | timestamp
---------+------+--------+-------------------------------
6291179 | 0 | 1025 | 2015-08-24 13:05:22.501025+02
68737 | 0 | 5 | 2015-08-24 13:05:32.500005+02
4987391 | 0 | 65 | 2015-08-24 13:05:35.500065+02
1189551 | 1 | 65 | 2015-08-24 13:06:05.510065+02
3374714 | 1 | 5 | 2015-07-27 13:25:25.510005+02
2297221 | 0 | 5 | 2015-07-27 13:25:48.500005+02
5503230 | 2 | 65 | 2015-07-27 13:25:50.520065+02
596992 | 1 | 65 | 2015-07-27 13:26:51.510065+02
5215455 | 0 | 1025 | 2015-07-27 13:27:21.501025+02
3011248 | 0 | 5 | 2015-07-27 13:27:46.500005+02
(10 rows)
\d contact_history
Table "contact_history"
Column | Type | Modifiers
---------------+--------------------------+----------------------------------------------------------
sid | character varying(32) | not null
type | integer | not null
status | integer | not null
timestamp | timestamp with time zone | not null
id | bigint | not null default nextval('contact_history_id_seq'::regcla
Indexes:
"contact_history_pk" PRIMARY KEY, btree (id)
"contact_history_sid_timestamp_idx" btree (sid, "timestamp")
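Each row records that a sid reached some type and status at the given timestamp. There are no unique rows. Each sid can get an arbitrary type and status at any time. There are twenty million rows. The PostgreSQL version is 9.3.13.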
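Now I want to know how many sids currently have (type='0' or type='1') and status='5' at their max(timestamp). In other words: for each sid, find the last timestamp and the corresponding type and status, then count the rows that satisfy the condition (type='0' or type='1') and status='5'. So I expect a single number as output. Other, more efficient approaches that give the same result are welcome. Thanks.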
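To make the intent concrete, here is a minimal sketch of the logic described above. It is an illustration rather than a tuned solution, and it assumes that ties on timestamp within one sid do not matter: DISTINCT ON keeps exactly one row per sid, so a sid with two rows at the same latest timestamp is counted at most once, and which of the tied rows is kept is not defined.

```sql
-- Sketch only: take the latest row per sid, then count those matching the filter.
-- Assumption: at most one row per sid should be counted, even if timestamps tie.
SELECT count(*)
FROM (
    SELECT DISTINCT ON (sid) sid, type, status
    FROM contact_history
    ORDER BY sid, "timestamp" DESC   -- first row per sid = newest timestamp
) AS latest
WHERE latest.type IN (0, 1)
  AND latest.status = 5;
```

The filter has to sit outside the subquery. If it were applied inside, the query would pick the latest matching row per sid instead of checking whether the latest row matches.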
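Thanks to a_horse_with_no_name, I went down the greatest-n-per-group path. Unfortunately, my case is a little different. I did some monkey-engineering and so far have come up with the following queries, which have different estimated costs: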
EXPLAIN SELECT count(*) FROM contact_history t1 LEFT OUTER JOIN contact_history t2 ON (t1.sid = t2.sid AND t1.timestamp < t2.timestamp) WHERE t2.sid IS NULL and (t1.type=0 OR t1.type=1) and t1.status = '5';
QUERY PLAN
-----------------------------------------------------------------------------------------------
Aggregate (cost=158816.96..158816.97 rows=1 width=0)
-> Hash Anti Join (cost=66228.91..158003.37 rows=325435 width=0)
Hash Cond: ((t1.sid)::text = (t2.sid)::text)
Join Filter: (t1."timestamp" < t2."timestamp")
-> Seq Scan on contact_history t1 (cost=0.00..50771.93 rows=488152 width=15)
Filter: ((status = 5) AND ((type = 0) OR (type = 1)))
-> Hash (cost=39041.96..39041.96 rows=1563996 width=15)
-> Seq Scan on contact_history t2 (cost=0.00..39041.96 rows=1563996 width=15)
(8 rows)
EXPLAIN SELECT count(*) from contact_history as ch, (select sid, max(timestamp) as max_t from contact_history group by sid) as sub where ch.sid=sub.sid and ch.timestamp=sub.max_t and (type='0' or type='1') and status = '5';
QUERY PLAN
----------------------------------------------------------------------------------------------------
Aggregate (cost=393277.11..393277.12 rows=1 width=0)
-> Merge Join (cost=366994.07..393277.10 rows=2 width=0)
Merge Cond: ((contact_history.sid)::text = (ch.sid)::text)
Join Filter: (ch."timestamp" = (max(contact_history."timestamp")))
-> GroupAggregate (cost=253411.17..267270.04 rows=212890 width=15)
-> Sort (cost=253411.17..257321.16 rows=1563996 width=15)
Sort Key: contact_history.sid
-> Seq Scan on contact_history (cost=0.00..39041.96 rows=1563996 width=15)
-> Materialize (cost=113582.90..116023.66 rows=488152 width=15)
-> Sort (cost=113582.90..114803.28 rows=488152 width=15)
Sort Key: ch.sid
-> Seq Scan on contact_history ch (cost=0.00..50771.93 rows=488152 width=15)
Filter: ((status = 5) AND ((type = 0) OR (type = 1)))
(13 rows)
EXPLAIN SELECT count(*) FROM contact_history as ch1 WHERE timestamp = (SELECT MAX(timestamp) FROM contact_history AS ch2 WHERE ch1.sid = ch2.sid) and (ch1.type='0' or ch1.type='1') and ch1.status = '5';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=7919844.02..7919844.03 rows=1 width=0)
-> Seq Scan on contact_history ch1 (cost=0.00..7919837.92 rows=2441 width=0)
Filter: ((status = 5) AND ((type = 0) OR (type = 1)) AND ("timestamp" = (SubPlan 2)))
SubPlan 2
-> Result (cost=5.02..5.03 rows=1 width=0)
InitPlan 1 (returns $1)
-> Limit (cost=0.43..5.02 rows=1 width=8)
-> Index Only Scan Backward using contact_history_sid_timestamp_idx on contact_history ch2 (cost=0.43..32.57 rows=7 width=8)
Index Cond: ((sid = (ch1.sid)::text) AND ("timestamp" IS NOT NULL))
(9 rows)
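The plans above show only the planner's estimated costs. As a sketch of how measured timings could be collected for the same comparison, the first query can be wrapped in EXPLAIN (ANALYZE, BUFFERS). Note that this actually executes the query, so on a table of this size it may run for a while.

```sql
-- Sketch only: same anti-join query as above, but executed and timed.
-- ANALYZE runs the statement for real; BUFFERS adds buffer hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM contact_history t1
LEFT OUTER JOIN contact_history t2
       ON t1.sid = t2.sid
      AND t1."timestamp" < t2."timestamp"
WHERE t2.sid IS NULL
  AND t1.type IN (0, 1)
  AND t1.status = 5;
```

The same wrapper can be applied to the other two queries to compare actual run times rather than estimates.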
Any improvements, additions, comments, or explanations are very welcome. Thanks.