Count rows for each unique combination of columns in SQL
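I want to return a set of unique records from a table based on two columns, along with the most recent posting time and a total count of the number of times that combination of the two columns appeared before (in time) the record in the output.
What I'm trying to get is something along these lines: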
select col1, col2, max_posted, count from T
join (
select col1, col2, max(posted) as posted from T where groupid = 'XXX'
group by col1, col2) h
on ( T.col1 = h.col1 and
T.col2 = h.col2 and
T.max_posted = h.tposted)
where T.groupid = 'XXX'
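The count needs to be the number of times EACH combination of col1 and col2 occurred BEFORE the max_posted of each record in the output. (I hope I explained that correctly :)
Edit: Trying the suggestion below: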
select dx.*,
count(*) over (partition by dx.cicd9, dx.cdesc order by dx.tposted) as cnt
from dx
join (
select cicd9, cdesc, max(tposted) as tposted from dx where groupid = 'XXX'
group by cicd9, cdesc) h
on ( dx.cicd9 = h.cicd9 and
dx.cdesc = h.cdesc and
dx.tposted = h.tposted)
where groupid = 'XXX';
The count always returns '1'. Also, how would you count only the records that occurred BEFORE tposted?
This also failed, but I hope it shows where I am headed:
WITH H AS (
SELECT cicd9, cdesc, max(tposted) as tposted from dx where groupid = 'XXX'
group by cicd9, cdesc),
J AS (
SELECT count(*) as cnt
FROM dx, h
WHERE dx.cicd9 = h.cicd9
and dx.cdesc = h.cdesc
and dx.tposted <= h.tposted
and dx.groupid = 'XXX'
)
SELECT H.*,J.cnt
FROM H,J
Can anyone help?
How about this:
SELECT DISTINCT ON (cicd9, cdesc) cicd9, cdesc,
max(posted) OVER w AS last_post,
count(*) OVER w AS num_posts
FROM dx
WHERE groupid = 'XXX'
WINDOW w AS (
PARTITION BY cicd9, cdesc
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
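Given the lack of a PG version, table definition, data, and desired output, this is just shooting from the hip, but the principle should work: partition on the two columns for the rows where groupid = 'XXX', then find the maximum value of the posted column and the total number of rows in the window frame (hence the RANGE ... clause in the window definition).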
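To make the principle concrete, here is a minimal sketch against made-up data. The table name dx_window_demo, its column types, and the sample rows are assumptions for illustration only; the real dx definition was never posted.

```sql
-- Hypothetical stand-in for dx; the column types are guesses.
CREATE TEMP TABLE dx_window_demo (
    groupid text,
    cicd9   text,
    cdesc   text,
    posted  timestamp
);

INSERT INTO dx_window_demo VALUES
    ('XXX', '250.00', 'Diabetes',     '2013-01-01 10:00'),
    ('XXX', '250.00', 'Diabetes',     '2013-02-01 10:00'),
    ('XXX', '250.00', 'Diabetes',     '2013-03-01 10:00'),
    ('XXX', '401.9',  'Hypertension', '2013-01-15 09:00'),
    ('YYY', '250.00', 'Diabetes',     '2013-04-01 10:00');  -- other group, filtered out

-- The frame spans the whole partition, so every row of a partition carries
-- the same max and count; DISTINCT ON then keeps one row per combination.
SELECT DISTINCT ON (cicd9, cdesc)
       cicd9, cdesc,
       max(posted) OVER w AS last_post,
       count(*)    OVER w AS num_posts
FROM   dx_window_demo
WHERE  groupid = 'XXX'
WINDOW w AS (
    PARTITION BY cicd9, cdesc
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
```

On this sample the query should return one row per combination: Diabetes with last_post 2013-03-01 10:00 and num_posts 3, and Hypertension with last_post 2013-01-15 09:00 and num_posts 1. The row from group 'YYY' never reaches the window because WHERE is applied first.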
Do you just want a cumulative count?
select t.*,
count(*) over (partition by col1, col2 order by posted) as cnt
from table t
where groupid = 'xxx';
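A likely reason the edited attempt in the question always reports 1: window functions are evaluated after joins and WHERE filters, so once the join keeps only the row whose tposted equals the maximum, each partition usually holds a single row. The sketch below computes the running count first and picks the latest row afterwards. The table dx_running_demo, its column types, and the rows are hypothetical, and the "earlier" column is only meaningful if timestamps are unique within a combination.

```sql
-- Hypothetical stand-in for dx; the column types are guesses.
CREATE TEMP TABLE dx_running_demo (
    groupid text,
    cicd9   text,
    cdesc   text,
    tposted timestamp
);

INSERT INTO dx_running_demo VALUES
    ('XXX', '250.00', 'Diabetes',     '2013-01-01 10:00'),
    ('XXX', '250.00', 'Diabetes',     '2013-02-01 10:00'),
    ('XXX', '250.00', 'Diabetes',     '2013-03-01 10:00'),
    ('XXX', '401.9',  'Hypertension', '2013-01-15 09:00');

-- Inner query: running count over ALL rows of the group.
-- Outer query: keep only the latest row per (cicd9, cdesc).
SELECT DISTINCT ON (cicd9, cdesc)
       cicd9, cdesc,
       tposted AS last_posted,
       cnt,                       -- rows up to and including the latest one
       cnt - 1 AS earlier_rows    -- rows other than the latest one
FROM  (
    SELECT cicd9, cdesc, tposted,
           count(*) OVER (PARTITION BY cicd9, cdesc ORDER BY tposted) AS cnt
    FROM   dx_running_demo
    WHERE  groupid = 'XXX'
) s
ORDER  BY cicd9, cdesc, tposted DESC;
```

With these rows, Diabetes should come back with cnt 3 and earlier_rows 2, and Hypertension with cnt 1 and earlier_rows 0. Because the default frame with ORDER BY includes peer rows, rows sharing the latest tposted would all be counted in cnt.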
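This is the best I could come up with; better suggestions are welcome!
This produces the results I need, with the understanding that the count will always be at least 1 (from the join):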
SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*)
from dx
join (
SELECT cicd9, cdesc, max(tposted) as tposted from dx where groupid = 'XXX'
group by cicd9, cdesc) h
on
(dx.cicd9 = h.cicd9 and dx.cdesc = h.cdesc and dx.tposted <= h.tposted
and dx.groupid = 'XXX')
group by dx.cicd9, dx.cdesc
order by dx.cdesc;
or
WITH H AS (
SELECT cicd9, cdesc, max(tposted) as tposted from dx where groupid = 'XXX'
group by cicd9, cdesc)
SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*)
from dx, H
where dx.cicd9 = h.cicd9 and dx.cdesc = h.cdesc and dx.tposted <= h.tposted
and dx.groupid = 'XXX'
group by dx.cicd9, dx.cdesc
order by cdesc;
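This is confusing:
The count needs to be the number of times EACH combination of col1 and col2 occurred BEFORE the max_posted of each record in the output.
Since, by definition, every record is "before" (or at the same time as) the latest post, this effectively means the total count per combination (ignoring the assumed off-by-one error in the sentence).
So this boils down to a simple GROUP BY: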
SELECT cicd9, cdesc
, max(posted) AS last_posted
, count(*) AS ct
FROM dx
WHERE groupid = 'XXX'
GROUP BY 1, 2
ORDER BY 1, 2;
This does exactly the same as the currently accepted answer, just faster and simpler.
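One way to sanity-check that equivalence on your own data is to compare the two result sets directly. The sketch below does this on invented rows; dx_group_demo, its column types, and the data are assumptions, and the timestamp column is named tposted as in the question's dx queries. An empty result on one sample does not prove equivalence in general, it only shows the two forms agree here.

```sql
-- Hypothetical stand-in for dx; the column types are guesses.
CREATE TEMP TABLE dx_group_demo (
    groupid text,
    cicd9   text,
    cdesc   text,
    tposted timestamp
);

INSERT INTO dx_group_demo VALUES
    ('XXX', '250.00', 'Diabetes',     '2013-01-01 10:00'),
    ('XXX', '250.00', 'Diabetes',     '2013-02-01 10:00'),
    ('XXX', '250.00', 'Diabetes',     '2013-03-01 10:00'),
    ('XXX', '401.9',  'Hypertension', '2013-01-15 09:00'),
    ('YYY', '250.00', 'Diabetes',     '2013-04-01 10:00');

WITH by_group AS (            -- the plain GROUP BY form
    SELECT cicd9, cdesc, max(tposted) AS last_posted, count(*) AS ct
    FROM   dx_group_demo
    WHERE  groupid = 'XXX'
    GROUP  BY cicd9, cdesc
), by_join AS (               -- the self-join form
    SELECT d.cicd9, d.cdesc, max(d.tposted) AS last_posted, count(*) AS ct
    FROM   dx_group_demo d
    JOIN  (SELECT cicd9, cdesc, max(tposted) AS tposted
           FROM   dx_group_demo
           WHERE  groupid = 'XXX'
           GROUP  BY cicd9, cdesc) h
      ON   d.cicd9 = h.cicd9
     AND   d.cdesc = h.cdesc
     AND   d.tposted <= h.tposted
     AND   d.groupid = 'XXX'
    GROUP  BY d.cicd9, d.cdesc
)
-- Rows present in one result but not the other, in either direction.
(SELECT * FROM by_group EXCEPT SELECT * FROM by_join)
UNION ALL
(SELECT * FROM by_join EXCEPT SELECT * FROM by_group);
```

For this sample both forms should yield Diabetes with a count of 3 and Hypertension with a count of 1, so the comparison should return no rows.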