Count rows for each unique combination of columns in SQL

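I want to return unique records from a table based on two columns, along with the most recent posted time and the total number of times that combination of the two columns appeared (in time) before the record in the output.

So what I'm trying to get is something along these lines: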

select T.col1, T.col2, h.max_posted, count  -- "count" is the value I'm after, not yet computed
from T
join (
  select col1, col2, max(posted) as max_posted from T where groupid = 'XXX'
  group by col1, col2) h
on ( T.col1 = h.col1 and
  T.col2 = h.col2 and
  T.posted = h.max_posted)
where T.groupid = 'XXX'

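The count needs to be the number of times each combination of col1 and col2 occurred before the max_posted of each record in the output. (I hope I explained that correctly :)

Edit: Trying the suggestion below: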

 select dx.*,
   count(*) over (partition by dx.cicd9, dx.cdesc order by dx.tposted) as   cnt
from dx
join (
select cicd9, cdesc, max(tposted) as tposted  from dx where groupid = 'XXX'
group by cicd9, cdesc) h
on ( dx.cicd9 = h.cicd9 and
  dx.cdesc = h.cdesc and
  dx.tposted = h.tposted)
where groupid =  'XXX';

The count always returns 1. Also, how would you count only the records that occurred before tposted?

This fails too, but I hope it shows what I'm aiming for:

  WITH H AS (
    SELECT cicd9, cdesc, max(tposted) as tposted  from dx where groupid =  'XXX' 
    group by cicd9, cdesc), 
    J AS (
    SELECT  count(*) as cnt
    FROM dx, h
    WHERE dx.cicd9 = h.cicd9
      and dx.cdesc = h.cdesc
      and dx.tposted <= h.tposted
      and dx.groupid = 'XXX'
 )
SELECT H.*,J.cnt
FROM H,J 

Can anyone help?

How about this:

SELECT DISTINCT ON (cicd9, cdesc) cicd9, cdesc,
  max(tposted) OVER w AS last_post,
  count(*) OVER w AS num_posts
FROM dx
WHERE groupid = 'XXX'
WINDOW w AS (
  PARTITION BY cicd9, cdesc
  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);

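Given the lack of a PG version, table definition, data and desired output, this is just shooting from the hip, but the principle should work: partition on the two columns where groupid = 'XXX', then find the maximum value of the posted timestamp column and the total number of rows in the window frame (hence the RANGE... clause in the window definition).

Do you just want a cumulative count?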
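To make the role of the frame clause concrete, here is a minimal sketch on made-up data. The temp table `dx_demo` and its rows are assumptions for illustration; only the column names come from the question. In PostgreSQL a window without ORDER BY already covers the whole partition, so the explicit frame only changes the result once an ORDER BY is added, which is what the two counts below compare.

```sql
-- Hypothetical sample data; only the column names are taken from the question.
DROP TABLE IF EXISTS dx_demo;
CREATE TEMP TABLE dx_demo (groupid text, cicd9 text, cdesc text, tposted timestamp);
INSERT INTO dx_demo VALUES
  ('XXX', '250.00', 'Diabetes',     '2015-01-01'),
  ('XXX', '250.00', 'Diabetes',     '2015-02-01'),
  ('XXX', '250.00', 'Diabetes',     '2015-03-01'),
  ('XXX', '401.9',  'Hypertension', '2015-01-15'),
  ('YYY', '250.00', 'Diabetes',     '2015-04-01');

SELECT cicd9, cdesc, tposted,
       -- default frame with ORDER BY: partition start up to the current row (and its peers)
       count(*) OVER (PARTITION BY cicd9, cdesc ORDER BY tposted) AS running_cnt,
       -- explicit frame: every row of the partition
       count(*) OVER (PARTITION BY cicd9, cdesc ORDER BY tposted
                      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS total_cnt
FROM   dx_demo
WHERE  groupid = 'XXX'
ORDER  BY cicd9, cdesc, tposted;
-- '250.00' rows: running_cnt 1, 2, 3 and total_cnt 3, 3, 3
-- '401.9' row:   running_cnt 1 and total_cnt 1
```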

select t.*,
       count(*) over (partition by col1, col2 order by posted) as cnt
from table t
where groupid = 'xxx';
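If only the latest row per combination is wanted, one possible approach is to compute the running count first and filter afterwards. The count in the question's edit comes out as 1 because the join reduces each partition to a single row before the window function is evaluated. A rough sketch, assuming the question's column names and a made-up temp table `dx_demo`:

```sql
-- Hypothetical sample data; only the column names are taken from the question.
DROP TABLE IF EXISTS dx_demo;
CREATE TEMP TABLE dx_demo (groupid text, cicd9 text, cdesc text, tposted timestamp);
INSERT INTO dx_demo VALUES
  ('XXX', '250.00', 'Diabetes',     '2015-01-01'),
  ('XXX', '250.00', 'Diabetes',     '2015-02-01'),
  ('XXX', '250.00', 'Diabetes',     '2015-03-01'),
  ('XXX', '401.9',  'Hypertension', '2015-01-15'),
  ('YYY', '250.00', 'Diabetes',     '2015-04-01');

SELECT cicd9, cdesc, tposted AS last_posted, cnt
FROM  (
   SELECT cicd9, cdesc, tposted,
          -- running count over all rows of the combination, computed before filtering
          count(*)     OVER (PARTITION BY cicd9, cdesc ORDER BY tposted)      AS cnt,
          -- rn = 1 marks the most recent row of each combination
          row_number() OVER (PARTITION BY cicd9, cdesc ORDER BY tposted DESC) AS rn
   FROM   dx_demo
   WHERE  groupid = 'XXX'
   ) s
WHERE  rn = 1
ORDER  BY cicd9, cdesc;
-- one row per combination: cnt = 3 for '250.00', cnt = 1 for '401.9'
```

To exclude the latest record itself from the count, selecting cnt - 1 instead of cnt would be one option. That does not exclude other rows sharing the same tposted.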

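This is the best I could come up with. Better suggestions are welcome!

This produces the result I need, with the understanding that the count will always be at least 1 (from the join):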

  SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*)
from dx 
join (
SELECT cicd9, cdesc, max(tposted) as tposted  from dx where groupid   =  'XXX' 
    group by cicd9, cdesc) h
on 
  (dx.cicd9 = h.cicd9 and dx.cdesc = h.cdesc and dx.tposted <= h.tposted 
  and dx.groupid = 'XXX')
group by dx.cicd9, dx.cdesc
order by dx.cdesc;

or

 WITH H AS (
    SELECT cicd9, cdesc, max(tposted) as tposted  from dx where groupid =  'XXX' 
    group by cicd9, cdesc)  
SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*)
from dx, H
where dx.cicd9 = h.cicd9 and dx.cdesc = h.cdesc and dx.tposted <= h.tposted 
  and dx.groupid = 'XXX'
group by dx.cicd9, dx.cdesc
order by cdesc;

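This is confusing:

The count needs to be the number of times each combination of col1 and col2 occurred before the max_posted of each record in the output.

By definition, every record is "before" (or at the same time as) the latest post, so this effectively means the total count per combination (ignoring the assumed off-by-one error in that sentence).

So this boils down to a simple GROUP BY: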

SELECT cicd9, cdesc
     , max(tposted) AS last_posted
     , count(*)    AS ct
FROM   dx
WHERE  groupid = 'XXX'
GROUP  BY 1, 2
ORDER  BY 1, 2;

Does exactly the same as the currently accepted answer, only faster and simpler.
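One way to check that equivalence on a small data set is to run the join-based form and the plain GROUP BY side by side and take their difference. This is only a sketch: the temp table `dx_demo` and its rows are made up for illustration, and the column names are assumed from the question.

```sql
-- Hypothetical sample data; only the column names are taken from the question.
DROP TABLE IF EXISTS dx_demo;
CREATE TEMP TABLE dx_demo (groupid text, cicd9 text, cdesc text, tposted timestamp);
INSERT INTO dx_demo VALUES
  ('XXX', '250.00', 'Diabetes',     '2015-01-01'),
  ('XXX', '250.00', 'Diabetes',     '2015-02-01'),
  ('XXX', '250.00', 'Diabetes',     '2015-03-01'),
  ('XXX', '401.9',  'Hypertension', '2015-01-15'),
  ('YYY', '250.00', 'Diabetes',     '2015-04-01');

-- Join-based form: rows posted at or before the latest post of their combination
SELECT d.cicd9, d.cdesc, max(d.tposted) AS last_posted, count(*) AS ct
FROM   dx_demo d
JOIN  (
   SELECT cicd9, cdesc, max(tposted) AS tposted
   FROM   dx_demo
   WHERE  groupid = 'XXX'
   GROUP  BY cicd9, cdesc
   ) h ON d.cicd9 = h.cicd9 AND d.cdesc = h.cdesc AND d.tposted <= h.tposted
WHERE  d.groupid = 'XXX'
GROUP  BY d.cicd9, d.cdesc
EXCEPT
-- Plain aggregate form
SELECT cicd9, cdesc, max(tposted), count(*)
FROM   dx_demo
WHERE  groupid = 'XXX'
GROUP  BY cicd9, cdesc;
-- returns no rows for this sample: both forms produce the same result set
```

Note that the two forms can diverge if cicd9, cdesc or tposted contain NULLs, since the join conditions would then drop rows that GROUP BY keeps.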
