Count rows for each unique combination of columns in SQL

I would like to return a set of unique records from a table based on two columns, along with the most recent posting time and a count of how many times that combination of the two columns appeared before (in time) the record in the output.

So what I'm trying to get is something along these lines:

select T.col1, T.col2, h.max_posted, count  -- count: the value I am trying to compute
from T
join (
  select col1, col2, max(posted) as max_posted
  from T
  where groupid = 'XXX'
  group by col1, col2) h
on (T.col1 = h.col1 and
    T.col2 = h.col2 and
    T.posted = h.max_posted)
where T.groupid = 'XXX'

Count needs to be the number of times EACH combination of col1 and col2 occurred BEFORE the max_posted of each record in the output. (I hope I explained that correctly.)

Edit: I tried the suggestion below as:

select dx.*,
       count(*) over (partition by dx.cicd9, dx.cdesc order by dx.tposted) as cnt
from dx
join (
  select cicd9, cdesc, max(tposted) as tposted
  from dx
  where groupid = 'XXX'
  group by cicd9, cdesc) h
on (dx.cicd9 = h.cicd9 and
    dx.cdesc = h.cdesc and
    dx.tposted = h.tposted)
where dx.groupid = 'XXX';

The count always returns 1. Additionally, how would you count only the records that occurred before tposted?

This also fails, but I hope you can see where I'm headed:

  WITH H AS (
    SELECT cicd9, cdesc, max(tposted) as tposted  from dx where groupid =  'XXX' 
    group by cicd9, cdesc), 
    J AS (
    SELECT  count(*) as cnt
    FROM dx, h
    WHERE dx.cicd9 = h.cicd9
      and dx.cdesc = h.cdesc
      and dx.tposted <= h.tposted
      and dx.groupid = 'XXX'
 )
SELECT H.*,J.cnt
FROM H,J 

Can anyone help?

How about this:

SELECT DISTINCT ON (cicd9, cdesc) cicd9, cdesc,
  max(posted) OVER w AS last_post,
  count(*) OVER w AS num_posts
FROM dx
WHERE groupid = 'XXX'
WINDOW w AS (
  PARTITION BY cicd9, cdesc
  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);

Given the lack of a PG version, table definition, sample data, and desired output, this is just shooting from the hip, but the principle should work: partition on the two columns where groupid = 'XXX', then find the maximum value of the posted column and the total number of rows in the window frame (hence the RANGE ... clause in the window definition).
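To make the role of the window frame concrete, here is a minimal sketch that runs against a handful of made-up rows supplied inline through a CTE (the codes, descriptions, and timestamps are hypothetical; only the column names come from the query above). It compares a window without ORDER BY, whose default frame is the whole partition, with one that has ORDER BY, whose default frame stops at the current row and its peers.

```sql
-- Hypothetical sample rows; the inline CTE stands in for the real dx table.
WITH dx (groupid, cicd9, cdesc, posted) AS (
    VALUES
        ('XXX', '250.00', 'Diabetes',     timestamp '2015-01-01 10:00'),
        ('XXX', '250.00', 'Diabetes',     timestamp '2015-02-01 10:00'),
        ('XXX', '250.00', 'Diabetes',     timestamp '2015-03-01 10:00'),
        ('XXX', '401.9',  'Hypertension', timestamp '2015-01-15 09:00')
)
SELECT cicd9, cdesc, posted,
       -- No ORDER BY: the default frame covers the whole partition.
       count(*) OVER (PARTITION BY cicd9, cdesc)                 AS whole_partition,
       -- With ORDER BY: the default frame ends at the current row and its peers.
       count(*) OVER (PARTITION BY cicd9, cdesc ORDER BY posted) AS running
FROM   dx
WHERE  groupid = 'XXX'
ORDER  BY cicd9, cdesc, posted;
```

On this sample, whole_partition should be 3 on every Diabetes row, while running climbs 1, 2, 3. Spelling out RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING only makes the whole-partition behaviour explicit.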

Do you just want a cumulative count?

select t.*,
       count(*) over (partition by col1, col2 order by posted) as cnt
from table t
where groupid = 'xxx';
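One likely reason the joined version in the question always reports 1 is that the join keeps only the latest row per combination before count(*) OVER is evaluated, so each partition holds a single row. A possible way around that, sketched below on made-up inline data (the CTE names t and counted, the sample values, and the earlier_rows alias are all assumptions for illustration), is to compute the running count first and filter to the latest row afterwards. It assumes posted is unique within each combination.

```sql
-- Hypothetical sample rows; the inline CTE stands in for the real table.
WITH t (groupid, col1, col2, posted) AS (
    VALUES
        ('xxx', 'a', 'b', timestamp '2015-01-01 10:00'),
        ('xxx', 'a', 'b', timestamp '2015-02-01 10:00'),
        ('xxx', 'a', 'b', timestamp '2015-03-01 10:00'),
        ('xxx', 'c', 'd', timestamp '2015-01-15 09:00')
), counted AS (
    -- Evaluate the window functions while every row is still present.
    SELECT col1, col2, posted,
           count(*)    OVER (PARTITION BY col1, col2 ORDER BY posted) AS cnt,
           max(posted) OVER (PARTITION BY col1, col2)                 AS max_posted
    FROM   t
    WHERE  groupid = 'xxx'
)
SELECT col1, col2, posted, cnt,
       cnt - 1 AS earlier_rows      -- rows strictly before the latest one
FROM   counted
WHERE  posted = max_posted          -- keep only the latest row per combination
ORDER  BY col1, col2;
```

If several rows share the latest timestamp, each of them is returned and counted as a peer, so the unique-timestamp assumption matters here.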

This was the best I could come up with; better suggestions are welcome!

This produces the results I need, with the understanding that the count will always be at least 1 (from the join):

  SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*)
from dx 
join (
SELECT cicd9, cdesc, max(tposted) as tposted  from dx where groupid   =  'XXX' 
    group by cicd9, cdesc) h
on 
  (dx.cicd9 = h.cicd9 and dx.cdesc = h.cdesc and dx.tposted <= h.tposted 
  and dx.groupid = 'XXX')
group by dx.cicd9, dx.cdesc
order by dx.cdesc;

or

 WITH H AS (
    SELECT cicd9, cdesc, max(tposted) as tposted  from dx where groupid =  'XXX' 
    group by cicd9, cdesc)  
SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*)
from dx, H
where dx.cicd9 = h.cicd9 and dx.cdesc = h.cdesc and dx.tposted <= h.tposted 
  and dx.groupid = 'XXX'
group by dx.cicd9, dx.cdesc
order by cdesc;

This was confusing:

Count needs to be the number of times EACH combination of col1 and col2 occurred BEFORE the max_posted of each record in the output.

Since, by definition, every record is "before" (or at the same time as) the latest post, this essentially means the total count per combination (ignoring the assumed off-by-one error in the sentence).

So this boils down to a simple GROUP BY:

SELECT cicd9, cdesc
     , max(posted) AS last_posted
     , count(*)    AS ct
FROM   dx
WHERE  groupid = 'XXX'
GROUP  BY 1, 2
ORDER  BY 1, 2;

This does exactly the same as the currently accepted answer, just a lot faster and more simply.
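If the count really should exclude the latest post itself (the off-by-one reading of the question), subtracting 1 from the aggregate may be enough. The sketch below uses made-up inline rows in place of the real dx table; the sample values and the ct_before_last alias are assumptions, and it presumes that only one row per combination carries the latest timestamp.

```sql
-- Hypothetical sample rows; the inline CTE stands in for the real dx table.
WITH dx (groupid, cicd9, cdesc, posted) AS (
    VALUES
        ('XXX', '250.00', 'Diabetes',     timestamp '2015-01-01 10:00'),
        ('XXX', '250.00', 'Diabetes',     timestamp '2015-02-01 10:00'),
        ('XXX', '250.00', 'Diabetes',     timestamp '2015-03-01 10:00'),
        ('XXX', '401.9',  'Hypertension', timestamp '2015-01-15 09:00'),
        ('YYY', '250.00', 'Diabetes',     timestamp '2015-04-01 10:00')  -- other group, filtered out
)
SELECT cicd9, cdesc,
       max(posted)  AS last_posted,
       count(*)     AS ct,              -- all rows for the combination
       count(*) - 1 AS ct_before_last   -- rows before the latest one
FROM   dx
WHERE  groupid = 'XXX'
GROUP  BY cicd9, cdesc
ORDER  BY cicd9, cdesc;
```

On this sample, the Diabetes combination should report ct = 3 and ct_before_last = 2, and Hypertension ct = 1 and ct_before_last = 0.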
