Window functions to count distinct records

Question

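The query below is based on a complicated view, and the view works as I want it to (I'm not going to include the view because I don't think it will help with the question at hand). What I can't get right is the drugCountsInFamilies column. I need it to show me the number of distinct drugNames for each drug family. You can see from the first screenshot that there are three different H3A rows. The drugCountsInFamilies for H3A should be 3 (there are three different H3A drugs).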


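You can see from the second screenshot that the drugCountsInFamilies value in the first screenshot is actually capturing the number of rows in which each drug name is listed.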

Below is my query, with a comment on the part that is incorrect:

select distinct
     rx.patid
    ,d2.fillDate
    ,d2.scriptEndDate
    ,rx.drugName
    ,rx.drugClass
    --the line directly below is the one that I can't figure out why it's wrong
    ,COUNT(rx.drugClass) over(partition by rx.patid,rx.drugclass,rx.drugname) as drugCountsInFamilies
from 
(
select 
    ROW_NUMBER() over(partition by d.patid order by d.patid,d.uniquedrugsintimeframe desc) as rn
    ,d.patid
    ,d.fillDate
    ,d.scriptEndDate
    ,d.uniqueDrugsInTimeFrame
    from DrugsPerTimeFrame as d
)d2
inner join rx on rx.patid = d2.patid
inner join DrugTable as dt on dt.drugClass=rx.drugClass
where d2.rn=1 and rx.fillDate between d2.fillDate and d2.scriptEndDate
and dt.drugClass in ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
order by rx.patid

If I try to add distinct inside the count(rx.drugClass) clause, SSMS complains. Can this be done with window functions?

Answer 1

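I came across this question while looking for a way to solve my own problem of counting distinct values. While searching for an answer I found this post (see the last comment). I tested it and used the SQL. It worked really well for me, so I thought I would offer another solution here.

In summary, use DENSE_RANK() with PARTITION BY on the grouping columns and ORDER BY on the columns to count, once ASC and once DESC: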

DENSE_RANK() OVER (PARTITION BY drugClass ORDER BY drugName ASC) +
DENSE_RANK() OVER (PARTITION BY drugClass ORDER BY drugName DESC) - 1 AS drugCountsInFamilies

I use this as a template for myself:

DENSE_RANK() OVER (PARTITION BY PartitionByFields ORDER BY OrderByFields ASC ) +
DENSE_RANK() OVER (PARTITION BY PartitionByFields ORDER BY OrderByFields DESC) - 1 AS DistinctCount
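
To see why the template gives a distinct count, here is a minimal sketch against made-up rows. The inline rx data and the drug names are assumptions for illustration only, and the partition uses patid plus drugClass because the question asks for a count per patient and drug family. Within a partition, the ascending rank of a value plus its descending rank always equals the number of distinct values plus one.

```sql
-- Hypothetical sample data; not taken from the original question.
WITH rx (patid, drugClass, drugName) AS (
    SELECT * FROM (VALUES
        (1, 'H3A', 'drugA'),
        (1, 'H3A', 'drugA'),   -- duplicate fill of the same drug
        (1, 'H3A', 'drugB'),
        (1, 'H3A', 'drugC'),
        (1, 'H6H', 'drugX'),
        (2, 'H3A', 'drugA')
    ) AS v (patid, drugClass, drugName)
)
SELECT patid,
       drugClass,
       drugName,
       -- ascending rank + descending rank - 1 = number of distinct drugName values
       DENSE_RANK() OVER (PARTITION BY patid, drugClass ORDER BY drugName ASC)
     + DENSE_RANK() OVER (PARTITION BY patid, drugClass ORDER BY drugName DESC)
     - 1 AS drugCountsInFamilies
FROM rx
ORDER BY patid, drugClass, drugName;
-- Every (1, 'H3A') row gets 3; the (1, 'H6H') and (2, 'H3A') rows get 1.
```

One caveat: DENSE_RANK() treats NULL as a value of its own, while COUNT(DISTINCT ...) ignores NULLs, so the two can differ by one if the counted column is nullable.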

I hope this helps!

Answer 2

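Doing count(distinct) as a window function requires a trick. Actually, several levels of tricks.

Because your request as written is actually very simple (the value is always 1, because rx.drugClass is in the partitioning clause), I will make an assumption. Let's say you want to count the number of distinct drug classes per patid.

If so, do a row_number() partitioned by patid and drugClass. When this is 1, within a patid, a new drugClass is starting. Create a flag that is 1 in this case and 0 in all other cases.

Then you can simply do a sum with a partitioning clause to get the number of distinct values.

The query (after formatting it so I can read it) looks like this: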

select t.patid, t.fillDate, t.scriptEndDate, t.drugName, t.drugClass,
       SUM(t.IsFirstRowInGroup) over (partition by t.patid) as NumDrugCount
from (select distinct rx.patid, d2.fillDate, d2.scriptEndDate, rx.drugName, rx.drugClass,
             (case when 1 = ROW_NUMBER() over (partition by rx.drugClass, rx.patid order by (select NULL))
                   then 1 else 0
              end) as IsFirstRowInGroup
      from (select ROW_NUMBER() over(partition by d.patid order by d.patid,d.uniquedrugsintimeframe desc) as rn, 
                   d.patid, d.fillDate, d.scriptEndDate, d.uniqueDrugsInTimeFrame
            from DrugsPerTimeFrame as d
           ) d2 inner join
           rx
           on rx.patid = d2.patid inner join
           DrugTable dt
           on dt.drugClass = rx.drugClass
      where d2.rn=1 and rx.fillDate between d2.fillDate and d2.scriptEndDate and
            dt.drugClass in ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
     ) t
order by patid
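
The same flag-and-sum idea can be adapted to what the question literally asks for, the number of distinct drugName values per patid and drugClass. The following is only a sketch under assumptions: the inline rx rows and drug names are invented, and the joins to DrugsPerTimeFrame and DrugTable are left out to keep it short.

```sql
-- Hypothetical sample data; not taken from the original question.
WITH rx (patid, drugClass, drugName) AS (
    SELECT * FROM (VALUES
        (1, 'H3A', 'drugA'),
        (1, 'H3A', 'drugA'),
        (1, 'H3A', 'drugB'),
        (1, 'H3A', 'drugC'),
        (1, 'H6H', 'drugX'),
        (2, 'H3A', 'drugA')
    ) AS v (patid, drugClass, drugName)
),
flagged AS (
    SELECT patid, drugClass, drugName,
           -- flag exactly one row per (patid, drugClass, drugName) combination
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY patid, drugClass, drugName
                                        ORDER BY (SELECT NULL)) = 1
                THEN 1 ELSE 0
           END AS IsFirstRowInGroup
    FROM rx
)
SELECT patid, drugClass, drugName,
       -- summing the flags over the wider partition counts the distinct drug names
       SUM(IsFirstRowInGroup) OVER (PARTITION BY patid, drugClass) AS drugCountsInFamilies
FROM flagged
ORDER BY patid, drugClass, drugName;
-- Every (1, 'H3A') row gets 3; the (1, 'H6H') and (2, 'H3A') rows get 1.
```

The flag partition is one column finer than the sum partition, which is the whole trick: each distinct value contributes exactly one 1 to the sum.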

Answer 3

I think what you were trying to do is this, as a window function:

COUNT(DISTINCT rx.drugName) over(partition by rx.patid,rx.drugclass) as drugCountsInFamilies

which SQL Server does not accept. But you can do this instead:

SELECT 
rx.patid
, rx.drugName
, rx.drugClass
, (SELECT COUNT(DISTINCT rx2.drugName) FROM rx rx2 WHERE rx2.drugClass = rx.DrugClass AND rx2.patid = rx.patid) As drugCountsInFamilies
FROM rx
...

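If the table is large, it is best to put an index on one of the columns (for example patid) so that the nested query does not consume a lot of resources.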
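
As a rough illustration of that indexing advice, an index along the following lines might help the correlated subquery. The index name is made up, and the statement assumes rx is a base table with the columns patid, drugClass and drugName used in the query above; whether it actually pays off depends on the data and should be checked against the execution plan.

```sql
-- Hypothetical index name; assumes rx is a base table, not a view.
-- The key columns match the subquery's WHERE clause, and drugName is
-- included so the COUNT(DISTINCT ...) can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_rx_patid_drugClass
    ON rx (patid, drugClass)
    INCLUDE (drugName);
```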

Answer 4

select max(dense_rank() over (order by name desc partition by family)) over (partition by family)

Would this work?
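
As written, this would not run on SQL Server: PARTITION BY has to come before ORDER BY inside OVER, and a window function cannot be nested directly inside another windowed aggregate. The underlying idea (the highest dense rank in a partition equals the number of distinct values) does work if the ranking is moved into a CTE or derived table. A sketch, using an invented drugs table with the family and name columns from the snippet above:

```sql
-- Hypothetical sample data; the table name drugs is an assumption.
WITH drugs (family, name) AS (
    SELECT * FROM (VALUES
        ('H3A', 'drugA'),
        ('H3A', 'drugA'),
        ('H3A', 'drugB'),
        ('H3A', 'drugC'),
        ('H6H', 'drugX')
    ) AS v (family, name)
),
ranked AS (
    SELECT family, name,
           -- step 1: rank the names within each family
           DENSE_RANK() OVER (PARTITION BY family ORDER BY name) AS rnk
    FROM drugs
)
SELECT family, name,
       -- step 2: the largest rank in the family is its distinct name count
       MAX(rnk) OVER (PARTITION BY family) AS distinctNames
FROM ranked
ORDER BY family, name;
-- H3A rows get 3; the H6H row gets 1.
```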

Answer 5

Why wouldn't something like this work?

SELECT 
   IDCol_1
  ,IDCol_2
  ,Count(*) Over(Partition By IDCol_1, IDCol_2 order by IDCol_1) as numDistinct
FROM Table_1
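
One likely reason: COUNT(*) over that partition counts rows, not distinct values, which is the same symptom the question describes. Because IDCol_2 is part of the partition, each partition holds only one IDCol_2 value, so the result is simply how many times that combination repeats. A small sketch with invented rows makes the difference visible next to a DENSE_RANK()-based distinct count:

```sql
-- Hypothetical sample data standing in for Table_1.
WITH Table_1 (IDCol_1, IDCol_2) AS (
    SELECT * FROM (VALUES
        (1, 'a'),
        (1, 'a'),
        (1, 'b')
    ) AS v (IDCol_1, IDCol_2)
)
SELECT IDCol_1,
       IDCol_2,
       -- rows per (IDCol_1, IDCol_2): 2 for 'a', 1 for 'b'
       COUNT(*) OVER (PARTITION BY IDCol_1, IDCol_2 ORDER BY IDCol_1) AS numRows,
       -- distinct IDCol_2 values per IDCol_1: 2 on every row
       DENSE_RANK() OVER (PARTITION BY IDCol_1 ORDER BY IDCol_2 ASC)
     + DENSE_RANK() OVER (PARTITION BY IDCol_1 ORDER BY IDCol_2 DESC)
     - 1 AS numDistinct
FROM Table_1
ORDER BY IDCol_1, IDCol_2;
```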

5 answers

Answer 1: score 18, 2017-06-07 18:32:32
Answer 2: score 17, accepted, 2012-11-20 20:19:23
Answer 3: score 1, 2021-09-07 08:01:05
Answer 4: score 0, 2022-08-11 12:40:32
Answer 5: score -2, 2016-08-18 19:05:14
