
Count occurrences of each combination

I have 5 conditions (A-E) and a bunch of patient IDs. My data set has 2 columns: PatientID, Condition.

A PatientID appears once for each condition that patient has, so IDs repeat:

PatientID  Condition
456        C
456        E
279        D
123        A
123        C
123        D
187        D
296        E
296        C

I believe there are 31 different potential combinations of those 5 conditions, where order doesn't matter (e.g. A, AB, ABC, AC, ACDE, etc.).
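The figure of 31 is the number of non-empty subsets of 5 items (2^5 - 1). As a minimal sketch, assuming the conditions are labelled A through E, the combinations can be enumerated in Python; the helper name `all_combinations` is hypothetical and only used for illustration.

```python
from itertools import combinations

def all_combinations(conditions):
    """Return every non-empty combination as a sorted, joined string."""
    labels = sorted(conditions)
    return [
        "".join(combo)
        for size in range(1, len(labels) + 1)   # sizes 1..5
        for combo in combinations(labels, size)
    ]

combos = all_combinations("ABCDE")  # assumed labels A-E
print(len(combos))   # 31
print(combos[:6])    # ['A', 'B', 'C', 'D', 'E', 'AB']
```

Because each combination is built from sorted labels, "CE" and "EC" collapse into one key, which matches the "order doesn't matter" requirement.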

I want to count how many patients fall into each combination of conditions. So my results for the above would be: CE: 2, D: 2, ACD: 1.

I'm more familiar with Excel, but if this is better handled in SQL, I can do it there. I think I need to create a table of all the different combinations (any help on that would be appreciated too) and then do a count from there, but I'm not sure if that's the best way.

This can also be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac).

To use Power Query:

  • Select some cell in your Data Table
  • Data => Get&Transform => from Table/Range or from within sheet
  • When the PQ Editor opens: Home => Advanced Editor
  • Make note of the Table Name in Line 2 (the Source line of the generated code)
  • Paste the M Code below in place of what you see
  • Change the Table name in the Source line back to what was generated originally.
  • Read the comments and explore the Applied Steps to understand the algorithm
let

//Change next line to reflect your actual data source
    Source = Excel.CurrentWorkbook(){[Name="Table13"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"PatientID", Int64.Type}, {"Condition", type text}}),

//Group by ID and aggregate a sorted list of each ID's conditions
    #"Grouped Rows" = Table.Group(#"Changed Type", {"PatientID"}, {
        {"Conditions", each Text.Combine(List.Sort([Condition])), type text}
        }),

//Group by Conditions and aggregate with Count
    #"Grouped Rows1" = Table.Group(#"Grouped Rows", {"Conditions"}, {{"Condition Count", each Table.RowCount(_), Int64.Type}})
in
    #"Grouped Rows1"
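The two grouping steps in the query above can be sketched outside Excel as well. This is a rough Python equivalent, not part of the original answer: the `rows` list is the sample data from the question, and the function name `count_combinations` is hypothetical.

```python
from collections import Counter, defaultdict

# Sample data from the question: (PatientID, Condition)
rows = [
    (456, "C"), (456, "E"), (279, "D"),
    (123, "A"), (123, "C"), (123, "D"),
    (187, "D"), (296, "E"), (296, "C"),
]

def count_combinations(rows):
    # Step 1: group by patient, collecting that patient's conditions.
    # A list keeps repeated rows, as List.Sort([Condition]) does in the query.
    by_patient = defaultdict(list)
    for patient_id, condition in rows:
        by_patient[patient_id].append(condition)
    # Step 2: sort and join each patient's conditions into one key,
    # then count how many patients share each key.
    return Counter("".join(sorted(conds)) for conds in by_patient.values())

counts = count_combinations(rows)
print(dict(counts))  # {'CE': 2, 'D': 2, 'ACD': 1}
```

Sorting before joining is what makes the key order-independent: patient 296 is listed as E then C but still lands in the CE group.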


SQL Server solution

SELECT
    conditions,
    COUNT(*) AS patient_count
FROM (
    SELECT
        patientid,
        STRING_AGG(condition, '') WITHIN GROUP (ORDER BY condition) conditions
    FROM tbl
    GROUP BY patientid
) c
GROUP BY conditions 

Output

conditions  patient_count
ACD         1
CE          2
D           2
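Both answers only list combinations that actually occur, so a table of all 31 combinations is not needed for the counts themselves. It becomes useful only if the report should also show combinations with zero patients. A minimal Python sketch of that variant, assuming the labels A-E and taking the counts from the output above as input (the names `observed` and `report` are illustrative):

```python
from collections import Counter
from itertools import combinations

# Counts taken from the output above.
observed = Counter({"ACD": 1, "CE": 2, "D": 2})

labels = "ABCDE"  # assumed condition labels, already in sorted order

# Build one entry per possible combination, defaulting to 0 patients.
report = {
    "".join(combo): observed.get("".join(combo), 0)
    for size in range(1, len(labels) + 1)
    for combo in combinations(labels, size)
}

nonzero = {key: n for key, n in report.items() if n}
print(len(report))  # 31
print(nonzero)      # {'D': 2, 'CE': 2, 'ACD': 1}
```

This is the same idea as a left join from a combinations table to the grouped counts.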

db<>fiddle here
