Aggregate multiple rows of data into a single row

Question

In my table, each row has some data columns and a Priority column (for example, a timestamp or just an integer). I want to group my data by id and then, within each group, take the latest non-NULL value of each column. For example, I have the following table:

id  A       B       C       Priority
1   NULL    3       4       1
1   5       6       NULL    2
1   8       NULL    NULL    3
2   634     346     359     1
2   34      NULL    734     2

Desired result:

id  A   B   C   
1   8   6   4   
2   34  346 734
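
For anyone who wants to reproduce the example, here is a minimal setup sketch. The int column types and the NOT NULL constraints are assumptions (the real table may use other types, and Priority may be a timestamp); only the table name T and the column names come from the queries in this thread.

```sql
-- Sample table matching the example above.
-- Column types are assumed; adjust them to the real schema.
CREATE TABLE T (
    id       int NOT NULL,
    A        int NULL,
    B        int NULL,
    C        int NULL,
    Priority int NOT NULL
);

-- The five example rows (multi-row VALUES needs SQL Server 2008 or later).
INSERT INTO T (id, A, B, C, Priority)
VALUES (1, NULL, 3,    4,    1),
       (1, 5,    6,    NULL, 2),
       (1, 8,    NULL, NULL, 3),
       (2, 634,  346,  359,  1),
       (2, 34,   NULL, 734,  2);
```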

In this example the table is small and has only 5 columns, but the real table will be much larger. I really want this script to run fast. I tried to do it myself, but my script only works on SQL Server 2012+, so I deleted it as not applicable.

Numbers: the table could have 150k rows, 20 columns and 20-80k unique ids, and the average of SELECT COUNT(id) FROM T GROUP BY id is between 2 and 5.

I now have working code (thanks to @ypercubeᵀᴹ), but it runs very slowly on big tables; in my case the script can take a minute or even more (with indexes and so on).

How can it be sped up?

SELECT 
    d.id,
    d1.A,
    d2.B,
    d3.C
FROM 
    ( SELECT id
      FROM T
      GROUP BY id
    ) AS d
  OUTER APPLY
    ( SELECT TOP (1) A
      FROM T 
      WHERE id = d.id
        AND A IS NOT NULL
      ORDER BY priority DESC
    ) AS d1 
  OUTER APPLY
    ( SELECT TOP (1) B
      FROM T 
      WHERE id = d.id
        AND B IS NOT NULL
      ORDER BY priority DESC
    ) AS d2 
  OUTER APPLY
    ( SELECT TOP (1) C
      FROM T 
      WHERE id = d.id
        AND C IS NOT NULL
      ORDER BY priority DESC
    ) AS d3 ;

In my test database with a realistic amount of data, I get the following execution plan:

Answer 1

This should do the trick. Any non-NULL value raised to the power 0 returns 1, while NULL stays NULL, so Priority*power(A,0) equals Priority when A is non-NULL and is NULL otherwise:

DECLARE @t table(id int,A int,B  int,C int,Priority int)
INSERT @t
VALUES (1,NULL,3   ,4   ,1),
(1,5   ,6   ,NULL,2),(1,8   ,NULL,NULL,3),
(2,634 ,346 ,359 ,1),(2,34  ,NULL,734 ,2)

;WITH CTE as
(
  SELECT id, 
  CASE WHEN row_number() over 
    (partition by id order by Priority*power(A,0) desc) = 1 THEN A END A,
  CASE WHEN row_number() over 
    (partition by id order by Priority*power(B,0) desc) = 1 THEN B END B,
  CASE WHEN row_number() over 
    (partition by id order by Priority*power(C,0) desc) = 1 THEN C END C
  FROM @t
)
SELECT id, max(a) a, max(b) b, max(c) c
FROM CTE
GROUP BY id

Result:

id  a   b   c
1   8   6   4
2   34  346 734
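
To see why the POWER(x, 0) trick works, a small standalone check may help. It is only an illustration: the literal 7 stands in for a Priority value, and the trick presumes that the data columns and Priority are numeric. Because SQL Server sorts NULL as the lowest value, ORDER BY ... DESC places the masked (NULL) priorities last, so row number 1 lands on the highest-priority row where the column is non-NULL.

```sql
-- POWER(x, 0) is 1 for every non-NULL x (including 0 and negatives)
-- and NULL when x is NULL, so multiplying by it keeps or blanks the priority.
SELECT v.x,
       POWER(v.x, 0)     AS x_pow_0,          -- 1, 1, 1, NULL
       7 * POWER(v.x, 0) AS masked_priority   -- 7, 7, 7, NULL
FROM (VALUES (5), (0), (-3), (NULL)) AS v(x);
```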

Answer 2

One alternative that might be faster is a multiple-join approach: get the highest priority at which each column is non-NULL, then join back to the original table. For the first part:

select id,
       max(case when a is not null then priority end) as pa,
       max(case when b is not null then priority end) as pb,
       max(case when c is not null then priority end) as pc
from t
group by id;

Then join this result back to the original table:

with pabc as (
      select id,
             max(case when a is not null then priority end) as pa,
             max(case when b is not null then priority end) as pb,
             max(case when c is not null then priority end) as pc
      from t
      group by id
     )
select pabc.id, ta.a, tb.b, tc.c
from pabc left join
     t ta
     on pabc.id = ta.id and pabc.pa = ta.priority left join
     t tb
     on pabc.id = tb.id and pabc.pb = tb.priority left join
     t tc
     on pabc.id = tc.id and pabc.pc = tc.priority ;

This can also take advantage of an index on t(id, priority).
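
As a sketch, such an index could be created as shown here. The index name is hypothetical, and the INCLUDE list is an added assumption meant to let the joins read a, b and c from the index without going back to the base table; with 20 data columns, whether a covering index is worth its size is something to measure. Note also that the join-back only returns one row per id if (id, priority) is unique in t.

```sql
-- Hypothetical index name. Key columns match the join predicates
-- (id, priority); INCLUDE adds the data columns so the index covers the query.
CREATE INDEX IX_t_id_priority
    ON t (id, priority)
    INCLUDE (a, b, c);
```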

Answer 3

The previous code will work with the following syntax:

with pabc as (
      select id,
             max(case when a is not null then priority end) as pa,
             max(case when b is not null then priority end) as pb,
             max(case when c is not null then priority end) as pc
      from t
      group by id
     )
select pabc.id, ta.a, tb.b, tc.c
from pabc
     left join t ta on pabc.id = ta.id and pabc.pa = ta.priority
     left join t tb on pabc.id = tb.id and pabc.pb = tb.priority
     left join t tc on pabc.id = tc.id and pabc.pc = tc.priority ;

Answer 4

This looks rather strange. You have a log table for all column changes, but no associated table with the current data. Now you are looking for a query to collect the current values from the log table, which is naturally a laborious task.

The solution is simple: have an additional table with the current data. You can even link the tables with a trigger (so either every time a record is inserted into your log table you update the current table, or every time a change is written to the current table you write a log entry).

Then just query your current table:

select id, a, b, c from currenttable order by id;
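
The first trigger variant (log table updates the current table) could look roughly like the sketch below. It is a sketch under stated assumptions, not a drop-in solution: the trigger name and the int column types are hypothetical, the log table is taken to be the question's T, each INSERT statement is assumed to contain at most one row per id, and rows are assumed to arrive in increasing priority order. If rows can arrive out of order, the trigger would also have to track the priority of each stored value.

```sql
-- Current-values table; the name matches the query above, types are assumed.
CREATE TABLE currenttable (
    id int NOT NULL PRIMARY KEY,
    a  int NULL,
    b  int NULL,
    c  int NULL
);
GO  -- batch separator (SSMS/sqlcmd); CREATE TRIGGER must start its own batch

CREATE TRIGGER trg_T_sync_currenttable   -- hypothetical name
ON T
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Existing ids: a non-NULL incoming value replaces the stored one,
    -- a NULL incoming value leaves the stored one untouched.
    UPDATE cur
    SET a = COALESCE(i.A, cur.a),
        b = COALESCE(i.B, cur.b),
        c = COALESCE(i.C, cur.c)
    FROM currenttable AS cur
    JOIN inserted AS i
      ON i.id = cur.id;

    -- Ids seen for the first time: copy the incoming row as-is.
    INSERT INTO currenttable (id, a, b, c)
    SELECT i.id, i.A, i.B, i.C
    FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM currenttable AS cur WHERE cur.id = i.id);
END;
GO
```

The trade-off is that every insert into the log table does a little extra work, in exchange for making the read a plain lookup.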

4 answers

Answer 1: score 4, accepted, 2016-02-04 13:29:01
Answer 2: score 2, 2016-02-04 13:07:46
Answer 3: score 0, 2016-02-04 13:33:50
Answer 4: score -1, 2016-02-04 13:23:53
