Aggregating consecutive rows in SQL

Given the SQL table (I'm using SQLite3):

CREATE TABLE person(name text, number integer);

And filling it with the values:

insert into person values 
('Leandro', 2),
('Leandro', 4),
('Maria',   8),
('Maria',   16),
('Jose',    32),
('Leandro', 64);

What I want is the sum of the number column, but only over consecutive rows with the same name, so that I get the following result, which maintains the original insertion order:

Leandro|6
Maria|24
Jose|32
Leandro|64

The "closest" I have got so far is:

select name, sum(number) over(partition by name) from person order by rowid;

But it clearly shows I'm far from understanding SQL: the most important feature (grouping and summing consecutive rows) is missing, although at least the order is there :-):

Leandro|70
Leandro|70
Maria|24
Maria|24
Jose|32
Leandro|70

Preferably the answer should not require creating temporary tables, as the output is expected to always follow the order in which the data was inserted.

You can do it with window functions:

  • LAG() to check whether the previous name is the same as the current one
  • SUM() to create groups for consecutive rows with the same name

and then group by those groups and aggregate:

select name, sum(number) total
from (
  select *, sum(flag) over (order by rowid) grp
  from (
    select *, rowid, name <> lag(name, 1, '') over (order by rowid) flag
    from person 
  )
)
group by grp
order by grp

See the demo.
Results:

> name    | total
> :------ | ----:
> Leandro |     6
> Maria   |    24
> Jose    |    32
> Leandro |    64
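
To see how the two window functions cooperate, it can help to print the intermediate flag and grp columns before the final GROUP BY. The following is a minimal sketch, assuming Python's built-in sqlite3 module linked against SQLite 3.25 or later (needed for window functions); the alias rid and the variable names are illustrative and not part of the original answer.

```python
import sqlite3

# In-memory database holding the question's table and rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person(name text, number integer)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Leandro", 2), ("Leandro", 4), ("Maria", 8),
     ("Maria", 16), ("Jose", 32), ("Leandro", 64)],
)

# flag is 1 where the name differs from the previous row, otherwise 0.
# The running sum of flag gives every run of equal names its own number.
steps = conn.execute("""
    SELECT name, number, flag, SUM(flag) OVER (ORDER BY rid) AS grp
    FROM (
        SELECT name, number, rowid AS rid,
               name <> LAG(name, 1, '') OVER (ORDER BY rowid) AS flag
        FROM person
    )
    ORDER BY rid
""").fetchall()

for row in steps:
    print(row)
```

Each printed tuple is (name, number, flag, grp). The two Leandro rows at the start share grp 1, while the last Leandro row gets grp 4, which is why it is summed separately.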

This is a type of gaps-and-islands problem. You can use the difference of row numbers for this purpose:

select name, sum(number)
from (select p.*,
             row_number() over (order by number) as seqnum,
             row_number() over (partition by name order by number) as seqnum_1
      from person p
     ) p
group by name, (seqnum - seqnum_1)
order by min(number);

Why this works is a little tricky to explain. However, it becomes pretty obvious when you look at the results of the subquery. The difference of row numbers is constant on adjacent rows when the name does not change.

Here is a db<>fiddle.
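
The results of the subquery mentioned above can be inspected directly. This is a small sketch, assuming Python's built-in sqlite3 module with SQLite 3.25 or later; the diff column is computed in Python purely for display and is not part of the original query.

```python
import sqlite3

# In-memory database holding the question's table and rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person(name text, number integer)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Leandro", 2), ("Leandro", 4), ("Maria", 8),
     ("Maria", 16), ("Jose", 32), ("Leandro", 64)],
)

# The inner select of the answer: one global row number and one per name.
rows = conn.execute("""
    SELECT name, number,
           ROW_NUMBER() OVER (ORDER BY number) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY number) AS seqnum_1
    FROM person
    ORDER BY number
""").fetchall()

# Append seqnum - seqnum_1, the value the outer query groups by.
table = [(name, number, s, s1, s - s1) for name, number, s, s1 in rows]

for row in table:
    print(row)
```

The last element of each tuple stays the same within a run of equal names (0 for the first two Leandro rows, 2 for both Maria rows) and changes to 3 for the final Leandro row, so grouping by name and that difference separates the runs.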

I would change the CREATE TABLE statement to the following:

CREATE TABLE person(id integer, firstname nvarchar(255), number integer);
  • You need a third column to determine the insert order.
  • I would rename the column name to something like firstname, because name is a keyword in some DBMSs. The same applies to the column named number. Moreover, I would change the type of the name column from text to nvarchar, because it is sortable in the GROUP BY clause.

Then you can insert your data:

insert into person values 
(1, 'Leandro', 2),
(2, 'Leandro', 4),
(3, 'Maria',   8),
(4, 'Maria',   16),
(5, 'Jose',    32),
(6, 'Leandro', 64);

After that you can query the data in the following way:

SELECT firstname, value FROM (
    SELECT p.id, p.firstname, p.number, LAG(p.firstname) over (ORDER BY p.id) as prevname,
    CASE
        WHEN firstname LIKE LEAD(p.firstname) over (ORDER BY p.id) THEN number + LEAD(p.number) over(ORDER BY p.id)
        ELSE number
    END as value
    FROM Person p
) AS temp
WHERE temp.firstname <> temp.prevname OR 
temp.prevname IS NULL
  • First, the CASE expression computes the value for each row.
  • Then you filter the data and keep only the rows whose previous name differs from the current name.

To understand the query better, you can run the subquery on its own:

SELECT p.id, p.firstname, p.number, LEAD(p.firstname) over (ORDER BY p.id) as nextname, LAG(p.firstname) over (ORDER BY p.id) as prevname,
CASE
    WHEN firstname LIKE LEAD(p.firstname) over (ORDER BY p.id) THEN number + LEAD(p.number) over(ORDER BY p.id)
    ELSE number
END as value
FROM Person p
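
One thing worth checking with this approach is how it behaves on longer runs, since the CASE expression adds only the immediately following row. The sketch below, assuming Python's built-in sqlite3 module with SQLite 3.25 or later, runs the query on the sample data plus two hypothetical extra rows (ids 7 and 8, not in the original post) that form a run of three. The ORDER BY temp.id at the end is an addition to make the output order explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person(id integer, firstname nvarchar(255), number integer)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Leandro", 2), (2, "Leandro", 4), (3, "Maria", 8),
     (4, "Maria", 16), (5, "Jose", 32), (6, "Leandro", 64),
     # Hypothetical extra rows (assumption): extend the last run to three.
     (7, "Leandro", 128), (8, "Leandro", 256)],
)

# The LEAD-based query from the answer, with an explicit final ordering.
lead_result = conn.execute("""
    SELECT firstname, value FROM (
        SELECT p.id, p.firstname, p.number,
               LAG(p.firstname) OVER (ORDER BY p.id) AS prevname,
               CASE
                   WHEN firstname LIKE LEAD(p.firstname) OVER (ORDER BY p.id)
                       THEN number + LEAD(p.number) OVER (ORDER BY p.id)
                   ELSE number
               END AS value
        FROM person p
    ) AS temp
    WHERE temp.firstname <> temp.prevname OR temp.prevname IS NULL
    ORDER BY temp.id
""").fetchall()

# Full total of the final run of three Leandro rows, for comparison.
true_total = 64 + 128 + 256

print(lead_result)
print(true_total)
```

For the final run the query reports 192 (64 + 128), whereas the full total is 448, so this query appears to be reliable only when no run is longer than two rows.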

Based on Gordon Linoff's answer ( https://stackoverflow.com/a/64727401/1721672 ), I extracted the inner select as a CTE, and the following query works pretty well:

with p(name, number, seqnum, seqnum_1) as
    (select name, number,
        row_number() over (order by number) as seqnum,
        row_number() over (partition by name order by number) as seqnum_1
    from person)
select
    name, sum(number)
from
    p
group by 
    name, (seqnum - seqnum_1)
order by
    min(number);

Producing the expected result:

Leandro|6
Maria|24
Jose|32
Leandro|64
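
Note that the query orders by number, which matches the insertion order here only because the sample numbers happen to increase. A possible variant, sketched below, orders by rowid instead. It assumes that rowid reflects insertion order, which holds for this simple table but is not guaranteed in general (for example after deletes or explicit rowid values). The sample data is hypothetical, with the numbers deliberately out of order, and the sketch assumes Python's built-in sqlite3 module with SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person(name text, number integer)")
# Hypothetical data (assumption): numbers do not increase with insertion.
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Leandro", 4), ("Leandro", 2), ("Maria", 16),
     ("Maria", 8), ("Jose", 1), ("Leandro", 3)],
)

# Same row-number difference technique, but driven by rowid (aliased rid).
result = conn.execute("""
    WITH p AS (
        SELECT name, number, rowid AS rid,
               ROW_NUMBER() OVER (ORDER BY rowid) AS seqnum,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY rowid) AS seqnum_1
        FROM person
    )
    SELECT name, SUM(number)
    FROM p
    GROUP BY name, (seqnum - seqnum_1)
    ORDER BY MIN(rid)
""").fetchall()

for name, total in result:
    print(f"{name}|{total}")
```

With this data the runs are still reported in insertion order, and the last Leandro row stays separate from the first two.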
