Aggregating consecutive rows in SQL
Given the SQL table (I'm using SQLite3):
CREATE TABLE person(name text, number integer);
And filling it with the values:
insert into person values
('Leandro', 2),
('Leandro', 4),
('Maria', 8),
('Maria', 16),
('Jose', 32),
('Leandro', 64);
What I want is to get the sum of the number column, but only for consecutive rows with the same name, so that I get the following result, which maintains the original insertion order:
Leandro|6
Maria|24
Jose|32
Leandro|64
The "closest" I've got so far is:
select name, sum(number) over(partition by name) from person order by rowid;
But it clearly shows I'm far from understanding SQL, as the most important features (grouping and summation of consecutive rows) are missing. At least the order is there :-):
Leandro|70
Leandro|70
Maria|24
Maria|24
Jose|32
Leandro|70
Preferably the answer should not require creating temporary tables, as the output is expected to always follow the order in which the data was inserted.
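You can do it with window functions: flag each row whose name differs from the previous row's name, take a running sum of the flags to number the groups of consecutive rows, and then group by those group numbers and aggregate: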
select name, sum(number) total
from (
select *, sum(flag) over (order by rowid) grp
from (
select *, rowid, name <> lag(name, 1, '') over (order by rowid) flag
from person
)
)
group by grp
order by grp;
See the demo.
Results:
> name | total
> :------ | ----:
> Leandro | 6
> Maria | 24
> Jose | 32
> Leandro | 64
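To see how the nested subqueries build the groups, it can help to print the intermediate columns. The following is a minimal sketch using Python's standard sqlite3 module with an in-memory database. The Python wrapper, the alias rid and the variable names are illustrative assumptions, not part of the answer, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# In-memory database with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person(name text, number integer)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Leandro", 2), ("Leandro", 4), ("Maria", 8),
     ("Maria", 16), ("Jose", 32), ("Leandro", 64)],
)

# flag is 1 where the name differs from the previous row's name, else 0.
# grp is the running sum of flag, so it increases by one at each new run.
steps = conn.execute("""
    select name, number, flag, sum(flag) over (order by rid) as grp
    from (
        select rowid as rid, name, number,
               name <> lag(name, 1, '') over (order by rowid) as flag
        from person
    )
    order by rid
""").fetchall()

for row in steps:
    print(row)
```

Rows that share the same grp value form one run of consecutive names, which is what the outer group by then sums.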
This is a type of gaps-and-islands problem. You can use the difference of row numbers for this purpose:
select name, sum(number)
from (select p.*,
row_number() over (order by number) as seqnum,
row_number() over (partition by name order by number) as seqnum_1
from person p
) p
group by name, (seqnum - seqnum_1)
order by min(number);
Why this works is a little tricky to explain. However, it becomes pretty obvious when you look at the results of the subquery. The difference of row numbers is constant on adjacent rows when the name does not change.
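The subquery results mentioned above can be inspected directly. This is a minimal sketch, assuming Python's standard sqlite3 module and an in-memory copy of the question's sample data; the variable names are illustrative, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# In-memory database with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person(name text, number integer)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Leandro", 2), ("Leandro", 4), ("Maria", 8),
     ("Maria", 16), ("Jose", 32), ("Leandro", 64)],
)

# The inner subquery on its own: one global row number and one per-name row number.
rows = conn.execute("""
    select name, number,
           row_number() over (order by number) as seqnum,
           row_number() over (partition by name order by number) as seqnum_1
    from person
    order by number
""").fetchall()

# Pair each name with the difference of its two row numbers.
diffs = [(name, seqnum - seqnum_1) for name, number, seqnum, seqnum_1 in rows]
for row, (_, diff) in zip(rows, diffs):
    print(*row, diff)
```

On this data the pair (name, difference) is the same within each run and differs between runs, which is why grouping by both columns isolates the consecutive rows.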
I would change the create table statement to the following:
CREATE TABLE person(id integer, firstname nvarchar(255), number integer);
Then you can insert your data:
insert into person values
(1, 'Leandro', 2),
(2, 'Leandro', 4),
(3, 'Maria', 8),
(4, 'Maria', 16),
(5, 'Jose', 32),
(6, 'Leandro', 64);
After that you can query the data in the following way:
SELECT firstname, value FROM (
SELECT p.id, p.firstname, p.number, LAG(p.firstname) over (ORDER BY p.id) as prevname,
CASE
WHEN firstname LIKE LEAD(p.firstname) over (ORDER BY p.id) THEN number + LEAD(p.number) over(ORDER BY p.id)
ELSE number
END as value
FROM Person p
) AS temp
WHERE temp.firstname <> temp.prevname OR
temp.prevname IS NULL
To understand the query better, you can run the subquery on its own:
SELECT p.id, p.firstname, p.number, LEAD(p.firstname) over (ORDER BY p.id) as nextname, LAG(p.firstname) over (ORDER BY p.id) as prevname,
CASE
WHEN firstname LIKE LEAD(p.firstname) over (ORDER BY p.id) THEN number + LEAD(p.number) over(ORDER BY p.id)
ELSE number
END as value
FROM Person p
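One caveat worth checking before relying on this LEAD-based query: it adds only the immediately following row, so it matches the expected totals only when a run has at most two rows. The sketch below is an illustration under assumptions, not part of the answer: it uses Python's standard sqlite3 module, hypothetical data with a run of three rows, and an added ORDER BY for a deterministic result. It contrasts the query with a running-sum grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person(id integer, firstname nvarchar(255), number integer)")
# Hypothetical data: a run of THREE consecutive 'Leandro' rows.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Leandro", 2), (2, "Leandro", 4), (3, "Leandro", 8), (4, "Maria", 16)],
)

# The LEAD-based query from the answer, with ORDER BY added for stable output.
lead_based = conn.execute("""
    SELECT firstname, value FROM (
        SELECT p.id, p.firstname, p.number,
               LAG(p.firstname) over (ORDER BY p.id) as prevname,
               CASE
                   WHEN firstname LIKE LEAD(p.firstname) over (ORDER BY p.id)
                       THEN number + LEAD(p.number) over (ORDER BY p.id)
                   ELSE number
               END as value
        FROM person p
    ) AS temp
    WHERE temp.firstname <> temp.prevname OR temp.prevname IS NULL
    ORDER BY temp.id
""").fetchall()

# Running-sum grouping over the same table, which handles runs of any length.
grouped = conn.execute("""
    select firstname, sum(number)
    from (
        select *, sum(flag) over (order by id) as grp
        from (
            select *, firstname <> lag(firstname, 1, '') over (order by id) as flag
            from person
        )
    )
    group by grp, firstname
    order by grp
""").fetchall()

print(lead_based)  # the first run counts only its first two rows
print(grouped)     # the first run counts all three rows
```

If runs longer than two rows can occur in the data, a grouping-based query is the safer choice.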
Based on Gordon Linoff's answer ( https://stackoverflow.com/a/64727401/1721672 ), I extracted the inner select as a CTE, and the following query works pretty well:
with p(name, number, seqnum, seqnum_1) as
(select name, number,
row_number() over (order by number) as seqnum,
row_number() over (partition by name order by number) as seqnum_1
from person)
select
name, sum(number)
from
p
group by
name, (seqnum - seqnum_1)
order by
min(number);
This produces the expected result:
Leandro|6
Maria|24
Jose|32
Leandro|64
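Note that ordering the row numbers by number matches insertion order here only because the sample numbers happen to increase with each insert. A minimal sketch of the difference, assuming Python's standard sqlite3 module, hypothetical data whose numbers are not increasing, and that rowid reflects insertion order (as the question's own attempt assumes); the helper consecutive_sums and the alias ord are illustrative names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person(name text, number integer)")
# Hypothetical data: the numbers do NOT increase in insertion order.
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Leandro", 1), ("Maria", 3), ("Leandro", 2)],
)

def consecutive_sums(order_col):
    """Run the row-number-difference query, ordering by the given column.

    order_col is interpolated into the SQL text, so it must be a trusted
    column name; only 'number' and 'rowid' are used below.
    """
    sql = f"""
        with p as (
            select name, number, {order_col} as ord,
                   row_number() over (order by {order_col}) as seqnum,
                   row_number() over (partition by name order by {order_col}) as seqnum_1
            from person
        )
        select name, sum(number)
        from p
        group by name, (seqnum - seqnum_1)
        order by min(ord)
    """
    return conn.execute(sql).fetchall()

by_number = consecutive_sums("number")  # merges the two separate Leandro rows
by_rowid = consecutive_sums("rowid")    # keeps the runs in insertion order

print(by_number)
print(by_rowid)
```

Ordering by rowid (or by an explicit id column) keeps the grouping tied to insertion order rather than to the values being summed.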