Find the longest sequence of consecutive increasing numbers in SQL
For this example, say I have a table with two fields, AREA varchar(30) and OrderNumber INT.
The table has the following data:
AREA | OrderNumber
Fontana | 32
Fontana | 42
Fontana | 76
Fontana | 12
Fontana | 3
Fontana | 99
RC | 32
RC | 1
RC | 8
RC | 9
RC | 4
The result I would like returned is, for each area, the length of the longest run of consecutive increasing values. For Fontana it is 3 (32, 42, 76). For RC it is 2 (8, 9).
AREA | LongestLength
Fontana | 3
RC | 2
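How would I do this on MS SQL 2005?
One way is to use a recursive CTE that steps through every row. If the row meets the criteria (an increasing order number within the same area), the chain length increases by one. If it does not, a new chain starts: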
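; with numbered as
(
    -- Number all rows, keeping each area's rows together in eventtime order
    select row_number() over (order by area, eventtime) rn
         , *
    from Table1
)
, recurse as
(
    -- Anchor: the very first row starts a chain of length 1
    select rn
         , area
         , OrderNumber
         , 1 as ChainLength
    from numbered
    where rn = 1
    union all
    -- Step: extend the chain when the next row is in the same area with a higher number
    select cur.rn
         , cur.area
         , cur.OrderNumber
         , case
               when cur.area = prev.area
                    and cur.OrderNumber > prev.OrderNumber
               then prev.ChainLength + 1
               else 1
           end
    from recurse prev
    join numbered cur
        on prev.rn + 1 = cur.rn
)
select area
     , max(ChainLength) as LongestLength
from recurse
group by
    area
option (maxrecursion 0) -- one recursion level per row, so lift the default limit of 100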
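To make the chain rule concrete, here is a minimal Python sketch of the same row-by-row logic (Python, not T-SQL, purely for illustration). It assumes the rows are already sorted by area and then by eventtime, and that the sample rows in the question are listed in eventtime order; `SAMPLE_ROWS` and `longest_chains` are hypothetical names.

```python
# Sample rows from the question, assumed to be listed in eventtime order.
SAMPLE_ROWS = [
    ("Fontana", 32), ("Fontana", 42), ("Fontana", 76),
    ("Fontana", 12), ("Fontana", 3), ("Fontana", 99),
    ("RC", 32), ("RC", 1), ("RC", 8), ("RC", 9), ("RC", 4),
]


def longest_chains(rows):
    """Return {area: longest chain length} for rows sorted by area, then time."""
    longest = {}
    prev_area, prev_number, chain = None, None, 0
    for area, number in rows:
        # Same rule as the CASE expression: extend the chain or restart at 1.
        if area == prev_area and number > prev_number:
            chain += 1
        else:
            chain = 1
        longest[area] = max(longest.get(area, 0), chain)
        prev_area, prev_number = area, number
    return longest


print(longest_chains(SAMPLE_ROWS))  # {'Fontana': 3, 'RC': 3}
```

Under this rule the row that breaks a chain also starts the next one, so for RC the chain 1, 8, 9 counts as length 3, whereas the question expects 2 (8, 9).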
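Another way is to use a query that finds the "breaks", that is, the rows that end a sequence of increasing order numbers within the same area. The number of rows between two breaks is the length of a sequence.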
; with numbered as
(
    select row_number() over (order by area, eventtime) rn
         , *
    from Table1 t1
)
-- Select rows that break an increasing chain
, breaks as
(
    select row_number() over (order by cur.rn) rn2
         , cur.rn
         , cur.Area
    from numbered cur
    left join
         numbered prev
        on cur.rn = prev.rn + 1
    where cur.OrderNumber <= prev.OrderNumber
          or cur.Area <> prev.Area
          or prev.Area is null
)
-- Add a final break after the last row
, breaks2 as
(
    select *
    from breaks
    union all
    select count(*) + 1
         , max(rn) + 1
         , null
    from breaks
)
select series_start.area
     , max(series_end.rn - series_start.rn)
from breaks2 series_start
join breaks2 series_end
    on series_end.rn2 = series_start.rn2 + 1
group by
    series_start.area
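The break-based idea can also be sketched in a few lines of Python (not T-SQL, for illustration only). The sketch assumes rows sorted by area and then by eventtime, with the question's sample rows taken to be in eventtime order; `SAMPLE_ROWS` and `longest_between_breaks` are hypothetical names.

```python
# Sample rows from the question, assumed to be listed in eventtime order.
SAMPLE_ROWS = [
    ("Fontana", 32), ("Fontana", 42), ("Fontana", 76),
    ("Fontana", 12), ("Fontana", 3), ("Fontana", 99),
    ("RC", 32), ("RC", 1), ("RC", 8), ("RC", 9), ("RC", 4),
]


def longest_between_breaks(rows):
    """Return {area: longest distance between consecutive breaks}."""
    # A break is a row that does not continue an increasing chain:
    # the first row, a row in a new area, or a row that is not larger
    # than the row before it.
    breaks = []
    for i, (area, number) in enumerate(rows):
        if i == 0 or rows[i - 1][0] != area or number <= rows[i - 1][1]:
            breaks.append((i, area))
    # A final break just past the last row closes the last series.
    breaks.append((len(rows), None))

    longest = {}
    for (start, area), (end, _) in zip(breaks, breaks[1:]):
        longest[area] = max(longest.get(area, 0), end - start)
    return longest


print(longest_between_breaks(SAMPLE_ROWS))  # {'Fontana': 3, 'RC': 3}
```

Each series is attributed to the area of the break that starts it, which mirrors grouping by `series_start.area` in the query.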
You can do some math with ROW_NUMBER() to figure out where your consecutive items are.
Here is the code sample:
;WITH rownums AS
(
    -- rid1 ranks rows by order number, rid2 by event time, both per area
    SELECT [area],
           ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [ordernumber]) AS rid1,
           ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [eventtime]) AS rid2
    FROM SomeTable
),
differences AS
(
    SELECT [area],
           [calc] = rid1 - rid2
    FROM rownums
),
summation AS
(
    -- Count the rows that share the same difference within an area
    SELECT [area], [calc], COUNT(*) AS lengths
    FROM differences
    GROUP BY [area], [calc]
)
SELECT [area], MAX(lengths) AS LongestLength
FROM summation
GROUP BY [area]
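So if I have one set of row numbers ordered by my order number and another set ordered by my event time, the difference between those two numbers stays the same for as long as the rows are in the same order in both.
You can then get a count grouped by those differences and pull the largest count to get the number you need.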
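The row-number arithmetic may be easier to follow outside SQL. Below is a minimal Python sketch (not T-SQL) that computes both rankings per area and counts the rows sharing each difference. It assumes the sample rows are listed in eventtime order and that order numbers are unique within an area; `SAMPLE_ROWS` and `longest_by_rank_difference` are hypothetical names.

```python
from collections import Counter

# Sample rows from the question, assumed to be listed in eventtime order.
SAMPLE_ROWS = [
    ("Fontana", 32), ("Fontana", 42), ("Fontana", 76),
    ("Fontana", 12), ("Fontana", 3), ("Fontana", 99),
    ("RC", 32), ("RC", 1), ("RC", 8), ("RC", 9), ("RC", 4),
]


def longest_by_rank_difference(rows):
    """Return {area: largest number of rows sharing rid1 - rid2}."""
    by_area = {}
    for area, number in rows:
        by_area.setdefault(area, []).append(number)

    longest = {}
    for area, numbers in by_area.items():
        # rid1: 1-based rank of each row when ordered by order number.
        order = sorted(range(len(numbers)), key=lambda i: numbers[i])
        rid1 = {index: rank for rank, index in enumerate(order, start=1)}
        # rid2: 1-based position in time order, i.e. index + 1.
        counts = Counter(rid1[i] - (i + 1) for i in range(len(numbers)))
        longest[area] = max(counts.values())
    return longest


print(longest_by_rank_difference(SAMPLE_ROWS))  # {'Fontana': 3, 'RC': 2}
```

On the sample data this yields 3 and 2. Note that the difference only stays constant while the rank grows by exactly one per row, so an increasing pair such as 1, 3 that is later followed by 2 would be counted as length 1 by this sketch.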
Edit: ...ignore my first edit, I got hasty.
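You don't explain why RC's longest sequence doesn't include 1 while Fontana's does include 32. I take it the 1 is excluded because it is a decrease: it comes after 32. Fontana's 32, however, is the very first item in its group, and I have two ideas about why it is considered an increase. It is either precisely because it is the first item of the group, or because it is also positive (i.e. as if it came after 0 and was therefore an increase).
For the purpose of this answer, I'm assuming the latter, i.e. the first item of a group is an increase if it is positive. The script below implements the following idea: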
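1. Enumerate the rows in every AREA group in the order of the eventtime column you almost forgot to mention.
2. Join the enumerated set to itself to link every row with its predecessor.
3. Get the sign of the difference between the row's value and its predecessor's value (defaulting the latter to 0). At this point the problem turns into a gaps-and-islands problem.
4. Partition every AREA group by the signs determined in #3 and enumerate the rows of every subgroup.
5. Find the difference between the row numbers from #1 and those from #4. Together with AREA, that is the criterion that identifies individual streaks.
6. Finally, group the results by AREA, the sign from #3 and the result from #5, count the rows, and get the maximum count per AREA.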
I implemented the above like this:
WITH enumerated AS (
  SELECT
    *,
    row = ROW_NUMBER() OVER (PARTITION BY AREA ORDER BY eventtime)
  FROM atable
),
signed AS (
  SELECT
    this.eventtime,
    this.AREA,
    this.row,
    sgn = SIGN(this.OrderNumber - COALESCE(last.OrderNumber, 0))
  FROM enumerated AS this
  LEFT JOIN enumerated AS last
    ON this.AREA = last.AREA
   AND this.row = last.row + 1
),
partitioned AS (
  SELECT
    AREA,
    sgn,
    grp = row - ROW_NUMBER() OVER (PARTITION BY AREA, sgn ORDER BY eventtime)
  FROM signed
)
SELECT DISTINCT
  AREA,
  LongestIncSeq = MAX(COUNT(*)) OVER (PARTITION BY AREA)
FROM partitioned
WHERE sgn = 1
GROUP BY
  AREA,
  grp
;
A SQL Fiddle demo can be found here.
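For readers who want to trace the six steps by hand, here is a minimal Python sketch of the same gaps-and-islands logic (Python, not T-SQL). It assumes the sample rows are listed in eventtime order; `SAMPLE_ROWS` and `longest_increasing_streaks` are hypothetical names, and the comments map each statement to the numbered steps.

```python
from collections import Counter, defaultdict

# Sample rows from the question, assumed to be listed in eventtime order.
SAMPLE_ROWS = [
    ("Fontana", 32), ("Fontana", 42), ("Fontana", 76),
    ("Fontana", 12), ("Fontana", 3), ("Fontana", 99),
    ("RC", 32), ("RC", 1), ("RC", 8), ("RC", 9), ("RC", 4),
]


def longest_increasing_streaks(rows):
    """Return {area: length of the longest streak of positive differences}."""
    numbers_by_area = defaultdict(list)
    for area, number in rows:
        numbers_by_area[area].append(number)

    result = {}
    for area, numbers in numbers_by_area.items():
        rows_per_sign = Counter()  # step 4: row counter within each sign
        streak_sizes = Counter()   # step 6: row count per (sign, grp)
        previous = 0               # step 3: a missing predecessor counts as 0
        for row, number in enumerate(numbers, start=1):  # step 1
            diff = number - previous                     # step 2
            sgn = (diff > 0) - (diff < 0)                # step 3
            rows_per_sign[sgn] += 1
            grp = row - rows_per_sign[sgn]               # step 5
            streak_sizes[(sgn, grp)] += 1
            previous = number
        # Keep only increasing streaks (sgn = 1) and take the longest one.
        result[area] = max(
            (size for (sgn, _), size in streak_sizes.items() if sgn == 1),
            default=0,
        )
    return result


print(longest_increasing_streaks(SAMPLE_ROWS))  # {'Fontana': 3, 'RC': 2}
```

Because the first row of each area is compared with 0, Fontana's 32 counts as an increase, while RC's 1 (coming after 32) does not, which reproduces the 3 and 2 from the question.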