
Efficient way to list ranges of consecutive records

I have a table set up like so:

CREATE TABLE `cn` (
    `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
    `type` int(3) unsigned NOT NULL,
    `number` int(10) NOT NULL,
    `desc` varchar(64) NOT NULL,
    `datetime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB;

The number column is usually, but not necessarily, unique.

Most of the table consists of rows with consecutive number values, e.g. 101010, 101011, 101012, etc.

I've been trying to find an efficient way to list ranges of consecutive numbers so I can easily find out where numbers are "missing". What I'd like to do is list the start number, end number, and number of consecutive rows for each range. Since there can be duplicates, I am using SELECT DISTINCT(number) to avoid them.

I've not been having much luck: most of the questions of this type deal with dates and have been hard to generalize. One query ran forever, so that was a no-go. This answer is sort of close but not quite. It uses a CROSS JOIN, which sounds like a recipe for disaster when you have millions of records.

What would be the best way to do this? Some answers use joins, which I'm skeptical of performance-wise. Right now there are only 50,000 rows, but it will be millions of records within a few days, so every ounce of performance matters.

The eventual pseudoquery I have in mind is something like:

SELECT DISTINCT(number) FROM cn WHERE type = 1 GROUP BY [consecutive...] ORDER BY number ASC

This is a gaps-and-islands problem. You can solve it by using the difference between number and row_number() to define groups: within a run of consecutive, distinct numbers both values grow by 1 per row, so the difference stays constant, and it changes only where numbers are missing:

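-- Assumes number is distinct within the type; a duplicate shifts rn and splits a run.
select type,
       min(number) as first_number,
       max(number) as last_number,
       count(*)    as no_records
from (
    select cn.*, row_number() over (order by number) as rn
    from cn
    where type = 1
) c
-- One group per run of consecutive numbers.
group by type, number - rn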
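
To see why this grouping works, here is a small sketch on hypothetical sample values (not taken from the question's data) that prints the row number and the difference for each row. It assumes MySQL 8.0 or later for the CTE and the window function; the names sample and grp are made up for illustration.

```sql
-- Hypothetical sample: two runs, 101010..101012 and 101015..101016.
with sample (number) as (
    select 101010 union all
    select 101011 union all
    select 101012 union all
    select 101015 union all
    select 101016
)
select number,
       row_number() over (order by number)          as rn,
       number - row_number() over (order by number) as grp
from sample
order by number;
-- number  rn  grp
-- 101010   1  101009
-- 101011   2  101009
-- 101012   3  101009
-- 101015   4  101011
-- 101016   5  101011
```

Every row of a run shares one grp value, so grouping by it yields one output row per run.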

Note: window functions are available in MySQL 8.0 and MariaDB 10.2 onwards.
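
Because number is not guaranteed to be unique, a duplicate value gets its own row number and shifts the difference, which would split one run into two. One possible variation, offered as a sketch rather than as part of the original answer, ranks with dense_rank() so that equal numbers share a rank. The no_numbers column, the rnk alias, and the cast to signed are additions for illustration.

```sql
select type,
       min(number)            as first_number,
       max(number)            as last_number,
       count(distinct number) as no_numbers,   -- distinct values in the run
       count(*)               as no_records    -- rows in the run, duplicates included
from (
    select type, number,
           -- Equal numbers get the same rank; the next distinct number gets rank + 1.
           -- The cast keeps the subtraction below in signed arithmetic.
           cast(dense_rank() over (order by number) as signed) as rnk
    from cn
    where type = 1
) c
group by type, number - rnk
order by first_number;
```

With this version, the values 1, 2, 2, 3 form a single run with no_numbers = 3 and no_records = 4.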


In earlier versions, you can emulate row_number() with a session variable:

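select type,
       min(number) as first_number,
       max(number) as last_number,
       count(*)    as no_records
from (
    -- @rn starts at 0 and is incremented once per row read from the sorted subquery
    select c.*, @rn := @rn + 1 as rn
    from (select * from cn where type = 1 order by number) c
    cross join (select @rn := 0) r
) t
-- type is listed here so the query also runs with ONLY_FULL_GROUP_BY enabled
group by type, number - rn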
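
Since these queries filter on type and sort by number, a composite index on those two columns may let the server read the matching rows in order instead of sorting them. This is a suggestion to test, not something the original answer states. The index name idx_type_number is hypothetical, and whether the optimizer uses the index depends on your data and version, so checking with EXPLAIN is advisable.

```sql
-- idx_type_number is a hypothetical name; any unused index name works.
alter table cn add index idx_type_number (type, number);

-- Inspect the plan for the filtered, ordered read that feeds the grouping.
explain
select number
from cn
where type = 1
order by number;
```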


 