I want to find all missing rows between 2 numbers in SQL Server.
For example, I have a range of 100,000 to 200,000 and an [INVOICE_NO] column in a table in SQL Server. There should be a row for every number between 100,000 and 200,000. How can I check for and find the missing invoice numbers in the table?
I understand how to do it if every number between 100,000 and 200,000 were stored in a separate table: then I could just use NOT IN (SELECT ...). But I'm not sure how to do it without one.
As commented, I would use a tally table here, not an rCTE:
DECLARE @Start int, @End int;
SET @Start = 100000;
SET @End = 200000;
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) I
FROM N N1, N N2, N N3, N N4, N N5, N N6) --1M rows should be enough
SELECT T.I AS MISSING_INVOICE_NO
FROM Tally T
--LEFT JOIN YourTable YT ON T.I = YT.INVOICE_NO
WHERE T.I BETWEEN @Start AND @End
--AND YT.INVOICE_NO IS NULL
You'd need to uncomment the LEFT JOIN and IS NULL lines and adjust them to JOIN to your table.
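For illustration, here is a minimal sketch of the same query with the join in place. The table variable @Invoices and its four sample rows are hypothetical stand-ins for your real table; only the INVOICE_NO column name comes from the question.

```sql
DECLARE @Start int = 100000, @End int = 200000;

-- Hypothetical sample data standing in for the real invoice table
DECLARE @Invoices table (INVOICE_NO int PRIMARY KEY);
INSERT INTO @Invoices (INVOICE_NO)
VALUES (100000), (100001), (100003), (200000);

WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS(
    -- 10 rows cross joined 6 times = 1,000,000 sequential numbers, starting at 1
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2, N N3, N N4, N N5, N N6)
SELECT T.I AS MISSING_INVOICE_NO
FROM Tally T
     LEFT JOIN @Invoices YT ON T.I = YT.INVOICE_NO
WHERE T.I BETWEEN @Start AND @End
  AND YT.INVOICE_NO IS NULL   -- keep only numbers with no matching invoice
ORDER BY T.I;
```

With this sample data the query returns 100002, followed by every number from 100004 to 199999. Note that the tally only reaches 1,000,000, so an @End above that would need another cross join of N.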
As proof of that reasoning, take the scripts below:
DECLARE @Start int, @End int;
SET @Start = 100000;
SET @End = 200000;
SET STATISTICS TIME ON;
PRINT N'Tally Table';
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) I
FROM N N1, N N2, N N3, N N4, N N5, N N6) --1M rows should be enough
SELECT *
FROM Tally T
WHERE T.I BETWEEN @Start AND @End;
PRINT N'rCTe';
WITH rCTE AS(
SELECT @Start AS I
UNION ALL
SELECT I + 1
FROM rCTE r
WHERE r.I + 1 <= @End)
SELECT *
FROM rCTE
OPTION (MAXRECURSION 0);
SET STATISTICS TIME OFF;
The time taken to complete the Tally table (on my Production instance):
CPU time = 78 ms, elapsed time = 106 ms.
CPU time = 78 ms, elapsed time = 95 ms.
CPU time = 62 ms, elapsed time = 91 ms.
CPU time = 78 ms, elapsed time = 105 ms.
The rCTE method, however, was:
CPU time = 2547 ms, elapsed time = 3695 ms.
CPU time = 2250 ms, elapsed time = 2500 ms.
CPU time = 1813 ms, elapsed time = 1930 ms.
CPU time = 2750 ms, elapsed time = 3220 ms.
That's a big difference in time: the tally solution averaged about 100 ms, whereas the rCTE took roughly 2 to 4 seconds.
If you can deal with ranges, you can get all gaps with:
select (invoice_no + 1) as first_missing_invoice_no,
(next_invoice_no - 1) as last_missing_invoice_no,
(next_invoice_no - invoice_no - 1) as num_missing
from (select i.*,
lead(invoice_no) over (order by invoice_no) as next_invoice_no
from invoices i
) i
where next_invoice_no <> invoice_no + 1;
This can be adapted to a particular range, if ranges meet your needs.
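One possible way to restrict the gap query to a range is sketched below. This is an assumption about how the adaptation might look, not the answerer's own code: the table variable @invoices and its sample rows are hypothetical, and the two sentinel rows (@Start - 1 and @End + 1) are added so that gaps at either edge of the range are reported too. It needs SQL Server 2012 or later for LEAD.

```sql
DECLARE @Start int = 100000, @End int = 200000;

-- Hypothetical sample data standing in for the real invoices table
DECLARE @invoices table (invoice_no int PRIMARY KEY);
INSERT INTO @invoices (invoice_no)
VALUES (100000), (100001), (100003), (200000);

WITH Bounded AS (
    SELECT invoice_no
    FROM @invoices
    WHERE invoice_no BETWEEN @Start AND @End
    UNION ALL SELECT @Start - 1   -- sentinel just below the range
    UNION ALL SELECT @End + 1     -- sentinel just above the range
),
WithNext AS (
    SELECT invoice_no,
           LEAD(invoice_no) OVER (ORDER BY invoice_no) AS next_invoice_no
    FROM Bounded
)
SELECT invoice_no + 1                    AS first_missing_invoice_no,
       next_invoice_no - 1               AS last_missing_invoice_no,
       next_invoice_no - invoice_no - 1  AS num_missing
FROM WithNext
WHERE next_invoice_no > invoice_no + 1   -- only rows followed by a gap
ORDER BY first_missing_invoice_no;
```

With this sample data the result is two rows: (100002, 100002, 1) and (100004, 199999, 99996). The number of missing values is computed from the difference between neighbours, so no GROUP BY is needed.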