
Find missing rows between 2 numbers in SQL Server

I want to find all missing rows between 2 numbers in SQL Server.

For example, I have a range of 100,000 to 200,000 and an [INVOICE_NO] column in a SQL Server table. There should be a row for every number between 100,000 and 200,000. How can I check for and find the missing invoice numbers in the table?

I understand how to do it if every number between 100,000 and 200,000 were stored in a separate table: I could just use NOT IN (SELECT ...). I'm not sure how to do it without such a table.

As commented, I would use a tally table here, not a recursive CTE (rCTE):

DECLARE @Start int, @End int;
SET @Start = 100000;
SET @End = 200000;

WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) I
    FROM N N1, N N2, N N3, N N4, N N5, N N6) --1M rows should be enough
SELECT T.I AS MISSING_INVOICE_NO
FROM Tally T
     --LEFT JOIN YourTable YT ON T.I = YT.INVOICE_NO
WHERE T.I BETWEEN @Start AND @End
--AND YT.INVOICE_NO IS NULL

You'd need to uncomment the two commented-out lines and adjust them to JOIN to your own table and column.
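For illustration, here is a minimal sketch of the same query with the join enabled. The temp table #Invoices and its sample rows are hypothetical stand-ins for your own table, and the range is shortened so the result is easy to check by eye:

```sql
-- Hypothetical demo table; replace with your own table and column.
CREATE TABLE #Invoices (INVOICE_NO int NOT NULL PRIMARY KEY);
INSERT INTO #Invoices (INVOICE_NO)
VALUES (100000),(100001),(100003),(100004),(100008);

DECLARE @Start int, @End int;
SET @Start = 100000;
SET @End = 100009; -- short range, for the demo only

WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) I
    FROM N N1, N N2, N N3, N N4, N N5, N N6) --numbers 1 to 1,000,000
SELECT T.I AS MISSING_INVOICE_NO
FROM Tally T
     LEFT JOIN #Invoices YT ON T.I = YT.INVOICE_NO
WHERE T.I BETWEEN @Start AND @End
  AND YT.INVOICE_NO IS NULL --keep only numbers with no matching row
ORDER BY T.I;
-- returns 100002, 100005, 100006, 100007, 100009

DROP TABLE #Invoices;
```

The LEFT JOIN keeps every number from the tally, and the IS NULL test then filters down to the numbers that found no matching invoice.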

To back up that reasoning, take the scripts below:

DECLARE @Start int, @End int;
SET @Start = 100000;
SET @End = 200000;

SET STATISTICS TIME ON;

PRINT N'Tally Table';

WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) I
    FROM N N1, N N2, N N3, N N4, N N5, N N6) --1M rows should be enough
SELECT *
FROM Tally T
WHERE T.I BETWEEN @Start AND @End;

PRINT N'rCTE';

WITH rCTE AS(
    SELECT @Start AS I
    UNION ALL
    SELECT I + 1
    FROM rCTE r
    WHERE r.I + 1 <= @End)
SELECT *
FROM rCTE
OPTION (MAXRECURSION 0);

SET STATISTICS TIME OFF;

The times taken by the tally table query (on my production instance) were:

CPU time = 78 ms,  elapsed time = 106 ms.
CPU time = 78 ms,  elapsed time = 95 ms.
CPU time = 62 ms,  elapsed time = 91 ms.
CPU time = 78 ms,  elapsed time = 105 ms.

The rCTE method, however, was:

CPU time = 2547 ms,  elapsed time = 3695 ms.
CPU time = 2250 ms,  elapsed time = 2500 ms.
CPU time = 1813 ms,  elapsed time = 1930 ms.
CPU time = 2750 ms,  elapsed time = 3220 ms.

That's a big difference: the tally solution averaged about 100 ms, whereas the rCTE took roughly 2 to 4 seconds.
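One caveat with the tally as written: it numbers from 1 to 1,000,000 and then filters, so a range that extends past 1,000,000 would silently return too few rows. A possible variant, sketched here as an assumption rather than as part of the original answer, offsets the row numbers by @Start and uses TOP to generate only as many rows as the range needs:

```sql
DECLARE @Start int, @End int;
SET @Start = 100000;
SET @End = 200000;

WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS(
    -- TOP limits the generated rows to the size of the range;
    -- adding @Start - 1 shifts 1..n onto @Start..@End
    SELECT TOP (@End - @Start + 1)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) + @Start - 1 AS I
    FROM N N1, N N2, N N3, N N4, N N5, N N6) --still at most 1M numbers per range
SELECT T.I
FROM Tally T;
```

This still caps the width of the range at 1,000,000 numbers, and it assumes @End is not smaller than @Start - 1, since TOP raises an error for a negative row count.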

If you can deal with ranges, you can get all gaps with:

select (invoice_no + 1) as first_missing_invoice_no, 
       (next_invoice_no - 1) as last_missing_invoice_no,
       -- size of the gap; count(*) would fail here without a GROUP BY
       (next_invoice_no - invoice_no - 1) as num_missing
from (select i.*,
             lead(invoice_no) over (order by invoice_no) as next_invoice_no
      from invoices i
     ) i
where next_invoice_no <> invoice_no + 1;

This can be adapted to a particular range, if ranges meet your needs.
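As a rough sketch of that adaptation (an assumption about how it could be done, not the answerer's own code), one option is to add sentinel values just outside the range, so that gaps touching either edge are reported too. The table and column names are taken from the query above; @Start and @End are illustrative variables:

```sql
DECLARE @Start int, @End int;
SET @Start = 100000;
SET @End = 200000;

WITH Bounded AS (
    -- Sentinels one below and one above the range, so gaps at the
    -- start or end of the range are not missed
    SELECT @Start - 1 AS invoice_no
    UNION ALL
    SELECT invoice_no FROM invoices WHERE invoice_no BETWEEN @Start AND @End
    UNION ALL
    SELECT @End + 1
),
Paired AS (
    SELECT invoice_no,
           LEAD(invoice_no) OVER (ORDER BY invoice_no) AS next_invoice_no
    FROM Bounded
)
SELECT invoice_no + 1 AS first_missing_invoice_no,
       next_invoice_no - 1 AS last_missing_invoice_no,
       next_invoice_no - invoice_no - 1 AS num_missing
FROM Paired
WHERE next_invoice_no > invoice_no + 1;
```

If the table holds no invoices in the range at all, the two sentinels pair up and the query returns a single gap covering the whole range.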


 