Identify missing rows in SQL Server 2008 R2

I have a table with four columns: MeterName, TenantLease, BuildingID, and Date (last day of month). Each row is unique when looking at all four columns together.

Issue:
If any month between 11-30-2016 and 10-31-2017 has no entry, my query result should list the missing dates for each combination of MeterName, TenantLease, and BuildingID.

Example 1: For Meter1, Tenant1, bld1, I expect to return 9/30/2017 and 10/31/2017 (along with all columns: MeterName, TenantLease, and BuildingID) because those are the missing entries.

Example 2: For Meter2, Tenant1, bld1, I expect to return 11/30/2016, 12/31/2016, 4/30/2017, 8/31/2017, 9/30/2017 and 10/31/2017 (along with all columns: MeterName, TenantLease, and BuildingID) because those are the missing entries.

I tried FULL JOIN, CROSS APPLY, OUTER APPLY, and so on. However, none of them returned the data I need.

There are a few steps for this, sorry for the long reply.

First, here is the table variable I made to test this out.

DECLARE @tbl TABLE (
MeterName NVARCHAR(MAX),
TenantLease NVARCHAR(MAX),
BuildingID  NVARCHAR(MAX),
date DATE
)

INSERT @tbl VALUES
('Meter1','Tenant1','bld1','2016-11-30'),('Meter1','Tenant1','bld1','2016-12-31'),
('Meter1','Tenant1','bld1','2017-01-31'),('Meter1','Tenant1','bld1','2017-02-28'),
('Meter1','Tenant1','bld1','2017-03-31'),('Meter1','Tenant1','bld1','2017-04-30'),
('Meter1','Tenant1','bld1','2017-05-31'),('Meter1','Tenant1','bld1','2017-06-30'),
('Meter1','Tenant1','bld1','2017-07-31'),('Meter1','Tenant1','bld1','2017-08-31'),
('Meter2','Tenant1','bld1','2017-01-31'),('Meter2','Tenant1','bld1','2017-02-28'),
('Meter2','Tenant1','bld1','2017-03-31'),('Meter2','Tenant1','bld1','2017-05-31'),
('Meter2','Tenant1','bld1','2017-06-30'),('Meter2','Tenant1','bld1','2017-07-31'),
('Meter1','Tenant2','bld2','2017-01-31'),('Meter1','Tenant2','bld2','2017-02-28'),
('Meter1','Tenant2','bld2','2017-03-31'),('Meter1','Tenant2','bld2','2017-05-31'),
('Meter1','Tenant2','bld2','2017-06-30'),('Meter1','Tenant2','bld2','2017-07-31')

Next, we need to create a dates table to join against to provide the full table you are looking for.

DECLARE @startdate DATE = '2016-11-01'
DECLARE @enddate DATE = '2017-10-31'

DECLARE @dates TABLE (
dates DATE
)

WHILE @startdate <= @enddate
BEGIN

INSERT @dates
VALUES (DATEADD(DAY,-1,DATEADD(MONTH,1,@startdate)))

SET @startdate = DATEADD(MONTH,1,@startdate)

END
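
If you would rather avoid the loop, the same month-end dates can be produced in one statement with a recursive CTE, which is available in SQL Server 2008 R2. This is only a sketch: the variable names @firstOfMonth and @lastDate and the CTE name months are illustrative and not part of the answer above.

```sql
-- Sketch only: builds the same 12 month-end dates as the WHILE loop.
-- @firstOfMonth, @lastDate and "months" are illustrative names.
DECLARE @firstOfMonth DATE = '2016-11-01';
DECLARE @lastDate     DATE = '2017-10-31';

WITH months AS (
    -- anchor: first day of the first month in the range
    SELECT @firstOfMonth AS month_start
    UNION ALL
    -- step: first day of the next month, until the range is exhausted
    SELECT DATEADD(MONTH, 1, month_start)
    FROM months
    WHERE DATEADD(MONTH, 1, month_start) <= @lastDate
)
-- last day of a month = first day of the next month minus one day
SELECT DATEADD(DAY, -1, DATEADD(MONTH, 1, month_start)) AS month_end
FROM months
OPTION (MAXRECURSION 100);
```

To fill the @dates table variable this way, the final SELECT can be turned into INSERT @dates SELECT ... in place of the loop.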

Finally, we create the completed result set you are looking for by getting a Cartesian result set of your MeterName, TenantLease, BuildingID, and all dates you are looking for. We join your table against this looking for the records where your table is missing and the Cartesian result exists. These are your missing records.

SELECT
    tbl2.MeterName,
    tbl2.TenantLease,
    tbl2.BuildingID,
    tbl2.dates AS missing_date
FROM @tbl tbl
FULL OUTER JOIN (
    SELECT DISTINCT
        tbl.MeterName,
        tbl.TenantLease,
        tbl.BuildingID,
        dates.dates
    FROM @tbl tbl
    FULL OUTER JOIN @dates dates ON
        1 = 1   --This is always true, resulting in a Cartesian result.
    )tbl2 ON 
    tbl.MeterName = tbl2.MeterName
    AND tbl.TenantLease = tbl2.TenantLease
    AND tbl.BuildingID = tbl2.BuildingID
    AND tbl.date = tbl2.dates
WHERE tbl.MeterName IS NULL
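
The same "expected rows minus actual rows" idea can also be written with CROSS JOIN and NOT EXISTS, which some find easier to read than the nested FULL OUTER JOIN. The sketch below is self-contained and uses a deliberately tiny stand-in data set; the names @readings and @monthEnds are made up for the illustration.

```sql
-- Sketch only: a tiny stand-in for the real table.
-- @readings and @monthEnds are illustrative names.
DECLARE @readings TABLE (
    MeterName   NVARCHAR(50),
    TenantLease NVARCHAR(50),
    BuildingID  NVARCHAR(50),
    [date]      DATE
);
INSERT @readings VALUES
('Meter1','Tenant1','bld1','2017-08-31'),
('Meter1','Tenant1','bld1','2017-10-31');

DECLARE @monthEnds TABLE (dates DATE);
INSERT @monthEnds VALUES ('2017-08-31'), ('2017-09-30'), ('2017-10-31');

-- Pair every distinct (meter, tenant, building) with every expected date,
-- then keep only the pairs that have no matching row in the table.
SELECT k.MeterName, k.TenantLease, k.BuildingID, d.dates AS missing_date
FROM (SELECT DISTINCT MeterName, TenantLease, BuildingID FROM @readings) AS k
CROSS JOIN @monthEnds AS d
WHERE NOT EXISTS (
    SELECT 1
    FROM @readings AS r
    WHERE r.MeterName   = k.MeterName
      AND r.TenantLease = k.TenantLease
      AND r.BuildingID  = k.BuildingID
      AND r.[date]      = d.dates
)
ORDER BY k.MeterName, k.TenantLease, k.BuildingID, d.dates;
```

With this sample data the query returns a single row, Meter1 / Tenant1 / bld1 / 2017-09-30. It uses only features that exist in SQL Server 2008 R2.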

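This is one of those cases where a tally table or function comes in handy. Notice that this solution makes only a single pass over the table, and with the proper index the sort operation caused by the LEAD function can be eliminated. Note that LEAD, DATEFROMPARTS, and EOMONTH require SQL Server 2012 or later, and that the query only looks forward from existing rows, so months missing before a group's first entry (such as 2016-11-30 and 2016-12-31 for Meter2) are not reported.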

IF OBJECT_ID('tempdb..#MeterData', 'U') IS NOT NULL 
DROP TABLE #MeterData;

CREATE TABLE #MeterData (
    MeterName CHAR(6) NOT NULL,
    TenantLease CHAR(7) NOT NULL,
    BuildingID CHAR(4) NOT NULL,
    mDate DATE NOT NULL 
    );
INSERT #MeterData(MeterName, TenantLease, BuildingID, mDate) VALUES
    ('Meter1','Tenant1','bld1','2016-11-30'),
    ('Meter1','Tenant1','bld1','2016-12-31'),
    ('Meter1','Tenant1','bld1','2017-01-31'),
    ('Meter1','Tenant1','bld1','2017-02-28'),
    ('Meter1','Tenant1','bld1','2017-03-31'),
    ('Meter1','Tenant1','bld1','2017-04-30'),
    ('Meter1','Tenant1','bld1','2017-05-31'),
    ('Meter1','Tenant1','bld1','2017-06-30'),
    ('Meter1','Tenant1','bld1','2017-07-31'),
    ('Meter1','Tenant1','bld1','2017-08-31'),
    ('Meter2','Tenant1','bld1','2017-01-31'),
    ('Meter2','Tenant1','bld1','2017-02-28'),
    ('Meter2','Tenant1','bld1','2017-03-31'),
    ('Meter2','Tenant1','bld1','2017-05-31'),
    ('Meter2','Tenant1','bld1','2017-06-30'),
    ('Meter2','Tenant1','bld1','2017-07-31'),
    ('Meter1','Tenant2','bld2','2017-01-31'),
    ('Meter1','Tenant2','bld2','2017-02-28'),
    ('Meter1','Tenant2','bld2','2017-03-31'),
    ('Meter1','Tenant2','bld2','2017-05-31'),
    ('Meter1','Tenant2','bld2','2017-06-30'),
    ('Meter1','Tenant2','bld2','2017-07-31');

-- add a non-clustered "POC" index to eliminate the sort operation caused by the LEAD function
CREATE NONCLUSTERED INDEX ix_MeterName_TenantLease_BuildingID_mDate ON #MeterData (MeterName, TenantLease, BuildingID, mDate);

--  SELECT * FROM #MeterData md ORDER BY md.MeterName, md.TenantLease, md.BuildingID, md.mDate;

--=============================================================================================

-- The actual solution 
WITH 
    cte_MonthGap AS (
        SELECT 
            md.MeterName, md.TenantLease, md.BuildingID, md.mDate,
            FirstOfMonth = DATEFROMPARTS(YEAR(md.mDate), MONTH(md.mDate), 1),
            MonthGap = DATEDIFF(MONTH, md.mDate, LEAD(md.mDate, 1, '2017-11-01') OVER (PARTITION BY md.MeterName, md.TenantLease, md.BuildingID ORDER BY md.mDate)) - 1
        FROM
            #MeterData md
        )
SELECT 
    mg.MeterName, mg.TenantLease, mg.BuildingID,
    MissingMonth = EOMONTH(DATEADD(MONTH, t.n, mg.FirstOfMonth))
FROM
    cte_MonthGap mg
    CROSS APPLY dbo.tfn_Tally(mg.MonthGap, 1) t;

Results...

MeterName TenantLease BuildingID MissingMonth
--------- ----------- ---------- ------------
Meter1    Tenant1     bld1       2017-09-30
Meter1    Tenant1     bld1       2017-10-31
Meter1    Tenant2     bld2       2017-04-30
Meter1    Tenant2     bld2       2017-08-31
Meter1    Tenant2     bld2       2017-09-30
Meter1    Tenant2     bld2       2017-10-31
Meter2    Tenant1     bld1       2017-04-30
Meter2    Tenant1     bld1       2017-08-31
Meter2    Tenant1     bld1       2017-09-30
Meter2    Tenant1     bld1       2017-10-31

Code for dbo.tfn_Tally...

CREATE FUNCTION dbo.tfn_Tally
/* ==================================================================================================
Capable of creating a sequence of rows ranging from -10,000,000,000,000,000 to 10,000,000,000,000,000
================================================================================================== */
(
    @NumOfRows BIGINT,
    @StartWith BIGINT 
)
RETURNS TABLE WITH SCHEMABINDING AS 
RETURN
    WITH 
        cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),   -- 10 rows
        cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),                             -- 100 rows
        cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),                             -- 10,000 rows
        cte_n4 (n) AS (SELECT 1 FROM cte_n3 a CROSS JOIN cte_n3 b),                             -- 100,000,000 rows
        cte_Tally (n) AS (
            SELECT TOP (@NumOfRows)
                (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1) + @StartWith
            FROM 
                cte_n4 a CROSS JOIN cte_n4 b                                                    -- 10,000,000,000,000,000 rows
            )
    SELECT 
        t.n
    FROM 
        cte_Tally t;
GO
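
As a quick illustration of the two parameters (a sketch that assumes dbo.tfn_Tally has already been created exactly as shown above): the first argument is the number of rows to return and the second is the value of the first row.

```sql
-- Assumes dbo.tfn_Tally exists as defined above.
-- Three rows starting at 1: n = 1, 2, 3
SELECT t.n FROM dbo.tfn_Tally(3, 1) AS t ORDER BY t.n;

-- Zero rows requested: returns an empty result set
SELECT t.n FROM dbo.tfn_Tally(0, 1) AS t;
```

This is why the main query can pass MonthGap directly: a gap of 0 produces no rows, and a gap of 3 produces the offsets 1, 2, and 3 that are added to FirstOfMonth.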

The idea is to compare the full tenant list for all dates against the actual list and display the difference using the EXCEPT operator.

Note: To get a simple tally table, I used the STRING_SPLIT function, which requires SQL Server 2016 or later; on earlier versions you can use a UDF tally function instead. EOMONTH likewise requires SQL Server 2012 or later.

DECLARE @STARTDATE DATE='2016-11-30'
DECLARE @ENDDATE DATE='2017-10-31'
DECLARE @NOOFMONTHS INT=DATEDIFF(MONTH, @STARTDATE, @ENDDATE)

DECLARE @Tenants_Original TABLE (METERNAME VARCHAR(20), TENANTLEASE VARCHAR(20), BUILDINGID  VARCHAR(20), [DATE] DATE)

INSERT @Tenants_Original VALUES
('METER1','TENANT1','BLD1','2016-11-30'),
('METER1','TENANT1','BLD1','2016-12-31'),
('METER1','TENANT1','BLD1','2017-01-31'),
('METER1','TENANT1','BLD1','2017-02-28'),
('METER1','TENANT1','BLD1','2017-03-31'),
('METER1','TENANT1','BLD1','2017-04-30'),
('METER1','TENANT1','BLD1','2017-05-31'),
('METER1','TENANT1','BLD1','2017-06-30'),
('METER1','TENANT1','BLD1','2017-07-31'),
('METER1','TENANT1','BLD1','2017-08-31'),
('METER2','TENANT1','BLD1','2017-01-31'),
('METER2','TENANT1','BLD1','2017-02-28'),
('METER2','TENANT1','BLD1','2017-03-31'),
('METER2','TENANT1','BLD1','2017-05-31'),
('METER2','TENANT1','BLD1','2017-06-30'),
('METER2','TENANT1','BLD1','2017-07-31'),
('METER1','TENANT2','BLD2','2017-01-31'),
('METER1','TENANT2','BLD2','2017-02-28'),
('METER1','TENANT2','BLD2','2017-03-31'),
('METER1','TENANT2','BLD2','2017-05-31'),
('METER1','TENANT2','BLD2','2017-06-30'),
('METER1','TENANT2','BLD2','2017-07-31');


WITH MYDATES AS(
    SELECT EOMONTH( DATEADD(MONTH,-1+ ROW_NUMBER() 
        OVER(ORDER BY (SELECT NULL)), @STARTDATE)) AS [DATE]
    FROM STRING_SPLIT( REPLICATE('1,', @NOOFMONTHS), ',')
), Tenants_Full AS(
    SELECT DISTINCT A.METERNAME, A.TENANTLEASE, A.BUILDINGID, B.[DATE]
    FROM @Tenants_Original AS A
    CROSS APPLY MYDATES AS B
)
SELECT * FROM Tenants_Full
EXCEPT 
SELECT * FROM @Tenants_Original
ORDER BY 1,2,3,4
