
SQL Query Filter to locate DUPLICATES in a column based on Values in 2 other columns

I am using SQL Server 2014 and I am working with a table named ReservationStay. It contains the records of all guests with their names, arrival dates and departure dates. An operation has been carried out that split the records of hundreds of guests into 2 separate entries, which means that these entries now have the same guest name but different arrival and departure dates.

An example of an original entry:

 Name         ArrivalDate        DepartureDate
Simon G       2015-06-01          2015-06-08

Here is what happened after that split operation was carried out, say, on 2015-06-03:

 Name         ArrivalDate        DepartureDate
Simon G       2015-06-01          2015-06-03
Simon G       2015-06-03          2015-06-08

This split operation was carried out on several days.

I need a filter in my query that will take into account the following:

WHERE Name is a duplicate and the DepartureDate of the first entry = the ArrivalDate of the second entry.

Basically, I want to reconstruct the original entry.

How do I write this filter?

It can be done with a simple INNER JOIN (a "self-join"):

SELECT a.Name, a.ArrivalDate, b.DepartureDate
FROM dtab a 
INNER JOIN dtab b ON b.Name=a.Name
AND b.ArrivalDate=a.DepartureDate

See here: http://sqlfiddle.com/#!9/51ea3/2

I added a few more rows to the table to have an example that does not fulfill the condition:

CREATE TABLE dtab (Name varchar(11),ArrivalDate varchar(10),DepartureDate varchar(10));
INSERT INTO dtab (Name,ArrivalDate,DepartureDate)
VALUES
('Simon G', '2015-06-01', '2015-06-03'),
('Simon G', '2015-06-03', '2015-06-08'),
('Peter M', '2015-03-07', '2015-03-15'),
('Peter M', '2015-05-05', '2015-05-10');

and get the desired result:

|    Name | ArrivalDate | DepartureDate |
|---------|-------------|---------------|
| Simon G |  2015-06-01 |    2015-06-08 |

Edit

I just noticed that, in order to reconstruct the unsplit table, you will also need to list the entries that were never split. To get those, you could do the following:

WITH combined AS (
 SELECT a.Name cnam, a.ArrivalDate carr, b.DepartureDate cdep
 FROM dtab a 
 INNER JOIN dtab b ON b.Name=a.Name
 AND b.ArrivalDate=a.DepartureDate
)
SELECT d.* FROM dtab d
LEFT JOIN combined ON cnam=Name AND (carr=ArrivalDate OR cdep=DepartureDate)
WHERE cdep IS NULL
UNION ALL 
SELECT * FROM combined

I put the original SELECT statement into a common table expression (combined) and used it to check, for each row of the original table, whether its arrival or departure date coincides with that of a combined entry. If it does, the original row is not listed; otherwise it is listed, via UNION ALL, together with the entries of combined.

Now we get:

|    Name | ArrivalDate | DepartureDate |
|---------|-------------|---------------|
| Peter M |  2015-03-07 |    2015-03-15 |
| Peter M |  2015-05-05 |    2015-05-10 |
| Simon G |  2015-06-01 |    2015-06-08 |

See here: http://sqlfiddle.com/#!6/7d325/5

This solution works as of SQL Server 2005, when common table expressions became available (LAG / LEAD were only introduced in SQL Server 2012).
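The anti-join in the query above can also be phrased with NOT EXISTS. The following is only a sketch for trying the logic out locally: it assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for SQL Server, reuses the dtab sample rows from this answer, and adds an ORDER BY that the original query does not have so that the output order is deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the SQL used here is plain
# ANSI syntax (CTE, self-join, NOT EXISTS, UNION ALL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dtab (Name TEXT, ArrivalDate TEXT, DepartureDate TEXT)")
conn.executemany(
    "INSERT INTO dtab VALUES (?, ?, ?)",
    [
        ("Simon G", "2015-06-01", "2015-06-03"),
        ("Simon G", "2015-06-03", "2015-06-08"),
        ("Peter M", "2015-03-07", "2015-03-15"),
        ("Peter M", "2015-05-05", "2015-05-10"),
    ],
)

query = """
WITH combined AS (
    -- Re-join the two halves of every split stay
    SELECT a.Name AS cnam, a.ArrivalDate AS carr, b.DepartureDate AS cdep
    FROM dtab a
    INNER JOIN dtab b
            ON b.Name = a.Name
           AND b.ArrivalDate = a.DepartureDate
)
-- Rows that are not one half of a split stay
SELECT d.Name, d.ArrivalDate, d.DepartureDate
FROM dtab d
WHERE NOT EXISTS (
    SELECT 1
    FROM combined c
    WHERE c.cnam = d.Name
      AND (c.carr = d.ArrivalDate OR c.cdep = d.DepartureDate)
)
UNION ALL
-- plus the re-joined stays
SELECT cnam, carr, cdep FROM combined
ORDER BY 1, 2
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('Peter M', '2015-03-07', '2015-03-15')
# ('Peter M', '2015-05-05', '2015-05-10')
# ('Simon G', '2015-06-01', '2015-06-08')
```

Like the LEFT JOIN version, this only re-joins stays that were split into exactly two pieces.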

-- Sample data: one stay that was split in two
DECLARE @t TABLE (name varchar(10), Arrivaldate varchar(20), Departure varchar(20));

INSERT INTO @t (name, Arrivaldate, Departure)
VALUES
('Simon G', '2015-06-01', '2015-06-03'),
('Simon G', '2015-06-03', '2015-06-08');

-- Collapse all rows of a guest into one: earliest arrival, latest departure
SELECT A.name, A.Arrivaldate, A.Departure
FROM (
    SELECT name, MIN(Arrivaldate) AS Arrivaldate, MAX(Departure) AS Departure
    FROM @t
    GROUP BY name
) A;
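A quick check of the MIN / MAX grouping against the four-row sample from the first answer may be useful before relying on it. This is a sketch only: it assumes SQLite (through Python's sqlite3 module) in place of SQL Server, and a regular table named t instead of the table variable @t.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; table name "t" is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, Arrivaldate TEXT, Departure TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("Simon G", "2015-06-01", "2015-06-03"),  # split stay, first half
        ("Simon G", "2015-06-03", "2015-06-08"),  # split stay, second half
        ("Peter M", "2015-03-07", "2015-03-15"),  # separate stay no. 1
        ("Peter M", "2015-05-05", "2015-05-10"),  # separate stay no. 2
    ],
)

per_guest = conn.execute(
    """
    SELECT name, MIN(Arrivaldate), MAX(Departure)
    FROM t
    GROUP BY name
    ORDER BY name
    """
).fetchall()
print(per_guest)
# [('Peter M', '2015-03-07', '2015-05-10'), ('Simon G', '2015-06-01', '2015-06-08')]
```

Simon G's split stay is restored, but Peter M's two independent stays are merged into a single one from 2015-03-07 to 2015-05-10, so grouping by name alone is only safe when every guest has exactly one stay in the table.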

You can use the LEAD and LAG window functions to locate records that have been split:

SELECT Name, MIN(ArrivalDate) AS ArrivalDate, MAX(DepartureDate) AS DepartureDate
FROM (
SELECT Name, ArrivalDate, DepartureDate, 
       CASE 
          WHEN ArrivalDate = LAG(DepartureDate) OVER (PARTITION BY Name 
                                                      ORDER BY ArrivalDate) 
               OR
               DepartureDate = LEAD(ArrivalDate) OVER (PARTITION BY Name 
                                                       ORDER BY ArrivalDate)
          THEN 1
          ELSE 0
       END AS HasBeenSplit                                                       
FROM mytable ) t
GROUP BY Name, HasBeenSplit

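This query gives you back the original version of your table, provided each guest has at most one unsplit stay and at most one split stay: rows are grouped by Name and HasBeenSplit, so several unsplit stays of the same guest would be merged into one row.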

Demo here
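Since the split was carried out on several days, a stay may have been cut into more than two pieces, and a guest may also have several unrelated stays. One way to cover both cases is a "gaps and islands" variant of the LAG idea: flag every row that does not continue the previous row, turn the flags into a running stay number, and group by that number. The sketch below is not taken from the answers above; it assumes SQLite 3.25 or later (through Python's sqlite3 module) in place of SQL Server, and the names flagged, numbered, StartsNewStay and StayNo are made up for illustration.

```python
import sqlite3

# Assumption: SQLite >= 3.25 (window functions) stands in for SQL Server 2012+.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Name TEXT, ArrivalDate TEXT, DepartureDate TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        # One stay that was split twice, i.e. into three pieces
        ("Simon G", "2015-06-01", "2015-06-03"),
        ("Simon G", "2015-06-03", "2015-06-05"),
        ("Simon G", "2015-06-05", "2015-06-08"),
        # Two independent stays that must stay separate
        ("Peter M", "2015-03-07", "2015-03-15"),
        ("Peter M", "2015-05-05", "2015-05-10"),
    ],
)

query = """
WITH flagged AS (
    -- 0 if the row continues the previous row of the same guest, else 1
    SELECT Name, ArrivalDate, DepartureDate,
           CASE
               WHEN ArrivalDate = LAG(DepartureDate) OVER (
                        PARTITION BY Name ORDER BY ArrivalDate)
               THEN 0
               ELSE 1
           END AS StartsNewStay
    FROM mytable
),
numbered AS (
    -- Running total of the flags: one number per original stay
    SELECT Name, ArrivalDate, DepartureDate,
           SUM(StartsNewStay) OVER (
               PARTITION BY Name ORDER BY ArrivalDate
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS StayNo
    FROM flagged
)
SELECT Name,
       MIN(ArrivalDate) AS ArrivalDate,
       MAX(DepartureDate) AS DepartureDate
FROM numbered
GROUP BY Name, StayNo
ORDER BY Name, MIN(ArrivalDate)
"""

stays = conn.execute(query).fetchall()
for stay in stays:
    print(stay)
# ('Peter M', '2015-03-07', '2015-03-15')
# ('Peter M', '2015-05-05', '2015-05-10')
# ('Simon G', '2015-06-01', '2015-06-08')
```

The sketch relies on the dates being stored in ISO format (or in a real date type), so that ordering and comparing them behaves chronologically.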
