Find the start and end date (set based) in T-SQL

I have the following data:

Name   Date
A      2011-01-01 01:00:00.000
A      2011-02-01 02:00:00.000
A      2011-03-01 03:00:00.000
B      2011-04-01 04:00:00.000
A      2011-05-01 07:00:00.000

The desired output is:

Name   StartDate                 EndDate
--------------------------------------------------------
A      2011-01-01 01:00:00.000   2011-04-01 04:00:00.000
B      2011-04-01 04:00:00.000   2011-05-01 07:00:00.000
A      2011-05-01 07:00:00.000   NULL

How can I achieve this in T-SQL using a set-based approach?

The DDL is as follows:

DECLARE @t TABLE(PersonName VARCHAR(32), [Date] DATETIME) 
INSERT INTO @t VALUES('A', '2011-01-01 01:00:00') 
INSERT INTO @t VALUES('A', '2011-01-02 02:00:00') 
INSERT INTO @t VALUES('A', '2011-01-03 03:00:00') 
INSERT INTO @t VALUES('B', '2011-01-04 04:00:00') 
INSERT INTO @t VALUES('A', '2011-01-05 07:00:00')

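Select * from @t

;WITH cte1
     AS (SELECT *,
                -- G stays constant within each run of consecutive rows for the same person
                ROW_NUMBER() OVER (ORDER BY Date) -
                ROW_NUMBER() OVER (PARTITION BY PersonName
                ORDER BY Date) AS G
         FROM   @t),
     -- one row per run, numbered in order of its start date
     cte2
     AS (SELECT PersonName,
                MIN([Date]) StartDate,
                ROW_NUMBER() OVER (ORDER BY  MIN([Date])) AS rn
         FROM   cte1
         GROUP  BY PersonName,
                   G)
-- a run ends where the next run starts; the last run gets NULL
SELECT a.PersonName,
       a.StartDate,
       b.StartDate AS EndDate
FROM   cte2 a
       LEFT JOIN cte2 b
         ON a.rn + 1 = b.rn
ORDER  BY a.rn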
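To check the row-number-difference idea without a SQL Server instance, here is a minimal sketch that runs the same query shape against an in-memory SQLite database through Python's standard sqlite3 module. Assumptions: SQLite 3.25 or later (for window functions), a plain table t standing in for @t, dates stored as ISO text, and the run numbering moved into its own CTE, which should be equivalent to numbering by MIN([Date]) directly. Within a run of consecutive rows for one person both row numbers advance together, so G stays constant for that run.

```python
import sqlite3

# Sample rows from the question's DDL; the table variable @t becomes a plain table t.
ROWS = [
    ("A", "2011-01-01 01:00:00"),
    ("A", "2011-01-02 02:00:00"),
    ("A", "2011-01-03 03:00:00"),
    ("B", "2011-01-04 04:00:00"),
    ("A", "2011-01-05 07:00:00"),
]

QUERY = """
WITH cte1 AS (
    -- G stays constant within each run of consecutive rows for the same person
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY [Date])
           - ROW_NUMBER() OVER (PARTITION BY PersonName ORDER BY [Date]) AS G
    FROM t
),
runs AS (
    -- one row per run
    SELECT PersonName, MIN([Date]) AS StartDate
    FROM cte1
    GROUP BY PersonName, G
),
cte2 AS (
    -- number the runs in start-date order
    SELECT PersonName, StartDate,
           ROW_NUMBER() OVER (ORDER BY StartDate) AS rn
    FROM runs
)
-- a run ends where the next run starts; the last run gets NULL
SELECT a.PersonName, a.StartDate, b.StartDate AS EndDate
FROM cte2 a
LEFT JOIN cte2 b ON a.rn + 1 = b.rn
ORDER BY a.rn
"""


def start_end_runs(rows):
    """Load (name, date) rows into an in-memory table t and run QUERY on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (PersonName TEXT, [Date] TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in start_end_runs(ROWS):
    print(row)
```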

Because CTE results are generally not materialised (SQL Server may evaluate cte2 once for each reference to it), you may well get better performance if you materialise the intermediate result yourself, as below.

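DECLARE @t2 TABLE (
  rn         INT IDENTITY(1, 1) PRIMARY KEY,
  PersonName VARCHAR(32),
  StartDate  DATETIME );

-- the ORDER BY controls the order in which the IDENTITY values (rn) are assigned
INSERT INTO @t2 (PersonName, StartDate)
SELECT PersonName,
       MIN([Date]) StartDate
FROM   (SELECT *,
               -- G stays constant within each run of consecutive rows for the same person
               ROW_NUMBER() OVER (ORDER BY Date) -
               ROW_NUMBER() OVER (PARTITION BY PersonName
               ORDER BY Date) AS G
        FROM   @t) t
GROUP  BY PersonName,
          G
ORDER  BY StartDate

-- a run ends where the next run starts; the last run gets NULL
SELECT a.PersonName,
       a.StartDate,
       b.StartDate AS EndDate
FROM   @t2 a
       LEFT JOIN @t2 b
         ON a.rn + 1 = b.rn
ORDER  BY a.rn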

The other answer, using a CTE, is a good one. Another option would be to iterate over the rows. It's not set based, but it is another way to do it.

You will need to iterate either to A. assign each record a unique ID that corresponds to its transaction (the run of consecutive rows it belongs to), or B. actually produce your output.

T-SQL is not ideal for iterating over records, especially when there are many of them, so I would recommend doing it some other way, for example with a small .NET program or something else that is better at iterating.
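A rough sketch of option B, written in Python rather than T-SQL to stand in for the "small program" suggested above (the function name runs_by_iteration and the (name, date) tuple shape are illustrative assumptions): after sorting by date, a new output row starts whenever the name changes, and the previous output row's end date is set to that change point.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str]                  # (name, date) as read from the table
Run = Tuple[str, str, Optional[str]]   # (name, start_date, end_date)


def runs_by_iteration(rows: List[Row]) -> List[Run]:
    """Collapse consecutive rows for the same name into (name, start, end) runs.

    Each run ends where the next run starts; the final run has no end (None).
    """
    ordered = sorted(rows, key=lambda r: r[1])  # ISO date strings sort chronologically
    result = []
    for name, date in ordered:
        if result and result[-1][0] == name:
            continue                             # same person: the current run continues
        if result:
            result[-1][2] = date                 # previous run ends where this one starts
        result.append([name, date, None])
    return [tuple(r) for r in result]


rows = [
    ("A", "2011-01-01 01:00:00"),
    ("A", "2011-01-02 02:00:00"),
    ("A", "2011-01-03 03:00:00"),
    ("B", "2011-01-04 04:00:00"),
    ("A", "2011-01-05 07:00:00"),
]
for run in runs_by_iteration(rows):
    print(run)
```

This is a single O(n log n) pass (dominated by the sort), which is the kind of work a client-side program handles easily.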

Assign a row number so you know where the previous record is. Then join each record to the record immediately after it. When the state changes, we have a candidate row.

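select
  state,
  min(start_timestamp) as start_timestamp,
  max(end_timestamp) as end_timestamp

from
(
    -- pair each row with the row immediately after it, keeping the match
    -- only when the state changes (otherwise end_timestamp is null)
    select
        first.state,
        first.timestamp_ as start_timestamp,
        second.timestamp_ as end_timestamp

        from
        (
            select
                *, row_number() over (order by timestamp_) as id
            from test
        ) as first

        left outer join
        (
            select
                *, row_number() over (order by timestamp_) as id
            from test
        ) as second
        on
            first.id = second.id - 1
            and first.state != second.state
) as agg
-- caution: this yields one row per state, so separate runs of the same state
-- (other than the final row, added below) are merged into a single row
group by state
    having max(end_timestamp) is not null

union

-- the last row won't have an ending row
-- PostgreSQL: (select state, timestamp_, null from test order by timestamp_ desc limit 1)
-- SQL Server: TOP needs a row count; wrapping the ORDER BY in a derived table
-- keeps it valid inside the UNION
select state, timestamp_, null
from (select top (1) state, timestamp_ from test order by timestamp_ desc) as last_row

order by 2
;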

Tested with PostgreSQL, but it should work with SQL Server as well.
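The next-record comparison can also be sketched procedurally in Python (runs_from_changes is a hypothetical helper, not part of the answer). One difference from the query above, which groups by state: here every change point closes the current run, so a state that occurs in several separate runs keeps them apart.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple

Run = Tuple[str, str, Optional[str]]


def runs_from_changes(rows: List[Tuple[str, str]]) -> List[Run]:
    """Build (state, start, end) runs by comparing each row with the next one."""
    ordered = sorted(rows, key=lambda r: r[1])  # like row_number() over (order by timestamp_)
    runs: List[Run] = []
    start: Optional[str] = None
    # Pair row i with row i + 1; the last row is paired with None, much like
    # the outer join finding no next row.
    for cur, nxt in zip_longest(ordered, ordered[1:]):
        if start is None:
            start = cur[1]                        # first row of a new run
        if nxt is None:
            runs.append((cur[0], start, None))    # final run has no ending row
        elif nxt[0] != cur[0]:
            runs.append((cur[0], start, nxt[1]))  # state changes: run ends at the next row
            start = None
    return runs


rows = [
    ("A", "2011-01-01 01:00:00"),
    ("A", "2011-01-02 02:00:00"),
    ("A", "2011-01-03 03:00:00"),
    ("B", "2011-01-04 04:00:00"),
    ("A", "2011-01-05 07:00:00"),
]
for run in runs_from_changes(rows):
    print(run)
```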

SELECT
  PersonName,
  StartDate = MIN(Date),
  EndDate
FROM (
  SELECT
    PersonName,
    Date,
    EndDate = (
      /* get the earliest date after current date
         associated with a different person */
      SELECT MIN(t1.Date)
      FROM @t AS t1
      WHERE t1.Date > t.Date
        AND t1.PersonName <> t.PersonName
    )
  FROM @t AS t
) s
GROUP BY PersonName, EndDate
ORDER BY 2

Basically, for every Date we find the nearest later date that is associated with a different PersonName. That gives us EndDate, which now distinguishes consecutive groups of dates for the same person.

Now we only need to group the data by PersonName and EndDate and take the minimal Date in every group as StartDate. And yes, sort the data by StartDate, of course.
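Those two steps translate almost line for line into Python, assuming the rows fit in memory (runs_by_next_other is a hypothetical name). It rescans every row for every row, so it is O(n²); it is meant to show the logic, not to be fast.

```python
from typing import Dict, List, Optional, Tuple

Run = Tuple[str, str, Optional[str]]


def runs_by_next_other(rows: List[Tuple[str, str]]) -> List[Run]:
    """Group (name, date) rows into runs the way the correlated-subquery answer does."""
    # Step 1: EndDate = earliest date after this row that belongs to a different
    # person; None when no such row exists (the subquery's MIN returns NULL).
    with_end = []
    for name, date in rows:
        later_others = [d for n, d in rows if d > date and n != name]
        with_end.append((name, date, min(later_others) if later_others else None))

    # Step 2: GROUP BY PersonName, EndDate with MIN(Date) as StartDate.
    starts: Dict[Tuple[str, Optional[str]], str] = {}
    for name, date, end in with_end:
        key = (name, end)
        if key not in starts or date < starts[key]:
            starts[key] = date

    # Step 3: ORDER BY StartDate.
    return sorted(((name, start, end) for (name, end), start in starts.items()),
                  key=lambda run: run[1])


sample = [
    ("A", "2011-01-01 01:00:00"),
    ("A", "2011-01-02 02:00:00"),
    ("A", "2011-01-03 03:00:00"),
    ("B", "2011-01-04 04:00:00"),
    ("A", "2011-01-05 07:00:00"),
]
for run in runs_by_next_other(sample):
    print(run)
```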

There's a very quick way to do this using a bit of Gaps and Islands theory:

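WITH CTE as (SELECT PersonName, [Date]
                   -- Island stays constant within each run of consecutive rows for the same person
                   , Row_Number() over (ORDER BY [Date])
                     - Row_Number() over (ORDER BY PersonName, [Date]) as Island
             FROM @t)

Select PersonName,
       Min([Date]) as StartDate,
       Max([Date]) as LastDate  -- last date inside the island, not the next island's start
from CTE
GROUP BY Island, PersonName
ORDER BY Min([Date])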
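MAX([Date]) above is the last date inside each island, whereas the desired output uses the start of the next island as EndDate. On SQL Server 2012 or later, one option is LEAD() over the island start dates. The sketch below tries that in SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later for window functions; the table t stands in for @t, and islands_with_lead is a hypothetical name).

```python
import sqlite3

QUERY = """
WITH CTE AS (
    -- Island stays constant within each run of consecutive rows for the same person
    SELECT PersonName, [Date],
           ROW_NUMBER() OVER (ORDER BY [Date])
           - ROW_NUMBER() OVER (ORDER BY PersonName, [Date]) AS Island
    FROM t
),
Islands AS (
    SELECT PersonName, MIN([Date]) AS StartDate
    FROM CTE
    GROUP BY Island, PersonName
)
-- each island ends where the next island (by start date) begins
SELECT PersonName,
       StartDate,
       LEAD(StartDate) OVER (ORDER BY StartDate) AS EndDate
FROM Islands
ORDER BY StartDate
"""


def islands_with_lead(rows):
    """Run QUERY over (name, date) rows loaded into an in-memory table t."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (PersonName TEXT, [Date] TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


sample = [
    ("A", "2011-01-01 01:00:00"),
    ("A", "2011-01-02 02:00:00"),
    ("A", "2011-01-03 03:00:00"),
    ("B", "2011-01-04 04:00:00"),
    ("A", "2011-01-05 07:00:00"),
]
for row in islands_with_lead(sample):
    print(row)
```

Because the GROUP BY includes PersonName, ordering the second ROW_NUMBER() by PersonName, [Date] behaves like PARTITION BY PersonName here: it only shifts each person's Island values by a constant.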

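Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM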