
SQL timestamp difference between rows based on additional comments/columns

I have the following data in a single table (MS SQL Server 2012):

cinderellaID statusName                timestamp
------------ ------------------------- -----------------------
10459        Waiting                   2013-03-16 12:03:17.000
10459        Paired                    2013-03-16 12:29:50.000
10459        Shopping                  2013-03-16 12:29:22.233
10459        Checked Out               2013-03-16 14:01:24.000
10461        Alterations               1988-01-02 01:47:07.000
10461        Checked Out               2013-03-16 14:42:25.000
10461        Paired                    2013-03-16 12:29:31.000
10461        Shopping                  2013-03-16 12:29:01.437
10461        Waiting                   2013-03-16 11:52:18.000
10462        Waiting                   2013-03-16 12:19:35.000
10462        Shopping                  2013-03-16 12:59:01.197
10462        Paired                    2013-03-16 12:59:28.000
10462        Checked Out               2013-03-16 14:35:44.000
10463        Checked Out               2013-03-16 12:22:20.000
10463        Waiting                   2013-03-16 10:44:14.000
10463        Paired                    2013-03-16 11:00:37.000
10463        Shopping                  2013-03-16 11:00:23.063
10464        Waiting                   2013-03-16 10:44:03.000
10464        Paired                    2013-03-16 10:59:32.000
10464        Shopping                  2013-03-16 10:59:02.560
10464        Alterations               1988-01-02 00:44:02.000
10464        Checked Out               2013-03-16 13:18:21.000
10465        Checked Out               2013-03-16 11:54:34.000
10465        Waiting                   2013-03-16 09:44:13.000
10465        Paired                    2013-03-16 10:08:05.000
10465        Shopping                  2013-03-16 10:10:58.323
10466        Waiting                   2013-03-16 12:13:51.000
10466        Shopping                  2013-03-16 12:46:56.207
10466        Paired                    2013-03-16 12:46:43.000
10467        Shopping                  2013-03-16 10:08:06.553
10467        Paired                    2013-03-16 10:04:49.000
10467        Waiting                   2013-03-16 09:41:03.000
<much more data ...>

The data here is presented ordered by cinderellaID, but that's just to make this question easier to understand.

These are transactions showing when a person (identified by cinderellaID) entered each status. For example, in the first row, cinderella 10459 entered the "Waiting" status at 2013-03-16 12:03:17.000. The data always follows (or should follow) a fixed flow: Waiting transitions to Paired, Paired to Shopping, and Shopping to either Checked Out or Alterations. If it goes Shopping -> Alterations, it then goes Alterations -> Checked Out. I know not all of the data is captured, but that's okay with me.
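Since the flow only *should* hold, it may be worth checking the data before averaging anything. Below is a minimal Python sketch (not from the thread) that sorts each person's rows by timestamp and reports consecutive pairs that are not among the transitions listed above. `ALLOWED`, `ROWS` and `unexpected_transitions` are hypothetical names, and `ROWS` holds only a few rows copied from the sample data.

```python
from collections import defaultdict

# Transitions the question lists as valid.
ALLOWED = {
    ("Waiting", "Paired"),
    ("Paired", "Shopping"),
    ("Shopping", "Checked Out"),
    ("Shopping", "Alterations"),
    ("Alterations", "Checked Out"),
}

# A few rows copied from the sample data: (cinderellaID, statusName, timestamp).
ROWS = [
    (10459, "Waiting", "2013-03-16 12:03:17.000"),
    (10459, "Paired", "2013-03-16 12:29:50.000"),
    (10459, "Shopping", "2013-03-16 12:29:22.233"),
    (10459, "Checked Out", "2013-03-16 14:01:24.000"),
    (10465, "Checked Out", "2013-03-16 11:54:34.000"),
    (10465, "Waiting", "2013-03-16 09:44:13.000"),
    (10465, "Paired", "2013-03-16 10:08:05.000"),
    (10465, "Shopping", "2013-03-16 10:10:58.323"),
]


def unexpected_transitions(rows):
    """Return (cinderellaID, from_status, to_status) for each consecutive
    pair of a person's rows, in timestamp order, that is not in ALLOWED."""
    by_person = defaultdict(list)
    for cinderella_id, status, ts in rows:
        by_person[cinderella_id].append((ts, status))
    bad = []
    for cinderella_id in sorted(by_person):
        # Fixed-width 'YYYY-MM-DD HH:MM:SS.fff' strings sort chronologically as text.
        events = sorted(by_person[cinderella_id])
        for (_, current), (_, following) in zip(events, events[1:]):
            if (current, following) not in ALLOWED:
                bad.append((cinderella_id, current, following))
    return bad


print(unexpected_transitions(ROWS))
```

On these rows it flags every transition for 10459, whose Shopping timestamp (12:29:22.233) is earlier than its Paired timestamp (12:29:50), while 10465 follows the expected flow.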

What I want is a way to calculate the average time spent in each status. For example, on average, how long did people spend in "Waiting" before moving to "Paired"? How long did they spend in "Paired" before moving to "Shopping"? Ideally, my output would look something like this (the numbers are made up):

status        avgTimeSpent
------------- -----------------
Waiting       1:00:04
Paired        0:20:22
Shopping      1:30:11
...
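If the averages are computed in seconds (for example with DATEDIFF(second, ...)), they come out as a plain number. One way to get the H:MM:SS shape shown above is to format that number on the client side. A small Python sketch with a hypothetical helper `format_hms` (not from the thread); it rounds to the nearest second and lets hours go past 24:

```python
def format_hms(seconds: float) -> str:
    """Format a duration given in seconds as H:MM:SS (hours are not wrapped at 24)."""
    # round() sends exact halves to the nearest even integer.
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


print(format_hms(3604))  # 1:00:04
```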

I'm familiar with grouping and what I'd call "plain old SQL", but not with the kind of row-to-row operations I think I need to solve this. Any help?

Something like this should work:

SELECT
    t1.statusName,
    AVG(DATEDIFF(second, t1.timestamp, t2.timestamp)) As AvgTime   -- seconds; AVG of an int returns an int
FROM        YourTable As t1
INNER JOIN  YourTable As t2
    ON  t1.cinderellaID = t2.cinderellaID
    AND t1.timestamp < t2.timestamp
    -- t2 must be the very next row for the same person: no t3 lies between them
    AND NOT EXISTS(Select * From YourTable As t3
                   Where t3.cinderellaID = t1.cinderellaID
                     And t3.timestamp < t2.timestamp
                     And t3.timestamp > t1.timestamp)
-- group by status only, so each status gets one average across everyone
GROUP BY t1.statusName

This query should work in virtually any SQL dialect. A usually more efficient version uses the ROW_NUMBER() OVER(...) window function, but not every database supports it.
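To try the self-join query above without a SQL Server instance, here is a sketch that uses Python's built-in sqlite3 module as a stand-in. Assumptions: SQLite replaces SQL Server, so DATEDIFF(second, a, b) is emulated with the difference of strftime('%s', ...) values (which truncate to whole seconds); the rows are a small subset of the sample data; and the query groups by statusName alone to get one average per status. `ROWS`, `QUERY` and `avg_time_per_status` are hypothetical names.

```python
import sqlite3

# A few rows copied from the sample data: (cinderellaID, statusName, timestamp).
ROWS = [
    (10459, "Waiting", "2013-03-16 12:03:17.000"),
    (10459, "Paired", "2013-03-16 12:29:50.000"),
    (10459, "Shopping", "2013-03-16 12:29:22.233"),
    (10459, "Checked Out", "2013-03-16 14:01:24.000"),
    (10465, "Checked Out", "2013-03-16 11:54:34.000"),
    (10465, "Waiting", "2013-03-16 09:44:13.000"),
    (10465, "Paired", "2013-03-16 10:08:05.000"),
    (10465, "Shopping", "2013-03-16 10:10:58.323"),
]

# SQLite stand-in for the SQL Server query. strftime('%s', ...) gives whole
# seconds since 1970, so the difference mimics DATEDIFF(second, a, b).
QUERY = """
SELECT t1.statusName,
       AVG(CAST(strftime('%s', t2.timestamp) AS INTEGER)
           - CAST(strftime('%s', t1.timestamp) AS INTEGER)) AS AvgTime
FROM YourTable AS t1
INNER JOIN YourTable AS t2
    ON  t1.cinderellaID = t2.cinderellaID
    AND t1.timestamp < t2.timestamp
    -- keep only the next row: nothing for this person lies strictly between t1 and t2
    AND NOT EXISTS (SELECT * FROM YourTable AS t3
                    WHERE t3.cinderellaID = t1.cinderellaID
                      AND t3.timestamp < t2.timestamp
                      AND t3.timestamp > t1.timestamp)
GROUP BY t1.statusName
"""


def avg_time_per_status(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE YourTable (cinderellaID INTEGER, statusName TEXT, timestamp TEXT)")
        conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


print(avg_time_per_status(ROWS))
```

Because 10459 and 10465 pass through the statuses in different orders, the "Shopping" average here mixes a 28-second Shopping -> Paired step with a 6216-second Shopping -> Checked Out step.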

I see you have the SQL-Server-2012 tag, and SQL Server 2012 does support ROW_NUMBER(), so here it is:

;WITH cte As
(
    SELECT *,
        -- number each person's rows in time order; adding statusName to the
        -- partition would restart the count per status, so nothing would join
        ROW_NUMBER() OVER(
                        PARTITION BY cinderellaID
                        ORDER BY timestamp) As rowNum
    FROM YourTable
)
SELECT
    t1.statusName,
    AVG(DATEDIFF(second, t1.timestamp, t2.timestamp)) As AvgTime   -- seconds; AVG of an int returns an int
FROM        cte As t1
INNER JOIN  cte As t2
    ON  t1.cinderellaID = t2.cinderellaID
    AND t1.timestamp < t2.timestamp
    AND t1.rowNum = t2.rowNum-1   -- t2 is the row right after t1
GROUP BY t1.statusName
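The partition clause is what makes the ROW_NUMBER() version work. This sketch runs the query twice in SQLite (a stand-in for SQL Server; window functions need SQLite 3.25 or later), once partitioned by cinderellaID alone and once by cinderellaID and statusName. In the second form each person has one row per status, so every rowNum is 1 and the rowNum join never matches. DATEDIFF(second, a, b) is emulated with strftime('%s', ...) differences; `ROWS`, `QUERY_TEMPLATE` and `run` are hypothetical names.

```python
import sqlite3

# A few rows copied from the sample data: (cinderellaID, statusName, timestamp).
ROWS = [
    (10459, "Waiting", "2013-03-16 12:03:17.000"),
    (10459, "Paired", "2013-03-16 12:29:50.000"),
    (10459, "Shopping", "2013-03-16 12:29:22.233"),
    (10459, "Checked Out", "2013-03-16 14:01:24.000"),
    (10465, "Checked Out", "2013-03-16 11:54:34.000"),
    (10465, "Waiting", "2013-03-16 09:44:13.000"),
    (10465, "Paired", "2013-03-16 10:08:05.000"),
    (10465, "Shopping", "2013-03-16 10:10:58.323"),
]

# {partition} is filled in by run(); strftime('%s', ...) differences stand in
# for DATEDIFF(second, a, b).
QUERY_TEMPLATE = """
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY {partition} ORDER BY timestamp) AS rowNum
    FROM YourTable
)
SELECT t1.statusName,
       AVG(CAST(strftime('%s', t2.timestamp) AS INTEGER)
           - CAST(strftime('%s', t1.timestamp) AS INTEGER)) AS AvgTime
FROM cte AS t1
INNER JOIN cte AS t2
    ON  t1.cinderellaID = t2.cinderellaID
    AND t1.timestamp < t2.timestamp
    AND t1.rowNum = t2.rowNum - 1
GROUP BY t1.statusName
"""


def run(partition_by):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE YourTable (cinderellaID INTEGER, statusName TEXT, timestamp TEXT)")
        conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", ROWS)
        sql = QUERY_TEMPLATE.format(partition=partition_by)
        return dict(conn.execute(sql).fetchall())
    finally:
        conn.close()


print(run("cinderellaID"))              # averages per status
print(run("cinderellaID, statusName"))  # {}: every rowNum is 1, so the join matches nothing
```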

You can do this with lead(), a window function available starting with SQL Server 2012. The basic query to get the information you need is:

select t.*,
       lead(statusname) over (partition by cinderellaID order by timestamp) as next_statusname,
       lead(timestamp) over (partition by cinderellaID order by timestamp) as next_timestamp
from singletable t;
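A sketch of that basic lead() query, run in SQLite as a stand-in for SQL Server (window functions need SQLite 3.25 or later). `singletable` is the answer's placeholder table name; `ROWS`, `QUERY` and `next_statuses` are hypothetical names, and the rows are a small subset of the sample data. Each person's last row gets NULL (None in Python) as its next status and timestamp.

```python
import sqlite3

# A few rows copied from the sample data: (cinderellaID, statusName, timestamp).
ROWS = [
    (10459, "Waiting", "2013-03-16 12:03:17.000"),
    (10459, "Paired", "2013-03-16 12:29:50.000"),
    (10459, "Shopping", "2013-03-16 12:29:22.233"),
    (10459, "Checked Out", "2013-03-16 14:01:24.000"),
    (10465, "Checked Out", "2013-03-16 11:54:34.000"),
    (10465, "Waiting", "2013-03-16 09:44:13.000"),
    (10465, "Paired", "2013-03-16 10:08:05.000"),
    (10465, "Shopping", "2013-03-16 10:10:58.323"),
]

# For every row, look one row ahead within the same person, in time order.
QUERY = """
SELECT cinderellaID, statusName,
       LEAD(statusName) OVER (PARTITION BY cinderellaID ORDER BY timestamp) AS next_statusname,
       LEAD(timestamp)  OVER (PARTITION BY cinderellaID ORDER BY timestamp) AS next_timestamp
FROM singletable
ORDER BY cinderellaID, timestamp
"""


def next_statuses(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE singletable (cinderellaID INTEGER, statusName TEXT, timestamp TEXT)")
        conn.executemany("INSERT INTO singletable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in next_statuses(ROWS):
    print(row)
```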

Then to get the averages:

select statusname, next_statusname,
       avg(datediff(second, timestamp, next_timestamp)) as avg_seconds
from (select t.*,
             lead(statusname) over (partition by cinderellaID order by timestamp) as next_statusname,
             lead(timestamp) over (partition by cinderellaID order by timestamp) as next_timestamp
      from singletable t
     ) t
group by statusname, next_statusname;
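The averaging step can be sketched the same way in SQLite (a stand-in for SQL Server; window functions need SQLite 3.25 or later), emulating DATEDIFF(second, a, b) with strftime('%s', ...) differences. This version also filters out each person's last row, where next_statusname is NULL; the query above would otherwise report that as a group with a NULL average. `ROWS`, `QUERY` and `avg_by_transition` are hypothetical names, and the rows are a small subset of the sample data.

```python
import sqlite3

# A few rows copied from the sample data: (cinderellaID, statusName, timestamp).
ROWS = [
    (10459, "Waiting", "2013-03-16 12:03:17.000"),
    (10459, "Paired", "2013-03-16 12:29:50.000"),
    (10459, "Shopping", "2013-03-16 12:29:22.233"),
    (10459, "Checked Out", "2013-03-16 14:01:24.000"),
    (10465, "Checked Out", "2013-03-16 11:54:34.000"),
    (10465, "Waiting", "2013-03-16 09:44:13.000"),
    (10465, "Paired", "2013-03-16 10:08:05.000"),
    (10465, "Shopping", "2013-03-16 10:10:58.323"),
]

# strftime('%s', ...) differences stand in for DATEDIFF(second, a, b).
QUERY = """
SELECT statusname, next_statusname,
       AVG(CAST(strftime('%s', next_timestamp) AS INTEGER)
           - CAST(strftime('%s', timestamp) AS INTEGER)) AS avg_seconds
FROM (SELECT t.*,
             LEAD(statusname) OVER (PARTITION BY cinderellaID ORDER BY timestamp) AS next_statusname,
             LEAD(timestamp)  OVER (PARTITION BY cinderellaID ORDER BY timestamp) AS next_timestamp
      FROM singletable t) AS t
WHERE next_statusname IS NOT NULL  -- drop the last status of each person
GROUP BY statusname, next_statusname
ORDER BY statusname, next_statusname
"""


def avg_by_transition(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE singletable (cinderellaID INTEGER, statusName TEXT, timestamp TEXT)")
        conn.executemany("INSERT INTO singletable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in avg_by_transition(ROWS):
    print(row)
```

Grouping by the (status, next status) pair keeps transitions such as Shopping -> Paired and Shopping -> Checked Out apart, which matters when, as in the sample data, not every person follows the same order.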

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM