I have the following data in a single table (MS SQL Server 2012):
cinderellaID statusName                timestamp
------------ ------------------------- -----------------------
10459        Waiting                   2013-03-16 12:03:17.000
10459        Paired                    2013-03-16 12:29:50.000
10459        Shopping                  2013-03-16 12:29:22.233
10459        Checked Out               2013-03-16 14:01:24.000
10461        Alterations               1988-01-02 01:47:07.000
10461        Checked Out               2013-03-16 14:42:25.000
10461        Paired                    2013-03-16 12:29:31.000
10461        Shopping                  2013-03-16 12:29:01.437
10461        Waiting                   2013-03-16 11:52:18.000
10462        Waiting                   2013-03-16 12:19:35.000
10462        Shopping                  2013-03-16 12:59:01.197
10462        Paired                    2013-03-16 12:59:28.000
10462        Checked Out               2013-03-16 14:35:44.000
10463        Checked Out               2013-03-16 12:22:20.000
10463        Waiting                   2013-03-16 10:44:14.000
10463        Paired                    2013-03-16 11:00:37.000
10463        Shopping                  2013-03-16 11:00:23.063
10464        Waiting                   2013-03-16 10:44:03.000
10464        Paired                    2013-03-16 10:59:32.000
10464        Shopping                  2013-03-16 10:59:02.560
10464        Alterations               1988-01-02 00:44:02.000
10464        Checked Out               2013-03-16 13:18:21.000
10465        Checked Out               2013-03-16 11:54:34.000
10465        Waiting                   2013-03-16 09:44:13.000
10465        Paired                    2013-03-16 10:08:05.000
10465        Shopping                  2013-03-16 10:10:58.323
10466        Waiting                   2013-03-16 12:13:51.000
10466        Shopping                  2013-03-16 12:46:56.207
10466        Paired                    2013-03-16 12:46:43.000
10467        Shopping                  2013-03-16 10:08:06.553
10467        Paired                    2013-03-16 10:04:49.000
10467        Waiting                   2013-03-16 09:41:03.000
<much more data ...>
The data here is presented ordered by cinderellaID, but that's just to make this question easier to understand.
These are transactions showing when a person (identified by cinderellaID) entered each status. For example, in row 1, cinderella 10459 entered the "Waiting" phase at 2013-03-16 12:03:17.000. The statuses always follow (or should follow) a fixed flow: Waiting always transitions to Paired, Paired to Shopping, and Shopping to either Checked Out or Alterations. If it goes Shopping -> Alterations, it then goes Alterations -> Checked Out. I know not all of the data is captured, but that's okay with me.
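Put differently, the time spent in a status is the gap between that row's timestamp and the timestamp of the same person's next row, once the rows are sorted by time. A minimal Python sketch of that rule, using cinderella 10465's rows from the table above (the helper name time_in_each_status is hypothetical, and the final status is skipped because nothing follows it):

```python
from datetime import datetime

# Illustrative sample: cinderella 10465's rows, copied unsorted from the question.
rows = [
    ("Checked Out", "2013-03-16 11:54:34.000"),
    ("Waiting", "2013-03-16 09:44:13.000"),
    ("Paired", "2013-03-16 10:08:05.000"),
    ("Shopping", "2013-03-16 10:10:58.323"),
]


def time_in_each_status(rows):
    """Hypothetical helper: seconds spent in each status for one person.

    Each row marks when a status was entered, so after sorting by time the
    duration of a status is the next row's timestamp minus this row's.
    The last status has no successor and therefore gets no duration.
    """
    ordered = sorted(rows, key=lambda r: datetime.fromisoformat(r[1]))
    result = {}
    for (status, start), (_, end) in zip(ordered, ordered[1:]):
        delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        result[status] = delta.total_seconds()
    return result


print(time_in_each_status(rows))
# {'Waiting': 1432.0, 'Paired': 173.323, 'Shopping': 6215.677}
```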
What I want is a way to calculate the average time spent in each phase. For example, how long did everyone spend in "Waiting" before they moved to "Paired"? How long did everyone spend in "Paired" before going to "Shopping"? So my output would ideally look something like (I made the data up):
status        avgTimeSpent
------------- -----------------
Waiting       1:00:04
Paired        0:20:22
Shopping      1:30:11
...
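The SQL answers compute averages as a whole number of seconds (via DATEDIFF(second, ...)), so producing the H:MM:SS shape above is a separate formatting step. A minimal Python sketch, with format_hms as a hypothetical helper name. Inside SQL Server, CONVERT(varchar(8), DATEADD(second, <seconds>, 0), 108) is a common alternative, though it prints a two-digit hour (01:00:04) and wraps around after 24 hours.

```python
def format_hms(seconds):
    """Hypothetical helper: render a duration given in seconds as H:MM:SS."""
    total = round(seconds)  # drop any fractional seconds
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


print(format_hms(3604))  # 1:00:04
print(format_hms(1222))  # 0:20:22
print(format_hms(5411))  # 1:30:11
```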
I'm familiar with grouping and what I'd call "plain old SQL" like that, but I'm not as familiar with how to do the kind of row operations I think I need to do in order to solve this. Any help?
Something like this should work:
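-- Pair each row t1 with the same person's next row t2, then average the gap per status.
SELECT
    t1.statusName,
    AVG(DATEDIFF(second, t1.timestamp, t2.timestamp)) AS AvgTime
FROM YourTable AS t1
INNER JOIN YourTable AS t2
    ON t1.cinderellaID = t2.cinderellaID
    AND t1.timestamp < t2.timestamp
    -- t2 must be the very next status: no row t3 may fall strictly between t1 and t2
    AND NOT EXISTS (SELECT * FROM YourTable AS t3
                    WHERE t3.cinderellaID = t1.cinderellaID
                      AND t3.timestamp < t2.timestamp
                      AND t3.timestamp > t1.timestamp)
-- group by status only (not by cinderellaID) to get one average per status
GROUP BY t1.statusName;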
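This query should work in any version of SQL Server, and the correlated NOT EXISTS pattern itself carries over to other databases (only DATEDIFF is SQL Server-specific). There is a more efficient query that uses the ROW_NUMBER() OVER(...) function, but not every database supports it.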
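To try the correlated NOT EXISTS pattern without a SQL Server instance, here is a sketch using Python's built-in sqlite3 module on cinderellas 10465 and 10467 from the question. It is an SQLite adaptation, not the T-SQL itself: SQLite has no DATEDIFF, so CAST(strftime('%s', ...) AS INTEGER) differences stand in for DATEDIFF(second, ...) (both count whole-second boundaries), and the query is grouped by statusName alone to give one average per status. Variable names such as ROWS and QUERY are illustrative.

```python
import sqlite3

# Illustrative sample: cinderellas 10465 and 10467 from the question, unsorted.
ROWS = [
    (10465, "Checked Out", "2013-03-16 11:54:34.000"),
    (10465, "Waiting", "2013-03-16 09:44:13.000"),
    (10465, "Paired", "2013-03-16 10:08:05.000"),
    (10465, "Shopping", "2013-03-16 10:10:58.323"),
    (10467, "Shopping", "2013-03-16 10:08:06.553"),
    (10467, "Paired", "2013-03-16 10:04:49.000"),
    (10467, "Waiting", "2013-03-16 09:41:03.000"),
]

# SQLite version of the NOT EXISTS self-join; strftime('%s') replaces DATEDIFF.
QUERY = """
SELECT t1.statusName,
       AVG(CAST(strftime('%s', t2.timestamp) AS INTEGER)
           - CAST(strftime('%s', t1.timestamp) AS INTEGER)) AS AvgTime
FROM YourTable AS t1
INNER JOIN YourTable AS t2
    ON t1.cinderellaID = t2.cinderellaID
   AND t1.timestamp < t2.timestamp
   -- t2 must be the very next row: no t3 may fall strictly between t1 and t2
   AND NOT EXISTS (SELECT * FROM YourTable AS t3
                   WHERE t3.cinderellaID = t1.cinderellaID
                     AND t3.timestamp > t1.timestamp
                     AND t3.timestamp < t2.timestamp)
GROUP BY t1.statusName
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (cinderellaID INTEGER, statusName TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", ROWS)
averages = dict(conn.execute(QUERY).fetchall())
conn.close()

print(averages)
```

Each person's last status (Checked Out for 10465, Shopping for 10467) has no later row, so the inner join leaves it out; that is why the Shopping average here comes from 10465 alone.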
I see you have the SQL-Server-2012 tag, and SQL Server 2012 does support ROW_NUMBER(), so here it is:
;WITH cte AS
(
    SELECT *,
        -- number each person's rows 1, 2, 3, ... in time order
        -- (partitioning by statusName too would make every rowNum 1)
        ROW_NUMBER() OVER (
            PARTITION BY cinderellaID
            ORDER BY timestamp) AS rowNum
    FROM YourTable
)
SELECT
    t1.statusName,
    AVG(DATEDIFF(second, t1.timestamp, t2.timestamp)) AS AvgTime
FROM cte AS t1
INNER JOIN cte AS t2
    ON t1.cinderellaID = t2.cinderellaID
    AND t2.rowNum = t1.rowNum + 1   -- t2 is the status right after t1
GROUP BY t1.statusName;
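Why the PARTITION BY list matters: each status appears once per cinderellaID, so partitioning by cinderellaID and statusName restarts the numbering on every row (rowNum is always 1, and a join on consecutive row numbers never matches), whereas partitioning by cinderellaID alone numbers each person's rows 1, 2, 3, ... in time order. A Python sqlite3 sketch that prints both numberings side by side (window functions need SQLite 3.25 or later; ROWS and NUMBERING are illustrative names):

```python
import sqlite3

# Illustrative sample: cinderellas 10465 and 10467 from the question, unsorted.
ROWS = [
    (10465, "Checked Out", "2013-03-16 11:54:34.000"),
    (10465, "Waiting", "2013-03-16 09:44:13.000"),
    (10465, "Paired", "2013-03-16 10:08:05.000"),
    (10465, "Shopping", "2013-03-16 10:10:58.323"),
    (10467, "Shopping", "2013-03-16 10:08:06.553"),
    (10467, "Paired", "2013-03-16 10:04:49.000"),
    (10467, "Waiting", "2013-03-16 09:41:03.000"),
]

# Two numberings: one partitioned by person and status, one by person only.
NUMBERING = """
SELECT cinderellaID,
       statusName,
       ROW_NUMBER() OVER (PARTITION BY cinderellaID, statusName
                          ORDER BY timestamp) AS by_person_and_status,
       ROW_NUMBER() OVER (PARTITION BY cinderellaID
                          ORDER BY timestamp) AS by_person
FROM YourTable
ORDER BY cinderellaID, timestamp
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (cinderellaID INTEGER, statusName TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", ROWS)
numbered = conn.execute(NUMBERING).fetchall()
conn.close()

for row in numbered:
    print(row)
# (10465, 'Waiting', 1, 1)
# (10465, 'Paired', 1, 2)
# (10465, 'Shopping', 1, 3)
# (10465, 'Checked Out', 1, 4)
# (10467, 'Waiting', 1, 1)
# (10467, 'Paired', 1, 2)
# (10467, 'Shopping', 1, 3)
```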
You can do what you want using lead(), which SQL Server supports starting with the 2012 version. The basic query to get the information you need is:
select t.*,
lead(statusname) over (partition by cinderellaID order by timestamp) as next_statusname,
lead(timestamp) over (partition by cinderellaID order by timestamp) as next_timestamp
from singletable t;
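To see what lead() adds to each row, here is the basic query run through Python's sqlite3 module on cinderella 10465's rows (SQLite 3.25 or later also provides LEAD; the outer ORDER BY is an addition that only makes the printout deterministic, and ROWS and with_next are illustrative names). The last row gets NULL, shown as None, because no status follows it:

```python
import sqlite3

# Illustrative sample: cinderella 10465's rows from the question, unsorted.
ROWS = [
    (10465, "Checked Out", "2013-03-16 11:54:34.000"),
    (10465, "Waiting", "2013-03-16 09:44:13.000"),
    (10465, "Paired", "2013-03-16 10:08:05.000"),
    (10465, "Shopping", "2013-03-16 10:10:58.323"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singletable (cinderellaID INTEGER, statusName TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO singletable VALUES (?, ?, ?)", ROWS)
with_next = conn.execute("""
    SELECT t.*,
           LEAD(statusname) OVER (PARTITION BY cinderellaID ORDER BY timestamp) AS next_statusname,
           LEAD(timestamp) OVER (PARTITION BY cinderellaID ORDER BY timestamp) AS next_timestamp
    FROM singletable t
    ORDER BY cinderellaID, timestamp
""").fetchall()
conn.close()

for row in with_next:
    print(row)
# (10465, 'Waiting', '2013-03-16 09:44:13.000', 'Paired', '2013-03-16 10:08:05.000')
# (10465, 'Paired', '2013-03-16 10:08:05.000', 'Shopping', '2013-03-16 10:10:58.323')
# (10465, 'Shopping', '2013-03-16 10:10:58.323', 'Checked Out', '2013-03-16 11:54:34.000')
# (10465, 'Checked Out', '2013-03-16 11:54:34.000', None, None)
```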
Then, to get the average time for each transition (a status together with the status that follows it):
select statusname, next_statusname,
       avg(datediff(second, timestamp, next_timestamp)) as avg_seconds
from (select t.*,
             lead(statusname) over (partition by cinderellaID order by timestamp) as next_statusname,
             lead(timestamp) over (partition by cinderellaID order by timestamp) as next_timestamp
      from singletable t
     ) t
group by statusname, next_statusname;
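To check the grouped version end to end, a Python sqlite3 sketch (SQLite 3.25 or later for LEAD) on cinderellas 10465 and 10467 from the question. Two assumptions differ from the T-SQL above: CAST(strftime('%s', ...) AS INTEGER) differences stand in for DATEDIFF(second, ...), and a WHERE next_timestamp IS NOT NULL filter is added so that each person's final status, which has no successor, does not show up as a group with a NULL average. ROWS and AVERAGES are illustrative names.

```python
import sqlite3

# Illustrative sample: cinderellas 10465 and 10467 from the question, unsorted.
ROWS = [
    (10465, "Checked Out", "2013-03-16 11:54:34.000"),
    (10465, "Waiting", "2013-03-16 09:44:13.000"),
    (10465, "Paired", "2013-03-16 10:08:05.000"),
    (10465, "Shopping", "2013-03-16 10:10:58.323"),
    (10467, "Shopping", "2013-03-16 10:08:06.553"),
    (10467, "Paired", "2013-03-16 10:04:49.000"),
    (10467, "Waiting", "2013-03-16 09:41:03.000"),
]

AVERAGES = """
SELECT statusname, next_statusname,
       AVG(CAST(strftime('%s', next_timestamp) AS INTEGER)
           - CAST(strftime('%s', timestamp) AS INTEGER)) AS avg_seconds
FROM (SELECT t.*,
             LEAD(statusname) OVER (PARTITION BY cinderellaID ORDER BY timestamp) AS next_statusname,
             LEAD(timestamp) OVER (PARTITION BY cinderellaID ORDER BY timestamp) AS next_timestamp
      FROM singletable t
     ) t
WHERE next_timestamp IS NOT NULL  -- assumption: skip each person's last status
GROUP BY statusname, next_statusname
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singletable (cinderellaID INTEGER, statusName TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO singletable VALUES (?, ?, ?)", ROWS)
result = conn.execute(AVERAGES).fetchall()
conn.close()

for row in result:
    print(row)
```

Grouping by next_statusname as well keeps transitions such as Shopping -> Checked Out and Shopping -> Alterations apart, which matches the two possible exits from Shopping described in the question.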