简体   繁体   中英

How to find the average time difference between rows in a table?

I have a mysql database that stores some timestamps. Let's assume that all there is in the table is the ID and the timestamp. The timestamps might be duplicated.

I want to find the average time difference between consecutive rows that are not duplicates (timewise). Is there a way to do it in SQL?

If your table is t, and your timestamp column is ts, and you want the answer in seconds:

SELECT TIMESTAMPDIFF(SECOND, MIN(ts), MAX(ts) ) 
       /
       (COUNT(DISTINCT(ts)) -1) 
FROM t

This will be miles quicker for large tables as it has no n-squared JOIN

This uses a cute mathematical trick which helps with this problem. Ignore the problem of duplicates for the moment. The average time difference between consecutive rows is the difference between the first timestamp and the last timestamp, divided by the number of rows -1.

Proof: The average distance between consecutive rows is the sum of the distance between consective rows, divided by the number of consecutive rows. But the sum of the difference between consecutive rows is just the distance between the first row and last row (assuming they are sorted by timestamp). And the number of consecutive rows is the total number of rows -1.

Then we just condition the timestamps to be distinct.

Are the ID's contiguous ?

You could do something like,

SELECT 
      a.ID
      , b.ID
      , a.Timestamp 
      , b.Timestamp 
      , b.timestamp - a.timestamp as Difference
FROM
     MyTable a
     JOIN MyTable b
          ON a.ID = b.ID + 1 AND a.Timestamp <> b.Timestamp

That'll give you a list of time differences on each consecutive row pair...

Then you could wrap that up in an AVG grouping...

Here's one way:

select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev 
    on cur.id = prev.id + 1 
    and cur.datecol <> prev.datecol

The timestampdiff function allows you to choose between days, months, seconds, and so on.

If the id's are not consecutive, you can select the previous row by adding a rule that there are no other rows in between:

select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev 
    on prev.datecol < cur.datecol
    and not exists (
        select * 
        from table inbetween 
        where prev.datecol < inbetween.datecol
        and inbetween.datecol < cur.datecol)
    )

Adapted for SQL Server from this discussion.

Essential columns used are: cmis_load_date: A date/time stamp associated with each record. extract_file: The full path to a file from which the record was loaded.

Comments: There can be many records in each file. Records have to be grouped by the files loaded on the extract_file column. Intervals of days may pass between one file and the next being loaded. There is no reliable sequential value in any column, so the grouped rows are sorted by the minimum load date in each file group, and the ROW_NUMBER() function then serves as an ad hoc sequential value.

SELECT 
AVG(DATEDIFF(day,  t2.MinCMISLoadDate, t1.MinCMISLoadDate)) as ElapsedAvg
FROM
(
SELECT 
ROW_NUMBER() OVER (ORDER BY MIN(cmis_load_date)) as RowNumber,  
MIN(cmis_load_date) as MinCMISLoadDate,
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END as ExtractFile
FROM
TrafTabRecordsHistory 
WHERE 
court_id = 17
and 
cmis_load_date >= '2019-09-01'
GROUP BY 
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END
) t1
LEFT JOIN 
(
SELECT 
ROW_NUMBER() OVER (ORDER BY MIN(cmis_load_date)) as RowNumber,  
MIN(cmis_load_date) as MinCMISLoadDate,
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END as ExtractFile
FROM
TrafTabRecordsHistory 
WHERE 
court_id = 17
and 
cmis_load_date >= '2019-09-01'
GROUP BY 
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END
) t2 on t2.RowNumber + 1 = t1.RowNumber

OLD POST but ....

Easies way is to use the Lag function and TIMESTAMPDIFF

SELECT 
   id, 
   TIMESTAMPDIFF('MINUTES', PREVIOUS_TIMESTAMP, TIMESTAMP) AS TIME_DIFF_IN_MINUTES
FROM (
   SELECT 
     id, 
     TIMESTAMP, 
     LAG(TIMESTAMP, 1) OVER (ORDER BY TIMESTAMP) AS PREVIOUS_TIMESTAMP
   FROM TABLE_NAME
)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM