Grouping into intervals of 5 minutes within a time range

I'm having some difficulty with a MySQL query I want to write.

SELECT a.timestamp, name, count(b.name) 
FROM time a, id b 
WHERE a.user = b.user
  AND a.id = b.id
  AND b.name = 'John'
  AND a.timestamp BETWEEN '2010-11-16 10:30:00' AND '2010-11-16 11:00:00' 
GROUP BY a.timestamp

This is my current output.

timestamp            name  count(b.name)
-------------------  ----  -------------
2010-11-16 10:32:22  John  2
2010-11-16 10:35:12  John  7
2010-11-16 10:36:34  John  1
2010-11-16 10:37:45  John  2
2010-11-16 10:48:26  John  8
2010-11-16 10:55:00  John  9
2010-11-16 10:58:08  John  2

How do I group them into 5-minute intervals?

I want my output to look like this:

timestamp            name  count(b.name)
-------------------  ----  -------------
2010-11-16 10:30:00  John  2
2010-11-16 10:35:00  John  10
2010-11-16 10:40:00  John  0
2010-11-16 10:45:00  John  8
2010-11-16 10:50:00  John  0
2010-11-16 10:55:00  John  11 

This works with any interval.

PostgreSQL

SELECT
    -- round() snaps to the nearest 5-minute mark; use floor() for buckets that start on the mark
    TIMESTAMP WITH TIME ZONE 'epoch' +
    INTERVAL '1 second' * round(extract('epoch' from timestamp) / 300) * 300 as timestamp,
    name,
    count(b.name)
FROM time a, id b
WHERE …
GROUP BY 
round(extract('epoch' from timestamp) / 300), name


MySQL

SELECT
    timestamp,  -- not sure about that
    name,
    count(b.name)
FROM time a, id b
WHERE …
GROUP BY 
UNIX_TIMESTAMP(timestamp) DIV 300, name

I came across the same issue.

I found that grouping by any minute interval is easy: divide the epoch by the interval length in seconds, then either round or use floor to get rid of the remainder. So for 5-minute intervals you would use 300 seconds.

    SELECT COUNT(*) cnt, 
    to_timestamp(floor((extract('epoch' from timestamp_column) / 300 )) * 300) 
    AT TIME ZONE 'UTC' as interval_alias
    FROM TABLE_NAME GROUP BY interval_alias

interval_alias       cnt
-------------------  ----  
2010-11-16 10:30:00  2
2010-11-16 10:35:00  10
2010-11-16 10:45:00  8
2010-11-16 10:55:00  11 
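
The floor-based bucketing described above can be sketched outside the database as well. The following is a minimal Python illustration, not the answer author's code: it assumes the timestamps are UTC, reuses the (timestamp, count) rows from the question's current output, and the names `bucket_start` and `BUCKET_SECONDS` are made up for this example.

```python
from collections import Counter
from datetime import datetime, timezone

BUCKET_SECONDS = 300  # 5 minutes, expressed in seconds


def bucket_start(ts: datetime, size: int = BUCKET_SECONDS) -> datetime:
    """Floor a naive timestamp (treated as UTC) to the start of its bucket."""
    epoch = int(ts.replace(tzinfo=timezone.utc).timestamp())
    floored = epoch // size * size  # integer division drops the remainder
    return datetime.fromtimestamp(floored, tz=timezone.utc).replace(tzinfo=None)


# (timestamp, count) rows copied from the question's current output
rows = [
    ("2010-11-16 10:32:22", 2),
    ("2010-11-16 10:35:12", 7),
    ("2010-11-16 10:36:34", 1),
    ("2010-11-16 10:37:45", 2),
    ("2010-11-16 10:48:26", 8),
    ("2010-11-16 10:55:00", 9),
    ("2010-11-16 10:58:08", 2),
]

# Sum the counts of all rows that fall into the same 5-minute bucket
totals = Counter()
for text, count in rows:
    ts = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    totals[bucket_start(ts)] += count

for start in sorted(totals):
    print(start, totals[start])
```

Changing `BUCKET_SECONDS` (for example to 600 for 10 minutes) changes the bucket width in the same way that changing 300 does in the SQL.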

The query above returns the data correctly grouped by the selected interval; however, it does not return intervals that contain no data. To get those empty intervals we can use the function generate_series.

    SELECT generate_series(MIN(date_trunc('hour',timestamp_column)),
    max(date_trunc('minute',timestamp_column)),'5m') as interval_alias FROM 
    TABLE_NAME

Result:

interval_alias       
-------------------    
2010-11-16 10:30:00  
2010-11-16 10:35:00
2010-11-16 10:40:00   
2010-11-16 10:45:00
2010-11-16 10:50:00   
2010-11-16 10:55:00   

Now, to include the intervals with zero occurrences, we just outer join both result sets.

    SELECT series.minute as interval,  coalesce(cnt.amnt,0) as count from 
       (
       SELECT count(*) amnt,
       to_timestamp(floor((extract('epoch' from timestamp_column) / 300 )) * 300)
       AT TIME ZONE 'UTC' as interval_alias
       from TABLE_NAME  group by interval_alias
       ) cnt
    
    RIGHT JOIN 
       (    
       SELECT generate_series(min(date_trunc('hour',timestamp_column)),
       max(date_trunc('minute',timestamp_column)),'5m') as minute from TABLE_NAME 
       ) series
  on series.minute = cnt.interval_alias

The end result includes every 5-minute interval in the series, even those that have no values.

interval             count
-------------------  ----  
2010-11-16 10:30:00  2
2010-11-16 10:35:00  10
2010-11-16 10:40:00  0
2010-11-16 10:45:00  8
2010-11-16 10:50:00  0 
2010-11-16 10:55:00  11 

The interval can easily be changed by adjusting the last parameter of generate_series. In our case we use '5m', but it could be any interval we want.
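
The gap-filling step (generate a complete series, then outer join the counts onto it) can be sketched in plain Python as follows. This is only an illustration under assumptions: `interval_series` is a hypothetical stand-in for generate_series, and the `counts` dictionary is copied from the grouped result shown earlier in this answer.

```python
from datetime import datetime, timedelta


def interval_series(start: datetime, stop: datetime, step: timedelta):
    """Yield start, start + step, ... up to and including stop."""
    current = start
    while current <= stop:
        yield current
        current += step


# Grouped counts that contain data (empty buckets are absent)
counts = {
    datetime(2010, 11, 16, 10, 30): 2,
    datetime(2010, 11, 16, 10, 35): 10,
    datetime(2010, 11, 16, 10, 45): 8,
    datetime(2010, 11, 16, 10, 55): 11,
}

series = interval_series(
    datetime(2010, 11, 16, 10, 30),
    datetime(2010, 11, 16, 10, 55),
    timedelta(minutes=5),
)

# Equivalent of the outer join plus coalesce(cnt.amnt, 0)
filled = [(slot, counts.get(slot, 0)) for slot in series]

for slot, count in filled:
    print(slot, count)
```

The `counts.get(slot, 0)` lookup plays the role of `coalesce(cnt.amnt, 0)`: buckets missing from the grouped result come back as zero.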

You should use GROUP BY UNIX_TIMESTAMP(time_stamp) DIV 300 instead of round(../300); because of the rounding, I found that some records were counted in two grouped result sets.
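
To see why rounding and integer division assign rows differently, here is a small Python sketch. It assumes UTC timestamps and models ROUND as "round half up" for positive values; the helper names are invented for this example.

```python
from datetime import datetime, timezone

SIZE = 300  # bucket width in seconds (5 minutes)


def to_epoch(text: str) -> int:
    """Parse 'YYYY-MM-DD HH:MM:SS' as UTC and return Unix seconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def to_text(epoch: int) -> str:
    """Format Unix seconds back into 'YYYY-MM-DD HH:MM:SS' (UTC)."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def floor_bucket(epoch: int) -> int:
    # Like DIV 300 * 300: bucket starts on the 5-minute mark
    return epoch // SIZE * SIZE


def round_bucket(epoch: int) -> int:
    # Like ROUND(epoch / 300) * 300 for positive values: nearest 5-minute mark
    return (epoch + SIZE // 2) // SIZE * SIZE


for text in ("2010-11-16 10:36:34", "2010-11-16 10:37:45"):
    epoch = to_epoch(text)
    print(text, "floor ->", to_text(floor_bucket(epoch)),
          "round ->", to_text(round_bucket(epoch)))
```

With rounding, 10:37:45 moves to the 10:40:00 bucket, whereas the question's expected output places it in 10:35:00, which is what the DIV form produces.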

For Postgres, I found it easier and more accurate to use the date_trunc function, like:

select name, sum(count), date_trunc('minute',timestamp) as timestamp
FROM table
WHERE xxx
GROUP BY name,date_trunc('minute',timestamp)
ORDER BY timestamp

You can pass various resolutions such as 'minute', 'hour', 'day', etc. to date_trunc. Note that date_trunc only truncates to whole units, so 'minute' yields 1-minute buckets rather than 5-minute ones.

The query will be something like:

SELECT 
  DATE_FORMAT(
    MIN(timestamp),
    '%d/%m/%Y %H:%i:00'
  ) AS tmstamp,
  name,
  COUNT(id) AS cnt 
FROM
  table
GROUP BY ROUND(UNIX_TIMESTAMP(timestamp) / 300), name

Not sure if you still need it.

SELECT FROM_UNIXTIME(FLOOR((UNIX_TIMESTAMP(timestamp))/300)*300) AS t,timestamp,count(1) as c from users GROUP BY t ORDER BY t;

2016-10-29 19:35:00 | 2016-10-29 19:35:50 | 4
2016-10-29 19:40:00 | 2016-10-29 19:40:37 | 5
2016-10-29 19:45:00 | 2016-10-29 19:45:09 | 6
2016-10-29 19:50:00 | 2016-10-29 19:51:14 | 4
2016-10-29 19:55:00 | 2016-10-29 19:56:17 | 1

You're probably going to have to break up your timestamp into ymd:HM and use DIV 5 to split the minutes up into 5-minute bins, something like:

select year(a.timestamp), 
       month(a.timestamp), 
       day(a.timestamp), 
       hour(a.timestamp), 
       minute(a.timestamp) DIV 5,  -- 0..11: index of the 5-minute bin within the hour
       name, 
       count(b.name)
FROM time a, id b
WHERE a.user = b.user AND a.id = b.id AND b.name = 'John' 
      AND a.timestamp BETWEEN '2010-11-16 10:30:00' AND '2010-11-16 11:00:00'
GROUP BY year(a.timestamp), 
       month(a.timestamp), 
       day(a.timestamp), 
       hour(a.timestamp), 
       minute(a.timestamp) DIV 5

...and then futz with the output in client code to make it appear the way you like. Or, if you like, you can build up the whole date string using the SQL concat function instead of getting separate columns.

select concat(year(a.timestamp), "-", month(a.timestamp), "-" ,day(a.timestamp), 
       " " , lpad(hour(a.timestamp),2,'0'), ":", 
       lpad((minute(a.timestamp) DIV 5) * 5, 2, '0'))

...and then group on that.
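
The same field-based idea (keep the date and hour, and snap the minute down to a multiple of 5) can be sketched in Python. This is an illustrative equivalent only; `five_minute_bin` is a hypothetical helper, and it assumes the bin width divides 60 evenly.

```python
from datetime import datetime


def five_minute_bin(ts: datetime, minutes: int = 5) -> datetime:
    """Snap the minute field down to a multiple of `minutes`; zero the seconds."""
    snapped = ts.minute // minutes * minutes  # same as (minute DIV 5) * 5
    return ts.replace(minute=snapped, second=0, microsecond=0)


sample = datetime(2010, 11, 16, 10, 37, 45)
print(five_minute_bin(sample))  # 2010-11-16 10:35:00
```

Because only the minute field is touched, this variant does not depend on epoch arithmetic or time zones, which is the main appeal of the ymd:HM approach.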

How about this one:

select 
    from_unixtime(unix_timestamp(timestamp) - unix_timestamp(timestamp) mod 300) as ts,  
    sum(value)
from group_interval 
group by ts 
order by ts
;

select 
CONCAT(CAST(CREATEDATE AS DATE),' ',datepart(hour,createdate),':',ROUNd(CAST((CAST((CAST(DATEPART(MINUTE,CREATEDATE) AS DECIMAL (18,4)))/5 AS INT)) AS DECIMAL (18,4))/12*60,2)) AS '5MINDATE'
,count(something)
from TABLE
group by CONCAT(CAST(CREATEDATE AS DATE),' ',datepart(hour,createdate),':',ROUNd(CAST((CAST((CAST(DATEPART(MINUTE,CREATEDATE) AS DECIMAL (18,4)))/5 AS INT)) AS DECIMAL (18,4))/12*60,2))

This should do exactly what you want.

Replace dt with your datetime column, c with your call field, and astro_transit1 with your table. The 300 refers to 5 minutes in seconds, so add 300 for every additional 5 minutes of interval width.

SELECT FROM_UNIXTIME( 300 * ROUND( UNIX_TIMESTAMP( r.dt ) /300 ) ) AS 5datetime, (
SELECT r.c
FROM astro_transit1 ra
WHERE ra.dt = r.dt
ORDER BY ra.dt DESC
LIMIT 1
) AS first_val FROM astro_transit1 r GROUP BY UNIX_TIMESTAMP( r.dt )
DIV 300
LIMIT 0 , 30

Based on @boecko's answer for MySQL, I used a CTE (Common Table Expression) to reduce the query execution time.

So this:

SELECT
    `timestamp`,
    `name`,
     count(b.`name`)
FROM `time` a, `id` b
WHERE …
GROUP BY 
UNIX_TIMESTAMP(`timestamp`) DIV 300, name  

becomes:

WITH cte AS (
    SELECT
        `timestamp`,
        `name`,
         count(b.`name`),
         UNIX_TIMESTAMP(`timestamp`) DIV 300 AS `intervals`
    FROM `time` a, `id` b
    WHERE …
)
SELECT * FROM cte GROUP BY `intervals`

On a large amount of data, the query is more than 10 times faster!

As timestamp and time are keywords in MySQL, it is safest to wrap each table and column name in backticks (`...`).

Hope it will help some of you.

I found out that with MySQL the correct query is probably the following:

SELECT SUBSTRING( FROM_UNIXTIME( CEILING( timestamp /300 ) *300,  
                                 '%Y-%m-%d %H:%i:%S' ) , 1, 19 ) AS ts_CEILING,
SUM(value)
FROM group_interval
GROUP BY SUBSTRING( FROM_UNIXTIME( CEILING( timestamp /300 ) *300,  
                                   '%Y-%m-%d %H:%i:%S' ) , 1, 19 )
ORDER BY SUBSTRING( FROM_UNIXTIME( CEILING( timestamp /300 ) *300,  
                                   '%Y-%m-%d %H:%i:%S' ) , 1, 19 ) DESC

Let me know what you think.
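
For comparison, CEILING labels each bucket by its end rather than its start. The Python sketch below illustrates the difference under the assumption that the timestamp column holds Unix seconds (UTC); `ceil_bucket`, `floor_bucket`, and `epoch_of` are names invented for this example.

```python
from datetime import datetime, timezone

SIZE = 300  # bucket width in seconds (5 minutes)


def epoch_of(year, month, day, hour, minute, second) -> int:
    """Unix seconds for the given UTC wall-clock time."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(moment.timestamp())


def floor_bucket(epoch: int) -> int:
    # Label the bucket by its start
    return epoch // SIZE * SIZE


def ceil_bucket(epoch: int) -> int:
    # Label the bucket by its end; exact multiples of SIZE stay where they are
    return -(-epoch // SIZE) * SIZE


inside = epoch_of(2010, 11, 16, 10, 32, 22)
on_mark = epoch_of(2010, 11, 16, 10, 55, 0)

print(floor_bucket(inside) == epoch_of(2010, 11, 16, 10, 30, 0))  # True
print(ceil_bucket(inside) == epoch_of(2010, 11, 16, 10, 35, 0))   # True
print(ceil_bucket(on_mark) == on_mark)                            # True
```

Which labelling is right depends on whether a row at 10:32:22 should be reported under 10:30:00 (as in the question's expected output) or under 10:35:00.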
