How to calculate the time length of 0-1 sequences in Hive?

Question

I have data like this:

time(string)   id(int)
201801051127   0
201801051130   0
201801051132   0
201801051135   1
201801051141   1
201801051145   0
201801051147   0

The data has three distinct parts, and I want to calculate the time length of each one. For example, the first zero sequence runs from 11:27 to 11:32, so its time length is 5 minutes. If I simply group by id (0 and 1), the first zero sequence is merged with the third zero sequence, which is not what I want. How can I calculate the length of the three parts with SQL? The MySQL code I tried is as follows:

SET @id_label := 0;
SET @id := NULL;
SELECT id_label, id, TIMESTAMPDIFF(MINUTE, MIN(DATE1), MAX(DATE1)) FROM
(SELECT id, DATE1, id_label FROM (
SELECT id, str_to_date(TIME, '%Y%m%d%H%i') DATE1,
@id_label := IF(@id = id, @id_label, @id_label + 1) id_label,
@id := id
FROM test.t
ORDER BY str_to_date(TIME, '%Y%m%d%H%i')
) a) b
GROUP BY id_label, id;
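To make the expected result concrete, here is a minimal Python sketch of the same idea as the @id_label user variable: walk the rows in time order and start a new label whenever id changes. The function name run_lengths and the hard-coded sample rows are illustrative assumptions, not part of the question.

```python
from datetime import datetime

# Sample rows from the question: (time string, id)
ROWS = [
    ("201801051127", 0),
    ("201801051130", 0),
    ("201801051132", 0),
    ("201801051135", 1),
    ("201801051141", 1),
    ("201801051145", 0),
    ("201801051147", 0),
]


def run_lengths(rows):
    """Return (label, id, minutes) for each run of consecutive equal ids.

    Mirrors the @id_label idea: walk the rows in time order and bump the
    label whenever id differs from the previous row's id.
    """
    parsed = sorted((datetime.strptime(t, "%Y%m%d%H%M"), i) for t, i in rows)
    label = 0
    prev_id = None
    runs = {}  # label -> [id, first time, last time]
    for ts, i in parsed:
        if i != prev_id:
            label += 1  # id changed: a new run starts here
            runs[label] = [i, ts, ts]
        else:
            runs[label][2] = ts  # same run: extend its end time
        prev_id = i
    return [
        (lbl, i, int((last - first).total_seconds() // 60))
        for lbl, (i, first, last) in runs.items()
    ]


for row in run_lengths(ROWS):
    print(row)
```

For the sample data this yields (1, 0, 5), (2, 1, 6) and (3, 0, 2), i.e. runs of 5, 6 and 2 minutes.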

I don't know how to convert it to Hive code.

Answer 1

Try this.

SELECT id, ( max( TO_DATE ( time,'YYYYMMDDHH24MI' ) )
- min( TO_DATE ( time,'YYYYMMDDHH24MI' ) ) ) *24*60 diff_in_minutes from 
(
select t.*,
row_number()   OVER ( ORDER BY 
                    TO_DATE ( time,'YYYYMMDDHH24MI' ) )
- row_number() OVER ( PARTITION BY ID ORDER BY 
                    TO_DATE ( time,'YYYYMMDDHH24MI' ) ) seq
FROM Table1 t ORDER BY time
  ) GROUP BY ID,seq
  ORDER BY max(time)
  ;

DEMO
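The row_number() difference in this query is the classic "gaps and islands" trick: within one run of equal ids, the overall row number and the per-id row number both increase by 1 per row, so their difference stays constant and identifies the run. The sketch below checks this against the sample data using Python's sqlite3 module, assuming SQLite 3.25 or later for window functions. Since SQLite has no TO_DATE, the string is rebuilt into an ISO timestamp with substr, and julianday differences (in days) stand in for Oracle date subtraction. The table name t and the helper island_minutes are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question: (time string, id)
ROWS = [
    ("201801051127", 0),
    ("201801051130", 0),
    ("201801051132", 0),
    ("201801051135", 1),
    ("201801051141", 1),
    ("201801051145", 0),
    ("201801051147", 0),
]

QUERY = """
WITH parsed AS (
    -- SQLite has no TO_DATE: rebuild 'YYYYMMDDHHMI' as 'YYYY-MM-DD HH:MI'
    SELECT id,
           substr(time, 1, 4) || '-' || substr(time, 5, 2) || '-' ||
           substr(time, 7, 2) || ' ' || substr(time, 9, 2) || ':' ||
           substr(time, 11, 2) AS ts
    FROM t
), numbered AS (
    -- seq stays constant within each run of equal ids
    SELECT id, ts,
           row_number() OVER (ORDER BY ts)
           - row_number() OVER (PARTITION BY id ORDER BY ts) AS seq
    FROM parsed
)
SELECT id,
       -- julianday differences are in days, like Oracle date subtraction
       CAST(round((julianday(max(ts)) - julianday(min(ts))) * 24 * 60) AS INTEGER)
           AS diff_in_minutes
FROM numbered
GROUP BY id, seq
ORDER BY max(ts)
"""


def island_minutes(rows):
    """Load rows into an in-memory table t and return (id, minutes) per run."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (time TEXT, id INTEGER)")
        conn.executemany("INSERT INTO t (time, id) VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(island_minutes(ROWS))
```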

EDIT: This answer was written when the OP had tagged the question oracle; the tag has since been changed to hive.

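As a Hive alternative to Oracle's TO_DATE,

unix_timestamp(time, 'yyyyMMddHHmm')

could be used. Note that HH (not hh) is the 24-hour hour field in this pattern, and that unix_timestamp returns seconds since the Unix epoch, so divide the difference by 60 to get minutes.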
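Because unix_timestamp returns whole seconds, the difference of two such values must be divided by 60 to get minutes. A rough Python analogue of that conversion is sketched below. It uses the 24-hour %H directive (the counterpart of HH in a Java-style pattern) and interprets the strings as UTC via calendar.timegm, whereas Hive interprets them in its configured time zone; for differences within one day this normally does not matter. The helper name epoch_seconds is hypothetical.

```python
import calendar
from datetime import datetime


def epoch_seconds(value):
    """Parse a 'yyyyMMddHHmm'-style string (24-hour clock) as UTC and return
    whole seconds since the Unix epoch, similar in spirit to Hive's
    unix_timestamp(value, 'yyyyMMddHHmm')."""
    parsed = datetime.strptime(value, "%Y%m%d%H%M")
    return calendar.timegm(parsed.timetuple())


start = epoch_seconds("201801051127")
end = epoch_seconds("201801051132")
# The values are seconds, so divide by 60 for minutes
print((end - start) / 60)  # 5.0
```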

Answer 2

I would suggest a few transformations:

add an indication of whether a row is the first one in its group (flag it as 1, or null otherwise)
count the number of such flags up to and including each row to get its group number

Then you can simply group by that new group number.
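The two steps can be traced row by row with a small Python sketch (the helper label_groups and the sample rows are illustrative assumptions). A row gets first_in_group = 1 when its id differs from the previous row's id, which is what the lag(id) comparison checks; the running count of those flags is the group number.

```python
# Sample rows from the question: (time string, id)
ROWS = [
    ("201801051127", 0),
    ("201801051130", 0),
    ("201801051132", 0),
    ("201801051135", 1),
    ("201801051141", 1),
    ("201801051145", 0),
    ("201801051147", 0),
]


def label_groups(rows):
    """Return (time, id, first_in_group, grp_id) for rows ordered by time.

    Fixed-width 'yyyyMMddHHmm' strings sort chronologically, so a plain
    string sort gives time order.
    """
    out = []
    prev_id = None  # like lag(id) on the first row: there is no previous id
    grp_id = 0
    for time, id_ in sorted(rows):
        # Step 1: flag 1 when id differs from the previous row, else None
        first_in_group = 1 if id_ != prev_id else None
        # Step 2: running count of the flags seen so far is the group number
        if first_in_group is not None:
            grp_id += 1
        out.append((time, id_, first_in_group, grp_id))
        prev_id = id_
    return out


for row in label_groups(ROWS):
    print(row)
```

For the sample rows, grp_id comes out as 1, 1, 1, 2, 2, 3, 3, so grouping by it keeps the two zero runs apart.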

Oracle version (original question)

with q1 as (
    select to_date(time, 'YYYYMMDDHH24MI') time, id, 
           case id when lag(id) over(order by time) then null else 1 end first_in_group 
    from t
), q2 as (
    select time, id, count(first_in_group) over (order by time) grp_id
    from   q1
)
select   min(id) id, (max(time) - min(time)) * 24 * 60 minutes
from     q2
group by grp_id
order by grp_id

SQL fiddle

Hive version

Different database engines use different functions to handle date/time values. In Hive, use unix_timestamp and work with the number of seconds it returns (dividing by 60 gives minutes):

with q1 as (
    select unix_timestamp(time, 'yyyyMMddHHmm')/60 time, id, 
           case id when lag(id) over(order by time) then null else 1 end first_in_group 
    from t
), q2 as (
    select time, id, count(first_in_group) over (order by time) grp_id
    from   q1
)
select   min(id) id, max(time) - min(time) minutes
from     q2
group by grp_id
order by grp_id
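To sanity-check this query's logic without a Hive cluster, the sketch below runs an equivalent query in SQLite through Python's sqlite3 module (SQLite 3.25 or later assumed for window functions). strftime('%s', ...) on a rebuilt ISO string stands in for unix_timestamp and treats the times as UTC, and the converted column is named minute rather than time to avoid clashing with the source column. The helper run_minutes and the table layout are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question: (time string, id)
ROWS = [
    ("201801051127", 0),
    ("201801051130", 0),
    ("201801051132", 0),
    ("201801051135", 1),
    ("201801051141", 1),
    ("201801051145", 0),
    ("201801051147", 0),
]

QUERY = """
WITH q1 AS (
    SELECT
        -- stand-in for unix_timestamp(time, 'yyyyMMddHHmm') / 60:
        -- rebuild 'YYYY-MM-DD HH:MM', take epoch seconds, convert to minutes
        CAST(strftime('%s',
                      substr(time, 1, 4) || '-' || substr(time, 5, 2) || '-' ||
                      substr(time, 7, 2) || ' ' || substr(time, 9, 2) || ':' ||
                      substr(time, 11, 2)) AS INTEGER) / 60 AS minute,
        id,
        -- 1 on the first row of each run, NULL otherwise
        CASE id WHEN lag(id) OVER (ORDER BY time) THEN NULL ELSE 1 END
            AS first_in_group
    FROM t
), q2 AS (
    -- running count of the flags is the group number
    SELECT minute, id,
           count(first_in_group) OVER (ORDER BY minute) AS grp_id
    FROM q1
)
SELECT min(id) AS id, max(minute) - min(minute) AS minutes
FROM q2
GROUP BY grp_id
ORDER BY grp_id
"""


def run_minutes(rows):
    """Load rows into an in-memory table t and return (id, minutes) per run."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (time TEXT, id INTEGER)")
        conn.executemany("INSERT INTO t (time, id) VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(run_minutes(ROWS))
```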


2 answers

Answer 1: score 1, 2018-01-07 08:27:51

Answer 2: score 1, accepted, 2018-01-07 08:51:19
