Group by time with interval in SQL BigQuery

I have data that I need to group by time in 2-minute intervals. My data looks like this:

id            time             action_name            url
111      2020-09-01-09:19:00     First           www.stackoverflow/a12345
111      2020-09-01-09:19:04     Midpoint        www.stackoverflow/a12345
111      2020-09-01-09:19:08     Third           www.stackoverflow/a12345
112      2020-09-01-10:12:05     First           www.someotherurl/a111111
111      2020-09-01-12:36:54     First           www.stackoverflow/a12345
111      2020-09-01-12:36:58     Midpoint        www.stackoverflow/a12345
111      2020-09-01-12:37:03     Third           www.stackoverflow/a12345
111      2020-09-01-12:37:09     Complete        www.stackoverflow/a12345
222      2020-09-01-15:17:44     First           www.stackoverflow/a2222
222      2020-09-01-15:17:48     Midpoint        www.stackoverflow/a2222
222      2020-09-01-15:18:05     Third           www.stackoverflow/a2222

I need to select rows under the following condition: if a given id and url has a row with Complete in the action_name column, take that row. If there is no Complete, take Third, and so on. The code I have at the moment returns only one row per id and url. So I need to group the data not only by id and url but also by time, in 2-minute intervals. Below is the code:

SELECT AS VALUE 
  ARRAY_AGG(current_query_result 
    ORDER BY CASE action_name
      WHEN 'Complete' THEN 1
      WHEN 'Third' THEN 2
      WHEN 'Midpoint' THEN 3
      WHEN 'First' THEN 4
    END
    LIMIT 1
  )[OFFSET(0)] 
FROM (
  SELECT
    c.time,
    c.id,
    c.action_name, 
    c.url
  FROM `bq_table` c
  WHERE c.action_name in ('First', 'Midpoint', 'Third', 'Complete')
) current_query_result
GROUP BY id, url

Desired output is:

id            time             action_name            url
111      2020-09-01-09:19:08     Third           www.stackoverflow/a12345
112      2020-09-01-10:12:05     First           www.someotherurl/a111111
111      2020-09-01-12:37:09     Complete        www.stackoverflow/a12345
222      2020-09-01-15:18:05     Third           www.stackoverflow/a2222

I have tried this: TIMESTAMP_SECONDS(2*60 * DIV(UNIX_SECONDS(c.time), 2*60)) timekey but got an error: No matching signature for function UNIX_SECONDS for argument types: STRING. Supported signature: UNIX_SECONDS(TIMESTAMP)

Below is for BigQuery Standard SQL:

#standardSQL
SELECT 
  AS VALUE ARRAY_AGG(t 
    ORDER BY STRPOS('First,Midpoint,Third,Complete',action_name) DESC 
    LIMIT 1
  )[OFFSET(0)]
FROM `project.dataset.bq_table` t
WHERE action_name IN ('First', 'Midpoint', 'Third', 'Complete')
GROUP BY id, url, 
  TIMESTAMP_SUB(
    PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time), 
    INTERVAL MOD(UNIX_SECONDS(PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time)), 2 * 60) 
    SECOND
  )   
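To make the extra GROUP BY key concrete, here is a minimal sketch of the same arithmetic in Python (chosen over SQL only so it can be run locally). It assumes the time strings are UTC, which is what PARSE_TIMESTAMP uses when no time zone is given; bucket_start is a hypothetical helper name, not a BigQuery function.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d-%H:%M:%S"   # same format string passed to PARSE_TIMESTAMP
BUCKET_SECONDS = 2 * 60     # 2-minute interval


def bucket_start(time_str):
    """Return the start of the 2-minute bucket containing time_str.

    Mirrors TIMESTAMP_SUB(ts, INTERVAL MOD(UNIX_SECONDS(ts), 120) SECOND).
    """
    ts = datetime.strptime(time_str, FMT).replace(tzinfo=timezone.utc)
    epoch = int(ts.timestamp())
    floored = epoch - epoch % BUCKET_SECONDS  # drop the remainder
    return datetime.fromtimestamp(floored, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S"
    )


print(bucket_start("2020-09-01-09:19:00"))  # -> 2020-09-01 09:18:00
print(bucket_start("2020-09-01-09:19:08"))  # -> 2020-09-01 09:18:00
print(bucket_start("2020-09-01-12:36:54"))  # -> 2020-09-01 12:36:00
print(bucket_start("2020-09-01-12:37:09"))  # -> 2020-09-01 12:36:00
```

Rows that share an id, a url and the same bucket start end up in one group, and ARRAY_AGG then keeps the highest-ranked action within that group.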

You can test and experiment with the query above using the sample data from your question, as in the example below:

#standardSQL
WITH `project.dataset.bq_table` AS (
  SELECT 111 id, '2020-09-01-09:19:00' time, 'First' action_name, 'www.stackoverflow/a12345' url UNION ALL
  SELECT 111, '2020-09-01-09:19:04', 'Midpoint', 'www.stackoverflow/a12345' UNION ALL
  SELECT 111, '2020-09-01-09:19:08', 'Third', 'www.stackoverflow/a12345' UNION ALL
  SELECT 112, '2020-09-01-10:12:05', 'First', 'www.someotherurl/a111111' UNION ALL
  SELECT 111, '2020-09-01-12:36:54', 'First', 'www.stackoverflow/a12345' UNION ALL
  SELECT 111, '2020-09-01-12:36:58', 'Midpoint', 'www.stackoverflow/a12345' UNION ALL
  SELECT 111, '2020-09-01-12:37:03', 'Third', 'www.stackoverflow/a12345' UNION ALL
  SELECT 111, '2020-09-01-12:37:09', 'Complete', 'www.stackoverflow/a12345' 
)
SELECT 
  AS VALUE ARRAY_AGG(t 
    ORDER BY STRPOS('First,Midpoint,Third,Complete',action_name) DESC 
    LIMIT 1
  )[OFFSET(0)]
FROM `project.dataset.bq_table` t
WHERE action_name IN ('First', 'Midpoint', 'Third', 'Complete')
GROUP BY id, url, 
  TIMESTAMP_SUB(
    PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time), 
    INTERVAL MOD(UNIX_SECONDS(PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time)), 2 * 60) 
    SECOND
  )   

with output:

Row     id      time                    action_name     url  
1       111     2020-09-01-09:19:08     Third           www.stackoverflow/a12345     
2       112     2020-09-01-10:12:05     First           www.someotherurl/a111111     
3       111     2020-09-01-12:37:09     Complete        www.stackoverflow/a12345    
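
The sample data in this test leaves out the three id 222 rows from the question. As a rough cross-check, the sketch below re-implements the grouping and ranking in Python over all eleven rows. It is an emulation under the assumption of UTC timestamps, not actual BigQuery output, and the names ORDER, bucket_epoch and best_per_bucket are illustrative only.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d-%H:%M:%S"
ORDER = "First,Midpoint,Third,Complete"  # same string the query gives to STRPOS

ROWS = [
    (111, "2020-09-01-09:19:00", "First", "www.stackoverflow/a12345"),
    (111, "2020-09-01-09:19:04", "Midpoint", "www.stackoverflow/a12345"),
    (111, "2020-09-01-09:19:08", "Third", "www.stackoverflow/a12345"),
    (112, "2020-09-01-10:12:05", "First", "www.someotherurl/a111111"),
    (111, "2020-09-01-12:36:54", "First", "www.stackoverflow/a12345"),
    (111, "2020-09-01-12:36:58", "Midpoint", "www.stackoverflow/a12345"),
    (111, "2020-09-01-12:37:03", "Third", "www.stackoverflow/a12345"),
    (111, "2020-09-01-12:37:09", "Complete", "www.stackoverflow/a12345"),
    (222, "2020-09-01-15:17:44", "First", "www.stackoverflow/a2222"),
    (222, "2020-09-01-15:17:48", "Midpoint", "www.stackoverflow/a2222"),
    (222, "2020-09-01-15:18:05", "Third", "www.stackoverflow/a2222"),
]


def bucket_epoch(time_str, width=120):
    """Epoch seconds of the fixed 2-minute bucket containing time_str (UTC assumed)."""
    ts = datetime.strptime(time_str, FMT).replace(tzinfo=timezone.utc)
    epoch = int(ts.timestamp())
    return epoch - epoch % width


def best_per_bucket(rows):
    """Keep the highest-ranked action per (id, url, bucket), like the query does."""
    allowed = ORDER.split(",")
    best = {}
    for row in rows:
        id_, time_str, action, url = row
        if action not in allowed:          # the WHERE ... IN (...) filter
            continue
        key = (id_, url, bucket_epoch(time_str))
        rank = ORDER.find(action) + 1      # 1-based position, like STRPOS
        if key not in best or rank > best[key][0]:
            best[key] = (rank, row)
    # The time strings sort chronologically because of their fixed-width format.
    return sorted((row for _, row in best.values()), key=lambda r: r[1])


for row in best_per_bucket(ROWS):
    print(row)
```

In this emulation the id 111 and 112 rows match the output above, but id 222 produces two rows (Midpoint at 15:17:48 and Third at 15:18:05) instead of the single Third row in the desired output. The windows are fixed and aligned to even minutes, so 15:17:48 falls in the 15:16 bucket and 15:18:05 in the 15:18 bucket. Events that straddle a window boundary are split even when they are only seconds apart.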

I think you are very close to solving it; you just need to use PARSE_TIMESTAMP to convert the string into a TIMESTAMP, e.g.:

SELECT PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', '2020-09-01-09:19:00')

Outputs:

+---------------------+
|         f0_         |
+---------------------+
| 2020-09-01 09:19:00 |
+---------------------+
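
Once the string is parsed, the TIMESTAMP_SECONDS(2*60 * DIV(UNIX_SECONDS(...), 2*60)) expression from the question should produce the same 2-minute key as subtracting the MOD remainder from the timestamp. The small Python check below illustrates that arithmetic identity. It assumes UTC and non-negative epoch seconds, and the helper names are made up for the example.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d-%H:%M:%S"
WIDTH = 2 * 60  # bucket width in seconds


def to_epoch(time_str):
    """Parse the question's time format as UTC and return Unix seconds."""
    ts = datetime.strptime(time_str, FMT).replace(tzinfo=timezone.utc)
    return int(ts.timestamp())


def floor_by_div(epoch):
    """Shape of the question's attempt: WIDTH * DIV(epoch, WIDTH)."""
    return WIDTH * (epoch // WIDTH)


def floor_by_mod(epoch):
    """Shape of the MOD-based variant: epoch - MOD(epoch, WIDTH)."""
    return epoch - epoch % WIDTH


samples = ["2020-09-01-09:19:00", "2020-09-01-12:37:09", "2020-09-01-15:18:05"]
for s in samples:
    e = to_epoch(s)
    # Both forms round down to the same bucket boundary.
    assert floor_by_div(e) == floor_by_mod(e)
    print(s, "->", floor_by_div(e))
```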
