Group by time with interval in SQL BigQuery
I have data that I need to group by time in 2-minute intervals. My data looks like this:
id time action_name url
111 2020-09-01-09:19:00 First www.stackoverflow/a12345
111 2020-09-01-09:19:04 Midpoint www.stackoverflow/a12345
111 2020-09-01-09:19:08 Third www.stackoverflow/a12345
112 2020-09-01-10:12:05 First www.someotherurl/a111111
111 2020-09-01-12:36:54 First www.stackoverflow/a12345
111 2020-09-01-12:36:58 Midpoint www.stackoverflow/a12345
111 2020-09-01-12:37:03 Third www.stackoverflow/a12345
111 2020-09-01-12:37:09 Complete www.stackoverflow/a12345
222 2020-09-01-15:17:44 First www.stackoverflow/a2222
222 2020-09-01-15:17:48 Midpoint www.stackoverflow/a2222
222 2020-09-01-15:18:05 Third www.stackoverflow/a2222
I need to grab the data with the following condition: if an x_id and x_url pair has a Complete value in the action_name column, grab that row. If it doesn't have Complete, then grab Third, and so on. The code that I have at the moment returns only one row per x_id and x_url. So I need to group the data not only by id and url but also by time, with an interval of 2 minutes. Below is the code:
SELECT AS VALUE
  ARRAY_AGG(current_query_result
    ORDER BY CASE action_name
      WHEN 'Complete' THEN 1
      WHEN 'Third' THEN 2
      WHEN 'Midpoint' THEN 3
      WHEN 'First' THEN 4
    END
    LIMIT 1
  )[OFFSET(0)]
FROM (
  SELECT
    c.time,
    c.id,
    c.action_name,
    c.url
  FROM `bq_table` c
  WHERE c.action_name IN ('First', 'Midpoint', 'Third', 'Complete')
) current_query_result
GROUP BY id, url
Desired output is:
id time action_name url
111 2020-09-01-09:19:08 Third www.stackoverflow/a12345
112 2020-09-01-10:12:05 First www.someotherurl/a111111
111 2020-09-01-12:37:09 Complete www.stackoverflow/a12345
222 2020-09-01-15:18:05 Third www.stackoverflow/a2222
I have tried this: TIMESTAMP_SECONDS(2*60 * DIV(UNIX_SECONDS(c.time), 2*60)) timekey
but got an error: No matching signature for function UNIX_SECONDS for argument types: STRING. Supported signature: UNIX_SECONDS(TIMESTAMP)
Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE
  -- Keep the highest-ranked action per group: Complete > Third > Midpoint > First
  ARRAY_AGG(t
    ORDER BY STRPOS('First,Midpoint,Third,Complete', action_name) DESC
    LIMIT 1
  )[OFFSET(0)]
FROM `project.dataset.bq_table` t
WHERE action_name IN ('First', 'Midpoint', 'Third', 'Complete')
GROUP BY id, url,
  -- Round the parsed timestamp down to the start of its 2-minute window
  TIMESTAMP_SUB(
    PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time),
    INTERVAL MOD(UNIX_SECONDS(PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time)), 2 * 60) SECOND
  )
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.bq_table` AS (
SELECT 111 id, '2020-09-01-09:19:00' time, 'First' action_name, 'www.stackoverflow/a12345' url UNION ALL
SELECT 111, '2020-09-01-09:19:04', 'Midpoint', 'www.stackoverflow/a12345' UNION ALL
SELECT 111, '2020-09-01-09:19:08', 'Third', 'www.stackoverflow/a12345' UNION ALL
SELECT 112, '2020-09-01-10:12:05', 'First', 'www.someotherurl/a111111' UNION ALL
SELECT 111, '2020-09-01-12:36:54', 'First', 'www.stackoverflow/a12345' UNION ALL
SELECT 111, '2020-09-01-12:36:58', 'Midpoint', 'www.stackoverflow/a12345' UNION ALL
SELECT 111, '2020-09-01-12:37:03', 'Third', 'www.stackoverflow/a12345' UNION ALL
SELECT 111, '2020-09-01-12:37:09', 'Complete', 'www.stackoverflow/a12345'
)
SELECT AS VALUE
  ARRAY_AGG(t
    ORDER BY STRPOS('First,Midpoint,Third,Complete', action_name) DESC
    LIMIT 1
  )[OFFSET(0)]
FROM `project.dataset.bq_table` t
WHERE action_name IN ('First', 'Midpoint', 'Third', 'Complete')
GROUP BY id, url,
  TIMESTAMP_SUB(
    PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time),
    INTERVAL MOD(UNIX_SECONDS(PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time)), 2 * 60) SECOND
  )
with output:
Row id time action_name url
1 111 2020-09-01-09:19:08 Third www.stackoverflow/a12345
2 112 2020-09-01-10:12:05 First www.someotherurl/a111111
3 111 2020-09-01-12:37:09 Complete www.stackoverflow/a12345
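One caveat worth checking against your data: the grouping key above uses fixed 2-minute windows aligned to even minutes (UTC), so events that belong together can land in different windows. In the question's data, the id 222 rows (left out of the sample above) straddle a boundary: 15:17:44 and 15:17:48 fall in the window starting at 15:16:00, while 15:18:05 falls in the window starting at 15:18:00, so that id would come back as two rows rather than one. If "2 minutes" is meant as the maximum gap between consecutive events, a gap-based grouping may fit better. Below is a minimal sketch under that assumption; the names sample_data, parsed, flagged, sessions, ts, is_new_session and session_id are made up for illustration and are not from the question.

```sql
#standardSQL
-- Sketch only: start a new group whenever the gap to the previous event
-- for the same id/url is more than 120 seconds (an assumed interpretation).
WITH sample_data AS (
  SELECT 111 id, '2020-09-01-09:19:00' time, 'First' action_name, 'www.stackoverflow/a12345' url UNION ALL
  SELECT 111, '2020-09-01-09:19:08', 'Third', 'www.stackoverflow/a12345' UNION ALL
  SELECT 111, '2020-09-01-12:37:09', 'Complete', 'www.stackoverflow/a12345' UNION ALL
  SELECT 222, '2020-09-01-15:17:44', 'First', 'www.stackoverflow/a2222' UNION ALL
  SELECT 222, '2020-09-01-15:17:48', 'Midpoint', 'www.stackoverflow/a2222' UNION ALL
  SELECT 222, '2020-09-01-15:18:05', 'Third', 'www.stackoverflow/a2222'
),
parsed AS (
  -- Convert the STRING column into a TIMESTAMP once
  SELECT t.*, PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', time) AS ts
  FROM sample_data t
  WHERE action_name IN ('First', 'Midpoint', 'Third', 'Complete')
),
flagged AS (
  -- 1 marks the first event of a new group; the first row per id/url has no
  -- previous event, the comparison is NULL, and IF falls through to 1
  SELECT *,
    IF(TIMESTAMP_DIFF(ts, LAG(ts) OVER (PARTITION BY id, url ORDER BY ts), SECOND) <= 120, 0, 1) AS is_new_session
  FROM parsed
),
sessions AS (
  -- Running sum of the flags gives a group number per id/url
  SELECT *,
    SUM(is_new_session) OVER (PARTITION BY id, url ORDER BY ts) AS session_id
  FROM flagged
)
SELECT AS VALUE
  ARRAY_AGG(STRUCT(id, time, action_name, url)
    ORDER BY STRPOS('First,Midpoint,Third,Complete', action_name) DESC
    LIMIT 1
  )[OFFSET(0)]
FROM sessions
GROUP BY id, url, session_id
```

With this grouping the three id 222 rows stay in one group, because each is within 120 seconds of the previous one. The trade-off is that a long chain of closely spaced events forms a single group regardless of its total length, which may or may not be what you want.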
I think you are very close to solving it. You just need to use PARSE_TIMESTAMP to convert the string to the TIMESTAMP type, e.g.:
SELECT PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', '2020-09-01-09:19:00')
Outputs:
+---------------------+
| f0_ |
+---------------------+
| 2020-09-01 09:19:00 |
+---------------------+
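Plugged into the timekey expression from the question, that might look like the sketch below. The inline sample_rows CTE is only a stand-in (a made-up name) so the query runs on its own; in practice you would select from your table instead.

```sql
#standardSQL
-- Sketch: the question's timekey expression, with the STRING column parsed first.
WITH sample_rows AS (
  SELECT 111 id, '2020-09-01-09:19:00' time UNION ALL
  SELECT 111, '2020-09-01-09:19:08'
)
SELECT
  c.id,
  c.time,
  -- Integer-divide the Unix seconds by 120, then multiply back,
  -- which rounds down to the start of the 2-minute window
  TIMESTAMP_SECONDS(
    2 * 60 * DIV(UNIX_SECONDS(PARSE_TIMESTAMP('%Y-%m-%d-%H:%M:%S', c.time)), 2 * 60)
  ) AS timekey
FROM sample_rows c
```

Both sample rows should get the same timekey (the window starting at 09:18:00 UTC), so adding timekey to the GROUP BY alongside id and url would keep them in one group.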