STRING_AGG in BigQuery
I have a question about STRING_AGG in BigQuery. This is what I am trying:
SELECT
  id,
  institution,
  COUNT(DISTINCT institution) OVER (PARTITION BY id) as count_intitution,
  STRING_AGG(DISTINCT institution,"," ) OVER (PARTITION BY id) as list_intitution
FROM
  name_table
WHERE
  DATE(created_at) = "2020-02-02"
I get this error:
Analytic function string_agg does not support DISTINCT.
The BigQuery documentation says that STRING_AGG allows DISTINCT:
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#string_agg
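But apparently DISTINCT is not supported together with PARTITION BY. Why?
Edit:
The current table looks like this (this is only an example; the real table has more columns)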
|id |institution|
|1 | a |
|1 | b |
|2 | a |
|2 | c |
|3 | a |
|1 | a |
What I want to achieve is
|id|count_institution|list_institution|
|1 |2 |a,b |
|2 |2 |a,c |
|3 |1 |a |
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
  REPLACE((
    SELECT STRING_AGG(DISTINCT i) FROM t.list_intitution i
  ) AS list_intitution
)
FROM (
  SELECT
    id,
    institution,
    COUNT(DISTINCT institution) OVER (PARTITION BY id) AS count_intitution,
    ARRAY_AGG(institution) OVER (PARTITION BY id) AS list_intitution
  FROM
    name_table
  WHERE
    DATE(created_at) = "2020-02-02"
) t
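Note: compared with your original query, you just remove DISTINCT and use ARRAY_AGG instead of STRING_AGG in the inner query; the outer query then processes that array to build the list of its distinct values.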
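As a variation on the note above, here is a minimal sketch that deduplicates the windowed array with ARRAY(SELECT DISTINCT ...) and joins it with ARRAY_TO_STRING, which also makes the order of the list explicit. The inline sample rows and the column name all_institutions are assumptions made for illustration; the date filter from the question is left out because the sample rows have no created_at column.

```sql
#standardSQL
-- Sample rows copied from the question; replace this CTE with the real table.
WITH name_table AS (
  SELECT 1 AS id, 'a' AS institution UNION ALL
  SELECT 1, 'b' UNION ALL
  SELECT 2, 'a' UNION ALL
  SELECT 2, 'c' UNION ALL
  SELECT 3, 'a' UNION ALL
  SELECT 1, 'a'
)
SELECT
  id,
  institution,
  count_institution,
  -- Deduplicate and sort the per-id array, then join it into one string.
  ARRAY_TO_STRING(
    ARRAY(SELECT DISTINCT i FROM UNNEST(all_institutions) AS i ORDER BY i),
    ','
  ) AS list_institution
FROM (
  SELECT
    id,
    institution,
    COUNT(DISTINCT institution) OVER (PARTITION BY id) AS count_institution,
    -- ARRAY_AGG without DISTINCT is accepted as an analytic function.
    ARRAY_AGG(institution) OVER (PARTITION BY id) AS all_institutions
  FROM name_table
) t
```

This keeps one output row per input row, as in the original window query, with the same count and list repeated on every row of a given id.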
Below is the answer to your updated question.
You can simply use GROUP BY, as in the example below
#standardSQL
SELECT id,
COUNT(DISTINCT institution) AS count_institution,
STRING_AGG(DISTINCT institution) AS list_institution
FROM name_table
GROUP BY id
Applied to the sample data from your question, it looks like the example below
#standardSQL
WITH name_table AS (
SELECT 1 id, 'a' institution UNION ALL
SELECT 1, 'b' UNION ALL
SELECT 2, 'a' UNION ALL
SELECT 2, 'c' UNION ALL
SELECT 3, 'a' UNION ALL
SELECT 1, 'a'
)
SELECT id,
COUNT(DISTINCT institution) AS count_institution,
STRING_AGG(DISTINCT institution) AS list_institution
FROM name_table
GROUP BY id
The result is
Row id count_institution list_institution
1 1 2 a,b
2 2 2 a,c
3 3 1 a
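If the per-row shape of the original window query is still needed (one output row per input row rather than one row per id), one possible sketch is to aggregate per id first and join the result back to the table. The CTE name per_id is hypothetical, and the ORDER BY inside STRING_AGG is only added to make the order of the list deterministic.

```sql
#standardSQL
-- Sample rows copied from the question; replace this CTE with the real table.
WITH name_table AS (
  SELECT 1 AS id, 'a' AS institution UNION ALL
  SELECT 1, 'b' UNION ALL
  SELECT 2, 'a' UNION ALL
  SELECT 2, 'c' UNION ALL
  SELECT 3, 'a' UNION ALL
  SELECT 1, 'a'
),
-- One row per id; DISTINCT is allowed here because STRING_AGG is used
-- as a regular aggregate, not as an analytic function.
per_id AS (
  SELECT
    id,
    COUNT(DISTINCT institution) AS count_institution,
    STRING_AGG(DISTINCT institution, ',' ORDER BY institution) AS list_institution
  FROM name_table
  GROUP BY id
)
-- Attach the per-id aggregates to every original row.
SELECT
  n.id,
  n.institution,
  p.count_institution,
  p.list_institution
FROM name_table AS n
JOIN per_id AS p
  ON n.id = p.id
```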
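You can easily work around this:
SELECT id, institution,
       COUNT(DISTINCT institution) OVER (PARTITION BY id) as count_intitution,
       -- only the first row of each (id, institution) pair contributes, so duplicates are skipped
       STRING_AGG(CASE WHEN seqnum = 1 THEN institution END, ',') OVER (PARTITION BY id) as list_intitution
FROM (SELECT t.*,
             -- number the rows within each (id, institution) pair
             ROW_NUMBER() OVER (PARTITION BY id, institution ORDER BY id) as seqnum
      FROM name_table t
      WHERE DATE(created_at) = '2020-02-02'
     ) t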
Updated based on your updated question. You can do this without window functions at all.
with cte1 as
(select distinct id, institution
from name_table
where date(created_at) = "2020-02-02")
select id, count(institution) count_inst, string_agg(institution,"," ) list_inst
from cte1
group by id;
Output
+----+------------+-----------+
| id | count_inst | list_inst |
+----+------------+-----------+
| 1 | 2 | a,b |
| 2 | 2 | a,c |
| 3 | 1 | a |
+----+------------+-----------+