
STRING_AGG in Bigquery

I have a question about STRING_AGG in BigQuery. I'm trying to run:

SELECT
 id,
 institution,
 COUNT(DISTINCT institution)  OVER (PARTITION BY id) as count_intitution,
 STRING_AGG(DISTINCT institution,"," )  OVER (PARTITION BY id) as list_intitution
FROM
 name_table
WHERE
 DATE(created_at) = "2020-02-02"

and I get this error:

Analytic function string_agg does not support DISTINCT.

The BQ documentation says DISTINCT is allowed:

https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#string_agg

but apparently it is not supported together with PARTITION BY. Why?

Edit:

The current table looks like this (this is just an example; the real table has more columns):

|id |institution|
|1  | a         |
|1  | b         |
|2  | a         |
|2  | c         |
|3  | a         |
|1  | a         |

What I want to achieve is:

|id|count_institution|list_institution|
|1 |2                |a,b             |
|2 |2                |a,c             |
|3 |1                |a               |

Below is for BigQuery Standard SQL:

#standardSQL
SELECT * 
  REPLACE((
      SELECT STRING_AGG(DISTINCT i) FROM t.list_intitution i
    ) AS list_intitution
  ) 
FROM (
  SELECT
   id,
   institution,
   COUNT(DISTINCT institution)  OVER (PARTITION BY id) AS count_intitution,
   ARRAY_AGG(institution) OVER (PARTITION BY id) AS list_intitution
  FROM
   name_table
  WHERE
   DATE(created_at) = "2020-02-02"
) t  

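Note: compared with your original query, you just remove DISTINCT and use ARRAY_AGG instead of STRING_AGG; then, in the outer query, you process that array to build a list of the distinct values it contains.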
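To make the two-step idea concrete, here is a minimal Python sketch (not BigQuery) that emulates it on the sample rows from the question, assuming only the (id, institution) columns matter. Step 1 mimics ARRAY_AGG(institution) OVER (PARTITION BY id), which keeps duplicates; step 2 mimics the outer STRING_AGG(DISTINCT ...) over that array. The function name `window_then_dedupe` is hypothetical, and the first-seen ordering of the list is a choice of this sketch, since BigQuery does not guarantee an order here.

```python
from collections import defaultdict

# Sample rows from the question as (id, institution); created_at is omitted.
rows = [(1, "a"), (1, "b"), (2, "a"), (2, "c"), (3, "a"), (1, "a")]


def window_then_dedupe(rows):
    """Hypothetical emulation: a window ARRAY_AGG per id, then dedupe the array.

    Returns one output tuple per input row, like the windowed query does.
    """
    # Step 1: like ARRAY_AGG(institution) OVER (PARTITION BY id),
    # collect every institution for each id, duplicates included.
    partitions = defaultdict(list)
    for id_, inst in rows:
        partitions[id_].append(inst)

    result = []
    for id_, inst in rows:
        # Step 2: like STRING_AGG(DISTINCT i) over the array,
        # keep each value once (first-seen order) and join with commas.
        distinct = list(dict.fromkeys(partitions[id_]))
        result.append((id_, inst, len(distinct), ",".join(distinct)))
    return result


for row in window_then_dedupe(rows):
    print(row)
```

As with the windowed SQL, the output still has one row per input row (id 1 appears three times), which is why the GROUP BY version below is a better fit for the desired output.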

Below is an answer to your updated question.

You can simply use GROUP BY, as in the example below:

#standardSQL
SELECT id, 
  COUNT(DISTINCT institution) AS count_institution,
  STRING_AGG(DISTINCT institution) AS list_institution
FROM name_table
GROUP BY id

Applied to the sample data from your question, as in the example below:

#standardSQL
WITH name_table AS (
  SELECT 1 id, 'a' institution UNION ALL
  SELECT 1, 'b' UNION ALL
  SELECT 2, 'a' UNION ALL
  SELECT 2, 'c' UNION ALL
  SELECT 3, 'a' UNION ALL
  SELECT 1, 'a' 
)
SELECT id, 
  COUNT(DISTINCT institution) AS count_institution,
  STRING_AGG(DISTINCT institution) AS list_institution
FROM name_table
GROUP BY id

the result is:

|Row|id|count_institution|list_institution|
|1  |1 |2                |a,b             |
|2  |2 |2                |a,c             |
|3  |3 |1                |a               |
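If you want to try the GROUP BY pattern without a BigQuery project, a rough local stand-in is SQLite through Python's built-in sqlite3 module. This is an assumption-laden sketch: SQLite is not BigQuery, and its GROUP_CONCAT(DISTINCT x) is used here as the closest analogue of STRING_AGG(DISTINCT x). With DISTINCT, SQLite only accepts the one-argument form, which joins with ",". Neither engine guarantees the order of the concatenated values.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_table (id INTEGER, institution TEXT)")
conn.executemany(
    "INSERT INTO name_table VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "a"), (2, "c"), (3, "a"), (1, "a")],
)

# GROUP_CONCAT(DISTINCT ...) stands in for BigQuery's STRING_AGG(DISTINCT ...).
query = """
SELECT id,
       COUNT(DISTINCT institution) AS count_institution,
       GROUP_CONCAT(DISTINCT institution) AS list_institution
FROM name_table
GROUP BY id
ORDER BY id
"""
grouped = conn.execute(query).fetchall()
for row in grouped:
    print(row)
```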

You can easily work around this:

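SELECT id, institution,
       COUNT(DISTINCT institution)  OVER (PARTITION BY id) as count_intitution,
       -- STRING_AGG skips NULLs, so only rows with seqnum = 1 contribute
       STRING_AGG(CASE WHEN seqnum = 1 THEN institution END, ',')  OVER (PARTITION BY id) as list_intitution
FROM (SELECT t.*, 
             -- restart numbering for every (id, institution) pair: seqnum = 1 marks its first row
             ROW_NUMBER() OVER (PARTITION BY id, institution ORDER BY id) as seqnum
      FROM name_table t
      WHERE DATE(created_at) = '2020-02-02'
     ) t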
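The trick works because STRING_AGG ignores NULLs: the CASE expression turns every repeat of an institution into NULL, so each value is concatenated once. For that to hold, the row numbers have to restart for every (id, institution) pair. If rows are numbered per id only, each id keeps exactly one institution. The Python sketch below emulates both variants on the question's sample rows. The function and its `per_institution` flag are hypothetical. "First row" here means first in list order, whereas in SQL the row that gets seqnum = 1 within a tie is arbitrary.

```python
from collections import defaultdict

# Sample rows from the question as (id, institution).
rows = [(1, "a"), (1, "b"), (2, "a"), (2, "c"), (3, "a"), (1, "a")]


def seqnum_trick(rows, per_institution=True):
    """Hypothetical emulation of ROW_NUMBER + CASE WHEN seqnum = 1 + STRING_AGG."""
    # ROW_NUMBER(): count rows per partition key, in list order.
    counters = defaultdict(int)
    numbered = []
    for id_, inst in rows:
        key = (id_, inst) if per_institution else id_
        counters[key] += 1
        numbered.append((id_, inst, counters[key]))

    # STRING_AGG(CASE WHEN seqnum = 1 THEN institution END) OVER (PARTITION BY id):
    # rows with seqnum > 1 yield NULL (None) and are skipped by the aggregate.
    firsts = defaultdict(list)
    for id_, inst, seqnum in numbered:
        if seqnum == 1:
            firsts[id_].append(inst)

    # Window semantics: every input row gets its partition's aggregated list.
    return [(id_, inst, ",".join(firsts[id_])) for id_, inst, _ in numbered]


print(seqnum_trick(rows))                         # numbered per (id, institution)
print(seqnum_trick(rows, per_institution=False))  # numbered per id only
```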

Updated based on your updated question: you don't need window functions at all.

with cte1 as
(select distinct id, institution
from name_table
where date(created_at) = "2020-02-02")

select id, count(institution) count_inst, string_agg(institution,"," ) list_inst
from cte1 
group by id;

Output:

+----+------------+-----------+
| id | count_inst | list_inst |
+----+------------+-----------+
|  1 |          2 | a,b       |
|  2 |          2 | a,c       |
|  3 |          1 | a         |
+----+------------+-----------+
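Below is a minimal local sketch of the same dedupe-then-aggregate query, run through Python's built-in sqlite3 module. It rests on a few assumptions. SQLite is not BigQuery, so GROUP_CONCAT(institution, ',') stands in for string_agg(institution, ","). The string literal uses single quotes, because SQLite treats double quotes as identifiers first. The created_at timestamps, and the extra id 4 row on another day, are hypothetical values added only so the WHERE filter has something to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_table (id INTEGER, institution TEXT, created_at TEXT)")
# Hypothetical timestamps; the (id, institution) pairs come from the question.
conn.executemany(
    "INSERT INTO name_table VALUES (?, ?, ?)",
    [
        (1, "a", "2020-02-02 08:00:00"),
        (1, "b", "2020-02-02 09:30:00"),
        (2, "a", "2020-02-02 10:00:00"),
        (2, "c", "2020-02-02 11:15:00"),
        (3, "a", "2020-02-02 12:00:00"),
        (1, "a", "2020-02-02 13:45:00"),
        (4, "d", "2020-02-03 09:00:00"),  # other day: removed by the WHERE clause
    ],
)

query = """
WITH cte1 AS (
  -- Deduplicate (id, institution) pairs first ...
  SELECT DISTINCT id, institution
  FROM name_table
  WHERE date(created_at) = '2020-02-02'
)
-- ... so plain COUNT and GROUP_CONCAT need no DISTINCT afterwards.
SELECT id, COUNT(institution) AS count_inst, GROUP_CONCAT(institution, ',') AS list_inst
FROM cte1
GROUP BY id
ORDER BY id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Deduplicating in the CTE first is what lets both aggregates drop DISTINCT, so the same shape also works in engines whose string aggregate rejects DISTINCT.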


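Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.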

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM