How to get top 3 columns and their values across multiple columns (dynamically) in BigQuery

I have a table that looks like this:

select 'Alice' AS ID, 1 as col1, 3 as col2, -2 as col3, 9 as col4
union all
select 'Bob' AS ID, -9 as col1, 2 as col2, 5 as col3, -6 as col4

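For each record, I want to get the 3 values with the largest absolute value across the four columns, then format the output as a dictionary or STRUCT, like this: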

select 
'Alice' AS ID, [STRUCT('col4' AS column, 9 AS value), STRUCT('col2',3), STRUCT('col3',-2)] output
union all
select
'Bob' AS ID, [STRUCT('col1' AS column, -9 AS value), STRUCT('col4',-6), STRUCT('col3',5)] output

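I want it to be dynamic, so I'd like to avoid writing out the columns individually. The set of columns can change and may grow to as many as 100.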

For more context, I'm trying to get the top three features from the batch local explanations in the Vertex AI output: https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/get-batch-predictions

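I've looked at some examples and would like something similar to the second answer here: How to get max value of column values in a record? (BigQuery)

Edit: the data is actually structured like this. If it is easier to work with in this form, that would be the better option: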

select 'Alice' AS ID,  STRUCT(1 as col1, 3 as col2, -2 as col3, 9 as col4) AS featureAttributions
union all
SELECT  'Bob' AS ID, STRUCT(-9 as col1, 2 as col2, 5 as col3, -6 as col4) AS featureAttributions

Consider the following query.

SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
  FROM (
    SELECT * FROM sample_table UNPIVOT (value FOR column IN (col1, col2, col3, col4))
  ) 
 GROUP BY ID;

Query results

[Image: query results]
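To try the static query without creating a table first, the sample rows can be defined inline. This is a minimal sketch: the CTE named sample_table is only a stand-in that mirrors the table name used above, and its rows are the two from the question.

```sql
-- Inline stand-in for sample_table, using the rows from the question.
WITH sample_table AS (
  SELECT 'Alice' AS ID, 1 AS col1, 3 AS col2, -2 AS col3, 9 AS col4
  UNION ALL
  SELECT 'Bob', -9, 2, 5, -6
)
SELECT
  ID,
  -- Keep the three (column, value) pairs with the largest absolute value per ID.
  ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) AS output
FROM sample_table
  -- UNPIVOT turns col1..col4 into one (column, value) row per original column.
  UNPIVOT (value FOR column IN (col1, col2, col3, col4))
GROUP BY ID;
```

For these rows the result should match the desired output in the question: col4, col2, col3 for Alice and col1, col4, col3 for Bob. Note that UNPIVOT drops NULL values by default, so a record with missing values may yield fewer than three entries.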

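Dynamic query

"I want it to be dynamic, so I'd like to avoid writing out the columns individually"

For that, you need to consider dynamic SQL. Referring to @Mikhail's answer that you linked in your post, you can write a dynamic query like the one below. It serializes one row to JSON, extracts the keys with a regular expression, and substitutes the resulting column list into the UNPIVOT clause.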

EXECUTE IMMEDIATE FORMAT("""
  SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
    FROM (
      SELECT * FROM sample_table UNPIVOT (value FOR column IN (%s))
    ) 
   GROUP BY ID
""", ARRAY_TO_STRING(
  REGEXP_EXTRACT_ALL(TO_JSON_STRING((SELECT AS STRUCT * EXCEPT (ID) FROM sample_table LIMIT 1)), r'"([^,{]+)":'), ',')
);
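To see what ends up being substituted for %s, the column-list expression can be run on its own. A minimal sketch, assuming a one-row inline stand-in for sample_table (the CTE is hypothetical and only there to make the snippet self-contained):

```sql
-- Inline stand-in for sample_table; one row is enough to read the column names.
WITH sample_table AS (
  SELECT 'Alice' AS ID, 1 AS col1, 3 AS col2, -2 AS col3, 9 AS col4
)
SELECT ARRAY_TO_STRING(
  REGEXP_EXTRACT_ALL(
    -- The row serialized as JSON: {"col1":1,"col2":3,"col3":-2,"col4":9}
    TO_JSON_STRING((SELECT AS STRUCT * EXCEPT (ID) FROM sample_table LIMIT 1)),
    -- Capture every quoted key that is immediately followed by a colon.
    r'"([^,{]+)":'
  ),
  ','
) AS column_list;
-- column_list: col1,col2,col3,col4
```

The names are inserted into the query text unquoted, so this approach is best suited to simple identifiers like those in the question. Column names that need backtick quoting, or string values that themselves contain `":`, would likely need extra handling.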

For the updated sample table

SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
  FROM (
    SELECT * FROM (SELECT ID, featureAttributions.* FROM sample_table) 
   UNPIVOT (value FOR column IN (col1, col2, col3, col4))
  ) 
 GROUP BY ID;

And the dynamic version:

EXECUTE IMMEDIATE FORMAT("""
  SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
    FROM (
      SELECT * FROM (SELECT ID, featureAttributions.* FROM sample_table)
     UNPIVOT (value FOR column IN (%s))
    ) 
   GROUP BY ID
""", ARRAY_TO_STRING(
  REGEXP_EXTRACT_ALL(TO_JSON_STRING((SELECT featureAttributions FROM sample_table LIMIT 1)), r'"([^,{]+)":'), ',')
);
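The only extra step for the STRUCT layout is flattening featureAttributions into ordinary columns before UNPIVOT. A self-contained sketch of the static variant, assuming inline stand-ins for the table (both CTE names here are illustrative, not from the original data):

```sql
-- Inline stand-in for the updated sample_table from the question's edit.
WITH sample_table AS (
  SELECT 'Alice' AS ID,
         STRUCT(1 AS col1, 3 AS col2, -2 AS col3, 9 AS col4) AS featureAttributions
  UNION ALL
  SELECT 'Bob',
         STRUCT(-9 AS col1, 2 AS col2, 5 AS col3, -6 AS col4)
),
-- Expand the STRUCT fields into top-level columns col1..col4.
flattened AS (
  SELECT ID, featureAttributions.* FROM sample_table
)
SELECT
  ID,
  ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) AS output
FROM flattened
  UNPIVOT (value FOR column IN (col1, col2, col3, col4))
GROUP BY ID;
```

Naming the flattening step as its own CTE keeps the UNPIVOT clause identical to the one used for the flat table, so the same column-list substitution applies in the dynamic version.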
