How to iterate over all columns in a table using Snowflake JavaScript Stored Procedure or Function?

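I have a table in Snowflake with 100+ columns, and I'm trying to count the occurrences of every distinct value in each column and eventually combine the counts for all columns into one table. If I were doing it for just one column, it would look like this: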

SELECT "AGE", COUNT(*) AS "Frequency"
FROM 
    db.schema.tablename
WHERE 
    "SURVEYDATE" < '2019-07-29'  -- single quotes: a string literal, not an identifier
GROUP BY
    "AGE";

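I know this is fairly trivial to do in Python (and maybe I should do it in PySpark; I'm open to suggestions), but for my team's ease of use, and because it should be faster on 300 million rows, I'd like to do something like this with Snowflake's JavaScript procedural language: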

create or replace procedure column_counts(TABLE_NAME varchar)
returns array
language javascript
as
$$
// Run a zero-row query just to read the column names from the statement metadata.
// TABLE_NAME is concatenated as-is, so it must come from a trusted caller.
var meta = snowflake.createStatement({sqlText: 'SELECT * FROM ' + TABLE_NAME + ' LIMIT 0'});
meta.execute();
var results_array = [];

for (var i = 1; i <= meta.getColumnCount(); i++) {  // column indexes are 1-based
    var col = meta.getColumnName(i);
    // Returns all distinct values in that column and their counts
    var rs = snowflake.createStatement({
        sqlText: 'SELECT "' + col + '", COUNT(*) AS "Frequency" FROM ' + TABLE_NAME +
                 ' WHERE "SURVEYDATE" < \'2019-07-29\' GROUP BY "' + col + '"'
    }).execute();
    while (rs.next()) {
        // One entry per [column_name, distinct_value, frequency]
        results_array.push([col, rs.getColumnValue(1), rs.getColumnValue(2)]);
    }
}
return results_array;
$$
;
CALL column_counts('db.schema.tablename');

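I'm still very new to this procedural language and to Snowflake as a whole, so I'm definitely open to suggestions on the best way to do this, ideally in a repeatable way for the new tables that arrive each month.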

It is possible to do this without any kind of procedural code. For instance, using JSON:

WITH cte AS ( -- here goes the table/query/view
  SELECT TOP 100 OBJECT_CONSTRUCT(*) AS json_payload
  FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS
)
SELECT f.KEY, 
      COUNT(DISTINCT f."VALUE") AS frequency, 
      LISTAGG(DISTINCT  f."VALUE" ,',') AS distinct_values  -- debug
FROM cte
, LATERAL FLATTEN (input => json_payload) f
-- WHERE f.KEY IN ('column_name1', 'column_name2', ...) -- only specific columns
GROUP BY f.KEY;

Output:

+-----------------+-----------+------------------------------------------------+
|       KEY       | FREQUENCY |                DISTINCT_VALUES                 |
+-----------------+-----------+------------------------------------------------+
| O_ORDERPRIORITY |         5 | 2-HIGH,1-URGENT,5-LOW,4-NOT SPECIFIED,3-MEDIUM |
| O_ORDERSTATUS   |         3 | P,O,F                                          |
| O_SHIPPRIORITY  |         1 | 0                                              |
| ...             |       ... | ....                                           |
+-----------------+-----------+------------------------------------------------+

How it works:

  1. Generate JSON per row using OBJECT_CONSTRUCT(*)

  2. Flatten the JSON into key/value pairs

  3. Group by key and apply the aggregate function you need: COUNT/COUNT(DISTINCT )/LISTAGG/MIN/MAX/...
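
To make the three steps concrete, here is a rough plain-JavaScript sketch that mimics them on a few in-memory rows. It does not talk to Snowflake at all: the sample rows and the helper names (`toPayload`, `flatten`, `countDistinctByKey`) are made up for illustration. One detail it tries to mirror is that OBJECT_CONSTRUCT(*) leaves out key/value pairs whose value is NULL, so a NULL in a column contributes nothing for that row.

```javascript
// Plain-JavaScript illustration only; no Snowflake connection is involved.
// The sample rows below are invented for this sketch.
const rows = [
  { O_ORDERSTATUS: "O", O_SHIPPRIORITY: 0, O_CLERK: null },
  { O_ORDERSTATUS: "F", O_SHIPPRIORITY: 0, O_CLERK: "Clerk#1" },
  { O_ORDERSTATUS: "O", O_SHIPPRIORITY: 0, O_CLERK: "Clerk#2" },
];

// Step 1: one object per row. NULL values are dropped, as OBJECT_CONSTRUCT(*) does.
function toPayload(row) {
  return Object.fromEntries(
    Object.entries(row).filter(([, value]) => value !== null)
  );
}

// Step 2: flatten every payload into { key, value } pairs.
function flatten(payloads) {
  return payloads.flatMap((payload) =>
    Object.entries(payload).map(([key, value]) => ({ key, value }))
  );
}

// Step 3: group by key and aggregate; here, the number of distinct values per key.
function countDistinctByKey(pairs) {
  const seen = new Map();
  for (const { key, value } of pairs) {
    if (!seen.has(key)) seen.set(key, new Set());
    seen.get(key).add(value);
  }
  const result = {};
  for (const [key, values] of seen) result[key] = values.size;
  return result;
}

const distinctCounts = countDistinctByKey(flatten(rows.map(toPayload)));
console.log(distinctCounts);
```

Swapping the aggregation in step 3 (for example, counting pairs instead of collecting a set) corresponds to choosing a different aggregate function in the SQL.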


A version that gives the value distribution for each column:

WITH cte AS (
  SELECT TOP 100 OBJECT_CONSTRUCT(*) AS json_payload
  FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS
)
SELECT f.KEY, f."VALUE", COUNT(*) AS frequency
FROM cte
, LATERAL FLATTEN (input => json_payload) f
-- WHERE f.KEY IN ('column_name1', 'column_name2', ...) -- only specific columns
GROUP BY f.KEY, f."VALUE"
ORDER BY f.KEY, f."VALUE";
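
Since the question asks for something repeatable for each month's new table, one option is to generate the query text from the table name. The sketch below is only an assumption about how that could look: `buildDistributionQuery` is a hypothetical helper, not a Snowflake function, it drops the `TOP 100` sampling used above, and it assumes the table name comes from a trusted source because it is concatenated as-is.

```javascript
// Hypothetical helper (not part of Snowflake): builds the per-column value
// distribution query for a given table, optionally restricted to some columns.
function buildDistributionQuery(tableName, columns = []) {
  // Assumption: tableName is trusted; it is inserted into the SQL text unchanged.
  const lines = [
    "WITH cte AS (",
    "  SELECT OBJECT_CONSTRUCT(*) AS json_payload",
    `  FROM ${tableName}`,
    ")",
    'SELECT f.KEY, f."VALUE", COUNT(*) AS frequency',
    "FROM cte",
    ", LATERAL FLATTEN (input => json_payload) f",
  ];
  if (columns.length > 0) {
    // Double any single quotes so each column name is a valid SQL string literal.
    const quoted = columns.map((c) => `'${c.replace(/'/g, "''")}'`);
    lines.push(`WHERE f.KEY IN (${quoted.join(", ")})`);
  }
  lines.push('GROUP BY f.KEY, f."VALUE"', 'ORDER BY f.KEY, f."VALUE"');
  return lines.join("\n");
}

console.log(buildDistributionQuery("db.schema.tablename", ["AGE", "SURVEYDATE"]));
```

Inside a JavaScript stored procedure, the returned string could presumably be passed as `sqlText` to `snowflake.createStatement`, the same call the question's code uses.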
