How to sort the values of each column separately in BigQuery

Question

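Is it possible to sort each column of a BigQuery table separately? We have a table of statistics, and we want to sort each column independently within a partition. We have 3 statistics, stat1, stat2, and stat3, and gender is our partition key: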

with
  stats as (
    select 'male' as gender, .60 as stat1, 23 as stat2, .10 as stat3 union all
    select 'male' as gender, .62 as stat1, 28 as stat2, .12 as stat3 union all
    select 'male' as gender, .57 as stat1, 21 as stat2, .16 as stat3 union all
    select 'male' as gender, .51 as stat1, 18 as stat2, .14 as stat3 union all
    select 'male' as gender, .53 as stat1, 17 as stat2, .18 as stat3 union all
    select 'male' as gender, .46 as stat1, 31 as stat2, .08 as stat3 union all
    select 'male' as gender, .49 as stat1, 32 as stat2, .07 as stat3 union all
    select 'male' as gender, .55 as stat1, 40 as stat2, .23 as stat3 union all
    select 'male' as gender, .68 as stat1, 41 as stat2, .33 as stat3 union all
    select 'male' as gender, .56 as stat1, 36 as stat2, .32 as stat3 union all
    select 'female' as gender, .80 as stat1, 32 as stat2, .42 as stat3 union all
    select 'female' as gender, .82 as stat1, 24 as stat2, .43 as stat3 union all
    select 'female' as gender, .73 as stat1, 26 as stat2, .33 as stat3 union all
    select 'female' as gender, .85 as stat1, 27 as stat2, .55 as stat3 union all
    select 'female' as gender, .91 as stat1, 29 as stat2, .53 as stat3 union all
    select 'female' as gender, .88 as stat1, 13 as stat2, .51 as stat3 union all
    select 'female' as gender, .86 as stat1, 38 as stat2, .49 as stat3 union all
    select 'female' as gender, .77 as stat1, 35 as stat2, .40 as stat3 union all
    select 'female' as gender, .74 as stat1, 15 as stat2, .58 as stat3 union all
    select 'female' as gender, .95 as stat1, 17 as stat2, .59 as stat3
  ),

  -- we create rank columns
  stats_with_ranks as (
    select
      *
      ,row_number() over (partition by gender) as rank
      ,rank() over (partition by gender order by stat1 desc) as stat1rk
      ,rank() over (partition by gender order by stat2 desc) as stat2rk
      ,rank() over (partition by gender order by stat3 desc) as stat3rk
    from stats
  )

select * from stats_with_ranks where gender = 'female' order by rank asc

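This is not the right table for our problem. We want each metric sorted within its own column, so that its values line up with a single rank column. For example, the maximum of each of stat1, stat2, and stat3 would sit on the same row as rank = 1. It would look like this: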
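To make that target shape concrete, here is a minimal sketch in plain Python rather than BigQuery SQL. It is not part of the original post, and the helper name sort_columns_independently is made up for illustration. It takes the female rows from the sample data, sorts each stat column in descending order on its own, and lines the results up against a shared rank:

```python
# Female rows from the sample data in the question: (stat1, stat2, stat3).
female_rows = [
    (0.80, 32, 0.42), (0.82, 24, 0.43), (0.73, 26, 0.33), (0.85, 27, 0.55),
    (0.91, 29, 0.53), (0.88, 13, 0.51), (0.86, 38, 0.49), (0.77, 35, 0.40),
    (0.74, 15, 0.58), (0.95, 17, 0.59),
]


def sort_columns_independently(rows):
    """Sort every column in descending order on its own, then re-align by rank."""
    columns = list(zip(*rows))  # turn rows into columns
    sorted_columns = [sorted(col, reverse=True) for col in columns]
    # Row i of the result holds the i-th largest value of each column.
    return [(rank, *values)
            for rank, values in enumerate(zip(*sorted_columns), start=1)]


ranked = sort_columns_independently(female_rows)
for row in ranked:
    print(row)
```

The values on a given output row no longer come from the same input row. That is exactly the point of the question: the rank column is the only thing the stats share.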

We came up with the following approach:

select
  a.gender
  ,a.rank
  ,b.stat1 as stat1
  ,c.stat2 as stat2
  ,d.stat3 as stat3
from stats_with_ranks as a
left join stats_with_ranks as b on a.gender = b.gender and a.rank = b.stat1rk
left join stats_with_ranks as c on a.gender = c.gender and a.rank = c.stat2rk
left join stats_with_ranks as d on a.gender = d.gender and a.rank = d.stat3rk
order by gender asc, rank asc

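However, this approach does not work for our larger dataset, where the table we are ranking has more than 200 stats. With 200+ of these left joins, we quickly hit "Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex." That is why we are looking for a solution that sorts within columns.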

Answer 1

The answer below assumes that all stat columns have the same data type and follow the statN naming convention.

execute immediate (select '''
  select * from (
    select *, row_number() over(partition by gender, col order by stat desc) rank
    from stats
    unpivot (stat for col in (''' || col_list || ''')) 
  )
  pivot (any_value(stat) for col in (''' || val_list || '''))
  order by gender, rank
  '''
  from (
    select 
      string_agg('"stat' || pos || '"') val_list, 
      string_agg('stat' || pos) col_list
    from unnest(generate_array(1,3)) pos
  )
)

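If applied to the sample data in your question (with all values as float64), this returns one row per gender and rank, with each stat column sorted independently.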
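It can help to see the statement that the dynamic SQL assembles before it is executed. The Python sketch below is only an illustration: build_query is a hypothetical helper, and it assumes string_agg joins the items with its default comma separator in pos order. It rebuilds the same text for N stat columns:

```python
def build_query(n_stats):
    """Assemble the statement text that would be passed to EXECUTE IMMEDIATE."""
    names = [f"stat{pos}" for pos in range(1, n_stats + 1)]
    col_list = ",".join(names)                            # stat1,stat2,stat3
    val_list = ",".join(f'"{name}"' for name in names)    # "stat1","stat2","stat3"
    return (
        "select * from (\n"
        "  select *, row_number() over(partition by gender, col order by stat desc) rank\n"
        "  from stats\n"
        f"  unpivot (stat for col in ({col_list}))\n"
        ")\n"
        f"pivot (any_value(stat) for col in ({val_list}))\n"
        "order by gender, rank"
    )


query = build_query(3)
print(query)
```

For the 200+ column case, the only change in the SQL itself would be the upper bound passed to generate_array. The query stays a single unpivot, one window function, and one pivot, instead of 200 self-joins.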

Answer 2

"As long as query resources are not exceeded, we don't mind listing out all 200+ stats ..."

create temp function  extract_keys(input string) returns array<string> language js as "return Object.keys(JSON.parse(input));";
create temp function  extract_values(input string) returns array<string> language js as "return Object.values(JSON.parse(input));";
select * from (
  select gender, col, cast(stat as float64) stat, row_number() over(partition by gender, col order by cast(stat as float64) desc) rank
  from stats t, 
  unnest([struct(to_json_string((select as struct * except(gender) from unnest([t]))) as json)]),
  unnest(extract_keys(json)) col with offset
  join unnest(extract_values(json)) stat with offset
  using(offset)
)
pivot (any_value(stat) for col in ('stat1','stat2','stat3'))

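If applied to the sample data in your question, the output has one row per gender and rank, with stat1, stat2, and stat3 each sorted in descending order.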
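The query above works in three steps: serialize each row to JSON and split it into keys and values, rank the values within each (gender, col) group, then pivot back. As a rough sketch of that flow, here is the same logic traced in plain Python on three of the male sample rows. It is not BigQuery code, and the variable names are chosen for this illustration only:

```python
import json
from collections import defaultdict

# Three of the male rows from the question's sample data.
rows = [
    {"gender": "male", "stat1": 0.60, "stat2": 23, "stat3": 0.10},
    {"gender": "male", "stat1": 0.62, "stat2": 28, "stat3": 0.12},
    {"gender": "male", "stat1": 0.57, "stat2": 21, "stat3": 0.16},
]

# Step 1: unpivot via JSON, so no stat column has to be named explicitly.
long_rows = []
for row in rows:
    payload = json.dumps({k: v for k, v in row.items() if k != "gender"})
    parsed = json.loads(payload)
    keys, values = list(parsed.keys()), list(parsed.values())
    # Pair keys and values by position, like the join on offset.
    for col, stat in zip(keys, values):
        long_rows.append((row["gender"], col, float(stat)))

# Step 2: collect the values of each (gender, col) group.
groups = defaultdict(list)
for gender, col, stat in long_rows:
    groups[(gender, col)].append(stat)

# Step 3: rank each group in descending order and pivot back,
# so that every (gender, rank) pair becomes one output row.
pivoted = defaultdict(dict)
for (gender, col), stats in groups.items():
    for rank, stat in enumerate(sorted(stats, reverse=True), start=1):
        pivoted[(gender, rank)][col] = stat

for key in sorted(pivoted):
    print(key, pivoted[key])
```

The float() call plays the role of cast(stat as float64) in the query, which is what lets columns with mixed integer and float values go through the same path.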

Answer 1
2 2022-05-09 19:17:32

Answer 2
1 accepted 2022-05-09 21:06:43
