
In BigQuery, how to sort values separately for each column

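Is it possible to sort each column of a BigQuery table separately? We have a table of stats, and we want to sort each column independently within a partition. We have 3 stats, stat1, stat2 and stat3, and gender is our partition key: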

with
  stats as (
    select 'male' as gender, .60 as stat1, 23 as stat2, .10 as stat3 union all
    select 'male' as gender, .62 as stat1, 28 as stat2, .12 as stat3 union all
    select 'male' as gender, .57 as stat1, 21 as stat2, .16 as stat3 union all
    select 'male' as gender, .51 as stat1, 18 as stat2, .14 as stat3 union all
    select 'male' as gender, .53 as stat1, 17 as stat2, .18 as stat3 union all
    select 'male' as gender, .46 as stat1, 31 as stat2, .08 as stat3 union all
    select 'male' as gender, .49 as stat1, 32 as stat2, .07 as stat3 union all
    select 'male' as gender, .55 as stat1, 40 as stat2, .23 as stat3 union all
    select 'male' as gender, .68 as stat1, 41 as stat2, .33 as stat3 union all
    select 'male' as gender, .56 as stat1, 36 as stat2, .32 as stat3 union all
    select 'female' as gender, .80 as stat1, 32 as stat2, .42 as stat3 union all
    select 'female' as gender, .82 as stat1, 24 as stat2, .43 as stat3 union all
    select 'female' as gender, .73 as stat1, 26 as stat2, .33 as stat3 union all
    select 'female' as gender, .85 as stat1, 27 as stat2, .55 as stat3 union all
    select 'female' as gender, .91 as stat1, 29 as stat2, .53 as stat3 union all
    select 'female' as gender, .88 as stat1, 13 as stat2, .51 as stat3 union all
    select 'female' as gender, .86 as stat1, 38 as stat2, .49 as stat3 union all
    select 'female' as gender, .77 as stat1, 35 as stat2, .40 as stat3 union all
    select 'female' as gender, .74 as stat1, 15 as stat2, .58 as stat3 union all
    select 'female' as gender, .95 as stat1, 17 as stat2, .59 as stat3
  ),

  -- we create rank columns
  stats_with_ranks as (
    select
      *
      ,row_number() over (partition by gender) as rank
      ,rank() over (partition by gender order by stat1 desc) as stat1rk
      ,rank() over (partition by gender order by stat2 desc) as stat2rk
      ,rank() over (partition by gender order by stat3 desc) as stat3rk
    from stats
  )

select * from stats_with_ranks where gender = 'female' order by rank asc

(screenshot of the query result for gender = 'female')

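This is not the table we need. We want each metric sorted within its own column, so that its values line up against a single rank column. For example, the largest value of each of stat1, stat2 and stat3 would sit on the same row as rank = 1. Something like this: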

(screenshot of the desired output)
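To pin down the target shape, here is a minimal sketch in plain Python (an illustration of the logic, not BigQuery code) using the question's sample rows. Within each gender partition, every stat column is sorted in descending order on its own, and the i-th largest value of each stat lands on the row with rank = i. The helper name sort_each_column is hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (gender, stat1, stat2, stat3).
STATS = [
    ("male", .60, 23, .10), ("male", .62, 28, .12), ("male", .57, 21, .16),
    ("male", .51, 18, .14), ("male", .53, 17, .18), ("male", .46, 31, .08),
    ("male", .49, 32, .07), ("male", .55, 40, .23), ("male", .68, 41, .33),
    ("male", .56, 36, .32),
    ("female", .80, 32, .42), ("female", .82, 24, .43), ("female", .73, 26, .33),
    ("female", .85, 27, .55), ("female", .91, 29, .53), ("female", .88, 13, .51),
    ("female", .86, 38, .49), ("female", .77, 35, .40), ("female", .74, 15, .58),
    ("female", .95, 17, .59),
]

def sort_each_column(rows):
    """Return {gender: [(rank, stat1, stat2, stat3), ...]} where every stat
    column is sorted descending independently within its gender partition."""
    by_gender = defaultdict(list)
    for gender, *values in rows:
        by_gender[gender].append(values)
    result = {}
    for gender, part in by_gender.items():
        # zip(*part) transposes rows into columns; each column is sorted on its own.
        columns = [sorted(col, reverse=True) for col in zip(*part)]
        # zip(*columns) transposes back: row i holds the i-th largest value of every stat.
        result[gender] = [(i + 1, *vals) for i, vals in enumerate(zip(*columns))]
    return result

if __name__ == "__main__":
    for row in sort_each_column(STATS)["female"]:
        print(row)
```

For the female partition the first row is (1, 0.95, 38, 0.59) and the last is (10, 0.73, 13, 0.33): the values in a row no longer come from the same source row, which is exactly the point.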

We came up with the following approach:

select
  a.gender
  ,a.rank
  ,b.stat1 as stat1
  ,c.stat2 as stat2
  ,d.stat3 as stat3
from stats_with_ranks as a
left join stats_with_ranks as b on a.gender = b.gender and a.rank = b.stat1rk
left join stats_with_ranks as c on a.gender = c.gender and a.rank = c.stat2rk
left join stats_with_ranks as d on a.gender = d.gender and a.rank = d.stat3rk
order by gender asc, rank asc
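The self-join works because, for position r, each left join picks the row whose statN rank equals r. The Python sketch below emulates that join (it does not run SQL; the helper names are hypothetical) and also shows a caveat: rank() gives tied values the same rank and skips the next one, so the join duplicates rows at a tied rank and yields NULL at the skipped one. The question's sample data has no ties, so the query above is unaffected, but row_number() in place of rank() for the statNrk columns would avoid the problem.

```python
def sql_rank_desc(values):
    """Emulate RANK() OVER (ORDER BY v DESC): ties share a rank and leave a gap."""
    return [1 + sum(other > v for other in values) for v in values]

def sql_row_number_desc(values):
    """Emulate ROW_NUMBER() OVER (ORDER BY v DESC): ties get consecutive numbers."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    numbers = [0] * len(values)
    for position, i in enumerate(order, start=1):
        numbers[i] = position
    return numbers

def join_on_rank(values, ranks):
    """Emulate `a LEFT JOIN b ON a.rank = b.statNrk` for positions 1..n.
    Each entry lists the matched values; an empty list stands for NULL."""
    return [[v for v, r in zip(values, ranks) if r == pos]
            for pos in range(1, len(values) + 1)]

if __name__ == "__main__":
    vals = [5, 7, 7, 3]  # one tie at the top
    print(join_on_rank(vals, sql_rank_desc(vals)))        # duplicate at 1, NULL at 2
    print(join_on_rank(vals, sql_row_number_desc(vals)))  # one value per position
```

It also makes the scaling problem visible: every stat needs its own join against the ranked table, so 200+ stats means 200+ joins.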

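However, this approach does not work for our larger dataset: the table we are ranking has 200+ stats. With 200+ of these left joins we quickly hit Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex. That is why we are looking for a solution that sorts within each column.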

Consider the approach below (it assumes that all stat columns have the same data type and follow a naming convention like statN):

execute immediate (select '''
  select * from (
    select *, row_number() over(partition by gender, col order by stat desc) rank
    from stats
    unpivot (stat for col in (''' || col_list || ''')) 
  )
  pivot (any_value(stat) for col in (''' || val_list || '''))
  order by gender, rank
  '''
  from (
    select 
      string_agg('"stat' || pos || '"') val_list, 
      string_agg('stat' || pos) col_list
    from unnest(generate_array(1,3)) pos
  )
)          

Applied to the sample data in your question (all values float64), the output is:

(screenshot of the output)
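The dynamic part of this answer is only string building: generate_array(1,3) yields the positions, and the two string_agg calls (whose default delimiter is a comma) produce the bare column list for UNPIVOT and the quoted value list for PIVOT. The Python sketch below rebuilds the same SQL text so you can see what EXECUTE IMMEDIATE actually runs; build_dynamic_sql and its num_stats parameter are hypothetical names. For 200+ stats only the upper bound of generate_array changes. Note that the generated statement reads stats as a table: a query run by EXECUTE IMMEDIATE cannot see a WITH clause from another statement, so the sample CTE would first have to be saved as a (temp) table.

```python
def build_dynamic_sql(num_stats):
    """Rebuild the SQL text that the EXECUTE IMMEDIATE answer generates."""
    positions = range(1, num_stats + 1)                    # generate_array(1, num_stats)
    col_list = ",".join(f"stat{p}" for p in positions)     # string_agg('stat' || pos)
    val_list = ",".join(f'"stat{p}"' for p in positions)   # string_agg('"stat' || pos || '"')
    return f"""
  select * from (
    select *, row_number() over(partition by gender, col order by stat desc) rank
    from stats
    unpivot (stat for col in ({col_list}))
  )
  pivot (any_value(stat) for col in ({val_list}))
  order by gender, rank
  """

if __name__ == "__main__":
    print(build_dynamic_sql(3))
```

Generating the list avoids typing 200+ column names, and both UNPIVOT and PIVOT scan the table once instead of once per stat as in the self-join version.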

We don't mind listing all 200+ stats, as long as the query stays within the resource limits...

create temp function  extract_keys(input string) returns array<string> language js as "return Object.keys(JSON.parse(input));";
create temp function  extract_values(input string) returns array<string> language js as "return Object.values(JSON.parse(input));";
select * from (
  select gender, col, cast(stat as float64) stat, row_number() over(partition by gender, col order by cast(stat as float64) desc) rank
  from stats t, 
  unnest([struct(to_json_string((select as struct * except(gender) from unnest([t]))) as json)]),
  unnest(extract_keys(json)) col with offset
  join unnest(extract_values(json)) stat with offset
  using(offset)
)
pivot (any_value(stat) for col in ('stat1','stat2','stat3'))    

Applied to the sample data in your question, the output is:

(screenshot of the output)
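The trick in this version is that to_json_string turns each row (minus gender) into a JSON object, and the two JavaScript UDFs split it into parallel key and value arrays that are paired again by offset. That unpivots any number of columns without naming them, so only the final PIVOT list has to spell them out. Below is a rough Python emulation of that flow using the json module (not the BigQuery engine); unpivot_via_json and rank_and_pivot are hypothetical names, and like the SQL it casts every value to float, so it assumes all stats are numeric.

```python
import json
from collections import defaultdict

def unpivot_via_json(row, partition_key="gender"):
    """Mimic to_json_string((select as struct * except(gender) ...)) plus the
    extract_keys / extract_values UDFs, pairing keys and values by offset."""
    payload = json.dumps({k: v for k, v in row.items() if k != partition_key})
    parsed = json.loads(payload)
    keys = list(parsed.keys())      # Object.keys(JSON.parse(input))
    values = list(parsed.values())  # Object.values(JSON.parse(input))
    return [(row[partition_key], col, float(stat)) for col, stat in zip(keys, values)]

def rank_and_pivot(rows, partition_key="gender"):
    """Rank values per (gender, col) descending, then pivot back to one row per rank."""
    groups = defaultdict(list)  # like partition by gender, col
    for row in rows:
        for gender, col, stat in unpivot_via_json(row, partition_key):
            groups[(gender, col)].append(stat)
    out = defaultdict(dict)  # (gender, rank) -> {col: stat}, like pivot(any_value(stat) ...)
    for (gender, col), stats in groups.items():
        for rank, stat in enumerate(sorted(stats, reverse=True), start=1):
            out[(gender, rank)][col] = stat
    return dict(out)

if __name__ == "__main__":
    rows = [
        {"gender": "male", "stat1": .60, "stat2": 23, "stat3": .10},
        {"gender": "male", "stat1": .62, "stat2": 28, "stat3": .12},
    ]
    for key, value in sorted(rank_and_pivot(rows).items()):
        print(key, value)
```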


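Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.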

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM