[英]In BigQuery, how to sort values separately for each column
Is it possible to sort each column separately in a BigQuery table? We have a statistics table, and we want to sort every column independently within a partition. We have three statistics, stat1, stat2, and stat3, and gender is our partition key:
with
stats as (
select 'male' as gender, .60 as stat1, 23 as stat2, .10 as stat3 union all
select 'male' as gender, .62 as stat1, 28 as stat2, .12 as stat3 union all
select 'male' as gender, .57 as stat1, 21 as stat2, .16 as stat3 union all
select 'male' as gender, .51 as stat1, 18 as stat2, .14 as stat3 union all
select 'male' as gender, .53 as stat1, 17 as stat2, .18 as stat3 union all
select 'male' as gender, .46 as stat1, 31 as stat2, .08 as stat3 union all
select 'male' as gender, .49 as stat1, 32 as stat2, .07 as stat3 union all
select 'male' as gender, .55 as stat1, 40 as stat2, .23 as stat3 union all
select 'male' as gender, .68 as stat1, 41 as stat2, .33 as stat3 union all
select 'male' as gender, .56 as stat1, 36 as stat2, .32 as stat3 union all
select 'female' as gender, .80 as stat1, 32 as stat2, .42 as stat3 union all
select 'female' as gender, .82 as stat1, 24 as stat2, .43 as stat3 union all
select 'female' as gender, .73 as stat1, 26 as stat2, .33 as stat3 union all
select 'female' as gender, .85 as stat1, 27 as stat2, .55 as stat3 union all
select 'female' as gender, .91 as stat1, 29 as stat2, .53 as stat3 union all
select 'female' as gender, .88 as stat1, 13 as stat2, .51 as stat3 union all
select 'female' as gender, .86 as stat1, 38 as stat2, .49 as stat3 union all
select 'female' as gender, .77 as stat1, 35 as stat2, .40 as stat3 union all
select 'female' as gender, .74 as stat1, 15 as stat2, .58 as stat3 union all
select 'female' as gender, .95 as stat1, 17 as stat2, .59 as stat3
),
-- we create rank columns
stats_with_ranks as (
select
*
,row_number() over (partition by gender) as rank
,rank() over (partition by gender order by stat1 desc) as stat1rk
,rank() over (partition by gender order by stat2 desc) as stat2rk
,rank() over (partition by gender order by stat3 desc) as stat3rk
from stats
)
select * from stats_with_ranks where gender = 'female' order by rank asc
This is not the right table for our problem. We want each metric sorted within its own column, so that its values line up against a single rank column. For example, the largest value of each of stat1, stat2, and stat3 would sit on the same row as rank = 1. Something like this:
We came up with the following approach:
select
a.gender
,a.rank
,b.stat1 as stat1
,c.stat2 as stat2
,d.stat3 as stat3
from stats_with_ranks as a
left join stats_with_ranks as b on a.gender = b.gender and a.rank = b.stat1rk
left join stats_with_ranks as c on a.gender = c.gender and a.rank = c.stat2rk
left join stats_with_ranks as d on a.gender = d.gender and a.rank = d.stat3rk
order by gender asc, rank asc
However, this approach does not work on our larger dataset: the table we are ranking has more than 200 statistics. With 200+ of these left joins we quickly hit Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex. That is why we are looking for a solution that sorts within the columns.
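To make the desired transformation concrete outside BigQuery, here is a hypothetical pandas sketch (not part of the original post) that sorts each stat column independently within each gender partition, so that row rank 1 carries every column's maximum. It uses only a small subset of the question's sample rows:

```python
import pandas as pd

# Subset of the question's sample data.
stats = pd.DataFrame({
    "gender": ["male", "male", "male", "female", "female", "female"],
    "stat1": [.60, .62, .57, .80, .82, .73],
    "stat2": [23, 28, 21, 32, 24, 26],
    "stat3": [.10, .12, .16, .42, .43, .33],
})

frames = []
for gender, group in stats.groupby("gender", sort=False):
    # Sort every stat column independently (descending) and realign
    # the sorted values on a fresh 1-based rank index.
    out = pd.DataFrame({
        col: sorted(group[col], reverse=True)
        for col in ["stat1", "stat2", "stat3"]
    })
    out.insert(0, "gender", gender)
    out.insert(1, "rank", list(range(1, len(out) + 1)))
    frames.append(out)

result = pd.concat(frames, ignore_index=True)
print(result)
```

Each row of result now pairs the n-th largest value of every column, which is exactly the shape the self-joins above produce, without any joins.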
The answer below assumes that all the stat columns share the same data type and follow the statN naming convention:
execute immediate (select '''
select * from (
select *, row_number() over(partition by gender, col order by stat desc) rank
from stats
unpivot (stat for col in (''' || col_list || '''))
)
pivot (any_value(stat) for col in (''' || val_list || '''))
order by gender, rank
'''
from (
select
string_agg('"stat' || pos || '"') val_list,
string_agg('stat' || pos) col_list
from unnest(generate_array(1,3)) pos
)
)
if applied to the sample data in your question (all float64 values)
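For readers who want to check the UNPIVOT → rank → PIVOT pipeline of this answer without running BigQuery, the same steps can be mimicked in pandas (a hypothetical sketch, using melt, groupby rank, and pivot_table in place of UNPIVOT, row_number(), and PIVOT):

```python
import pandas as pd

stats = pd.DataFrame({
    "gender": ["male", "male", "female", "female"],
    "stat1": [.60, .62, .80, .82],
    "stat2": [23, 28, 32, 24],
    "stat3": [.10, .12, .42, .43],
})

# UNPIVOT: one (gender, col, stat) row per original cell.
long = stats.melt(id_vars="gender", var_name="col", value_name="stat")

# row_number() over (partition by gender, col order by stat desc)
long["rank"] = (long.groupby(["gender", "col"])["stat"]
                    .rank(method="first", ascending=False)
                    .astype(int))

# PIVOT back: one column per stat name, keyed by (gender, rank).
wide = (long.pivot_table(index=["gender", "rank"], columns="col",
                         values="stat", aggfunc="first")
            .reset_index())
print(wide)
```

The resulting frame has one row per (gender, rank) pair, with every stat column holding its own n-th largest value, matching the dynamic SQL's output shape.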
As long as we stay within the query resources, we don't mind listing out all 200+ statistics...
create temp function extract_keys(input string) returns array<string> language js as "return Object.keys(JSON.parse(input));";
create temp function extract_values(input string) returns array<string> language js as "return Object.values(JSON.parse(input));";
select * from (
select gender, col, cast(stat as float64) stat, row_number() over(partition by gender, col order by cast(stat as float64) desc) rank
from stats t,
unnest([struct(to_json_string((select as struct * except(gender) from unnest([t]))) as json)]),
unnest(extract_keys(json)) col with offset
join unnest(extract_values(json)) stat with offset
using(offset)
)
pivot (any_value(stat) for col in ('stat1','stat2','stat3'))
If applied to the sample data in your question, the output is
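The extract_keys/extract_values JavaScript UDFs in this answer turn each row into parallel arrays of column names and values via a JSON round trip. A hypothetical Python sketch of that per-row mechanic (illustration only, not BigQuery code):

```python
import json

# One source row; gender is excluded, as in the SELECT AS STRUCT * EXCEPT(gender).
row = {"gender": "male", "stat1": .60, "stat2": 23, "stat3": .10}
payload = json.dumps({k: v for k, v in row.items() if k != "gender"})

# What extract_keys / extract_values would each return for this row.
keys = list(json.loads(payload).keys())
vals = list(json.loads(payload).values())

# Joining the two arrays by offset yields one (col, stat) pair per
# stat column: the unpivoted rows the ranking then runs over.
pairs = list(zip(keys, vals))
print(pairs)
```

Because the keys come from the JSON object itself, no stat column names need to be listed when unpivoting; only the final PIVOT's IN list enumerates them.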