BigQuery SQL - Create a new column based on the max value across multiple columns

Question

I have a table that contains information about customers and the amount of each type of food they purchased. I want to create a new column that shows the type of food each customer purchased most frequently. Is there an efficient way to do this?

I tried using CASE WHEN with one-to-one comparisons, but it got very tedious.

Sample data:

Cust_ID	apple_type1	apple_type2	apple_type3	apple_type4	apple_type5	apple_type6
1	2	0	0	3	6	1
2	0	0	0	1	0	1
3	4	2	1	1	0	1
4	5	5	5	0	0	0
5	0	0	0	0	0	0

--WANT

Cust_ID	freq_apple_type_buy
1	type5
2	type4 and type6
3	type1
4	type1 and type2 and type3
5	unknown

Answer 1

This uses UNPIVOT to turn your columns into rows. It then uses RANK() to assign each row a rank within its customer, which means that if multiple rows are tied on quantity, they share the same rank.

It then selects only the products with rank = 1 (possibly multiple rows, if multiple products are tied for first place).
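To see the tie behaviour in isolation, here is a minimal sketch on three made-up rows (the inline values are illustrative only, not a real table). RANK() gives tied rows the same value, while ROW_NUMBER() numbers every row uniquely; which of the tied rows receives row number 1 is not deterministic.

```sql
-- Toy input: two products tied at qty = 1 and one at qty = 0 (illustrative values).
SELECT
  product,
  qty,
  RANK()       OVER (ORDER BY qty DESC) AS product_rank,  -- tied rows share a rank
  ROW_NUMBER() OVER (ORDER BY qty DESC) AS product_row    -- always unique per row
FROM UNNEST([
  STRUCT('apple_type4' AS product, 1 AS qty),
  STRUCT('apple_type6' AS product, 1 AS qty),
  STRUCT('apple_type1' AS product, 0 AS qty)
])
ORDER BY product_row
```

Both rows with qty = 1 get product_rank 1 but different product_row values, and the qty = 0 row gets product_rank 3. The full query relies on exactly this difference.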

WITH
  normalised_and_ranked AS
(
  SELECT
    cust_id,
    product,
    qty,
    RANK() OVER (PARTITION BY cust_id ORDER BY qty DESC) AS product_rank,
    ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY qty DESC) AS product_row
  FROM
     yourData
  UNPIVOT(
    qty FOR product IN (apple_type1, apple_type2, apple_type3, apple_type4, apple_type5, apple_type6)
  )
)
SELECT
  cust_id,
  CASE WHEN qty = 0 THEN NULL ELSE product END   AS product,
  CASE WHEN qty = 0 THEN NULL ELSE qty END   AS qty
FROM
  normalised_and_ranked
WHERE
  (product_rank = 1 AND qty > 0)
  OR
  (product_row = 1)

Edit: fudge added to ensure a row of NULLs is returned if all qty values are 0.

(Normally I'd just not return a row for such customers.)
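The query above returns one row per top product. If a single string per customer is wanted, as in the question's expected output, one possible variation (a sketch, not part of the original answer) is to aggregate the rank-1 products with STRING_AGG. The inline yourData CTE only reproduces the question's sample rows so the snippet can run on its own; the 'unknown' fallback and the trimming of the apple_ prefix with REPLACE are assumptions based on the wanted table.

```sql
WITH yourData AS (
  -- Sample rows copied from the question; replace with the real table.
  SELECT 1 AS cust_id, 2 AS apple_type1, 0 AS apple_type2, 0 AS apple_type3,
         3 AS apple_type4, 6 AS apple_type5, 1 AS apple_type6 UNION ALL
  SELECT 2, 0, 0, 0, 1, 0, 1 UNION ALL
  SELECT 3, 4, 2, 1, 1, 0, 1 UNION ALL
  SELECT 4, 5, 5, 5, 0, 0, 0 UNION ALL
  SELECT 5, 0, 0, 0, 0, 0, 0
),
top_products AS (
  -- One row per (customer, product), keeping only the products tied for first place.
  SELECT
    cust_id,
    product,
    qty
  FROM
    yourData
  UNPIVOT(
    qty FOR product IN (apple_type1, apple_type2, apple_type3, apple_type4, apple_type5, apple_type6)
  )
  WHERE TRUE
  QUALIFY RANK() OVER (PARTITION BY cust_id ORDER BY qty DESC) = 1
)
SELECT
  cust_id,
  -- STRING_AGG skips NULLs and returns NULL when every input is NULL,
  -- so customers whose top qty is 0 fall through to 'unknown'.
  IFNULL(
    STRING_AGG(IF(qty > 0, REPLACE(product, 'apple_', ''), NULL), ' and ' ORDER BY product),
    'unknown'
  ) AS freq_apple_type_buy
FROM
  top_products
GROUP BY
  cust_id
ORDER BY
  cust_id
```

The ORDER BY inside STRING_AGG keeps the order of tied names stable. On the sample rows this should give type5, type4 and type6, type1, type1 and type2 and type3, and unknown for customers 1 to 5.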

Answer 2

Consider the approach below:

select Cust_ID, if(count(1) = any_value(all_count), 'unknown', string_agg(type, ' and ')) freq_apple_type_buy
from (
  select *, count(1) over(partition by Cust_ID) all_count
  from (
    select Cust_ID, replace(arr[offset(0)], 'apple_', '') type,cast(arr[offset(1)] as int64) value
    from data t,
    unnest(split(translate(to_json_string((select as struct * except(Cust_ID) from unnest([t]))), '{}"', ''))) kv,
    unnest([struct(split(kv, ':') as arr)])
  )
  where true qualify 1 = rank() over(partition by Cust_ID order by value desc)
)
group by Cust_ID
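The innermost subquery is the least obvious part: it serialises each row to JSON and splits the text into column:value pairs, so the column names never have to be listed by hand. A reduced sketch of just that step, on a single made-up row (the inline row and the output column names are illustrative), might look like this:

```sql
SELECT
  Cust_ID,
  arr[OFFSET(0)] AS column_name,           -- text before the colon, e.g. apple_type1
  CAST(arr[OFFSET(1)] AS INT64) AS value   -- text after the colon, cast to a number
FROM (SELECT 1 AS Cust_ID, 2 AS apple_type1, 0 AS apple_type2) t,
  -- TO_JSON_STRING yields {"apple_type1":2,"apple_type2":0};
  -- TRANSLATE strips the braces and quotes; SPLIT breaks the rest on commas.
  UNNEST(SPLIT(TRANSLATE(TO_JSON_STRING((SELECT AS STRUCT * EXCEPT(Cust_ID) FROM UNNEST([t]))), '{}"', ''))) kv,
  UNNEST([STRUCT(SPLIT(kv, ':') AS arr)])
```

This string-based flattening works here because every value is a plain integer; values containing commas, colons, or quotes would be split incorrectly.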

If applied to the sample data in your question, the output is: [output image not included]

Solution 1	0	2021-10-29 22:45:48
Solution 2	0	2021-10-29 22:47:37
