
BigQuery SQL - Create New Column Based on the Max Value from Multiple Columns

I have a table that contains information about customers and the amount of each type of food they have purchased. I want to create a new column that holds the type of food each customer has purchased most frequently. Is there an efficient way to do this?

I tried using CASE WHEN with one-to-one comparisons, but it got very tedious.

Sample data:

Cust_ID  apple_type1  apple_type2  apple_type3  apple_type4  apple_type5  apple_type6
1        2            0            0            3            6            1
2        0            0            0            1            0            1
3        4            2            1            1            0            1
4        5            5            5            0            0            0
5        0            0            0            0            0            0

--WANT

Cust_ID  freq_apple_type_buy
1        type5
2        type4 and type6
3        type1
4        type1 and type2 and type3
5        unknown

This uses UNPIVOT to turn your columns into rows. It then uses RANK() to assign each row a rank within its customer, which means that rows with the same quantity share the same rank.

It then selects only the products with rank = 1 (possibly multiple rows, if several products are tied for first place).

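WITH
  normalised_and_ranked AS
(
  SELECT
    cust_id,
    product,
    qty,
    -- Tied quantities share a rank, so every top product gets rank 1
    RANK() OVER (PARTITION BY cust_id ORDER BY qty DESC) AS product_rank,
    -- Exactly one row per customer gets row number 1 (used for the all-zero case)
    ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY qty DESC) AS product_row
  FROM
    yourData
  UNPIVOT(
    qty FOR product IN (apple_type1, apple_type2, apple_type3, apple_type4, apple_type5, apple_type6)
  )
)
SELECT
  cust_id,
  -- Blank out product and qty when the customer bought nothing
  CASE WHEN qty = 0 THEN NULL ELSE product END   AS product,
  CASE WHEN qty = 0 THEN NULL ELSE qty END   AS qty
FROM
  normalised_and_ranked
WHERE
  -- All tied top products, provided something was actually bought
  (product_rank = 1 AND qty > 0)
  OR
  -- Otherwise keep a single placeholder row for the customer
  (product_row = 1)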

Edit: fudge added to ensure that a row of NULLs is returned if all qty are 0.

(Normally I'd just not return a row for such customers.)
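
The query above returns one row per tied product, whereas the question asks for a single row per customer with the tied types joined by ' and '. A minimal sketch of one way to collapse the ranked rows, assuming the same UNPIVOT and RANK() steps; the inline yourData CTE only reproduces the question's sample rows, and the STRING_AGG / IF wrapper is an addition for illustration, not part of the original answer:

```sql
-- Sketch only: the inline yourData CTE stands in for the real table.
WITH yourData AS (
  SELECT 1 AS cust_id, 2 AS apple_type1, 0 AS apple_type2, 0 AS apple_type3,
         3 AS apple_type4, 6 AS apple_type5, 1 AS apple_type6 UNION ALL
  SELECT 2, 0, 0, 0, 1, 0, 1 UNION ALL
  SELECT 3, 4, 2, 1, 1, 0, 1 UNION ALL
  SELECT 4, 5, 5, 5, 0, 0, 0 UNION ALL
  SELECT 5, 0, 0, 0, 0, 0, 0
),
normalised_and_ranked AS (
  SELECT
    cust_id,
    product,
    qty,
    -- Tied quantities share rank 1 within each customer
    RANK() OVER (PARTITION BY cust_id ORDER BY qty DESC) AS product_rank
  FROM
    yourData
  UNPIVOT(
    qty FOR product IN (apple_type1, apple_type2, apple_type3, apple_type4, apple_type5, apple_type6)
  )
)
SELECT
  cust_id,
  IF(
    -- If the top quantity is 0, the customer bought nothing
    MAX(qty) = 0,
    'unknown',
    -- Strip the 'apple_' prefix and join the tied top products
    STRING_AGG(REPLACE(product, 'apple_', ''), ' and ' ORDER BY product)
  ) AS freq_apple_type_buy
FROM
  normalised_and_ranked
WHERE
  product_rank = 1
GROUP BY
  cust_id
ORDER BY
  cust_id
```

Because only rank-1 rows survive the WHERE clause, MAX(qty) is simply the customer's top quantity, so the ROW_NUMBER() fudge is not needed in this variant. On the sample data this should give the rows listed under --WANT.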

Consider the approach below:

select Cust_ID, if(count(1) = any_value(all_count), 'unknown', string_agg(type, ' and ')) freq_apple_type_buy
from (
  select *, count(1) over(partition by Cust_ID) all_count
  from (
    select Cust_ID, replace(arr[offset(0)], 'apple_', '') type,cast(arr[offset(1)] as int64) value
    from data t,
    unnest(split(translate(to_json_string((select as struct * except(Cust_ID) from unnest([t]))), '{}"', ''))) kv,
    unnest([struct(split(kv, ':') as arr)])
  )
  where true qualify 1 = rank() over(partition by Cust_ID order by value desc)
)
group by Cust_ID    

If applied to the sample data in your question, the output matches the desired result. (The output was shown as an image in the original post, which is not reproduced here.)
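
The query above packs the unpivoting into its FROM clause: each row is serialised with TO_JSON_STRING, the braces and quotes are stripped with TRANSLATE, and the remainder is split on commas into 'column:value' pairs, so the column names never have to be listed. A minimal sketch that exposes those intermediate pairs for a single row; the inline data CTE is a stand-in for the real table and holds only the first sample customer:

```sql
-- Sketch only: one sample row, to show what the JSON trick produces.
WITH data AS (
  SELECT 1 AS Cust_ID, 2 AS apple_type1, 0 AS apple_type2, 0 AS apple_type3,
         3 AS apple_type4, 6 AS apple_type5, 1 AS apple_type6
)
SELECT
  Cust_ID,
  kv,                                                        -- e.g. 'apple_type1:2'
  REPLACE(SPLIT(kv, ':')[OFFSET(0)], 'apple_', '') AS type,  -- column name without prefix
  CAST(SPLIT(kv, ':')[OFFSET(1)] AS INT64) AS value          -- purchased quantity
FROM data t,
UNNEST(SPLIT(TRANSLATE(
  -- Serialise every column except Cust_ID, then drop { } and " characters
  TO_JSON_STRING((SELECT AS STRUCT * EXCEPT(Cust_ID) FROM UNNEST([t]))),
  '{}"', ''))) kv
```

This yields one row per apple_type column. Note that the trick relies on the values containing no commas, colons or quotes, which holds for integer columns like these but not necessarily for string columns.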
