BigQuery SQL - Create New Column Based on the Max Value from Multiple Columns
I have a table that contains information about customers and the amount of each type of food they purchased. I want to create a new column that gives the type of food each customer purchased most frequently. Is there an efficient way to do this?
I tried using CASE WHEN with one-to-one comparisons, but it got very tedious.
Sample data:

Cust_ID | apple_type1 | apple_type2 | apple_type3 | apple_type4 | apple_type5 | apple_type6
---|---|---|---|---|---|---
1 | 2 | 0 | 0 | 3 | 6 | 1
2 | 0 | 0 | 0 | 1 | 0 | 1
3 | 4 | 2 | 1 | 1 | 0 | 1
4 | 5 | 5 | 5 | 0 | 0 | 0
5 | 0 | 0 | 0 | 0 | 0 | 0
Desired output:

Cust_ID | freq_apple_type_buy
---|---
1 | type5
2 | type4 and type6
3 | type1
4 | type1 and type2 and type3
5 | unknown
This uses UNPIVOT to turn your columns into rows, then uses RANK() to rank each row within its customer by quantity. Rows that are tied on quantity share the same rank.
It then selects only the products with rank = 1 (possibly multiple rows, if multiple products are tied for first place).

    WITH
      normalised_and_ranked AS
    (
      SELECT
        cust_id,
        product,
        qty,
        RANK()       OVER (PARTITION BY cust_id ORDER BY qty DESC) AS product_rank,
        ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY qty DESC) AS product_row
      FROM
        yourData
      UNPIVOT(
        qty FOR product IN (apple_type1, apple_type2, apple_type3, apple_type4, apple_type5, apple_type6)
      )
    )
    SELECT
      cust_id,
      CASE WHEN qty = 0 THEN NULL ELSE product END AS product,
      CASE WHEN qty = 0 THEN NULL ELSE qty     END AS qty
    FROM
      normalised_and_ranked
    WHERE
      (product_rank = 1 AND qty > 0)
      OR
      (product_row = 1)

Edit: added a fudge to ensure that a row of NULLs is returned when all quantities are 0.
(Normally I'd just not return a row for such customers.)
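
The query above returns one row per top-ranked product, not the single `type4 and type6` string shown in the question. A minimal sketch of one way to collapse it to one row per customer follows. It assumes the same `yourData` table name used above (here replaced by a CTE that inlines the question's sample rows so the query runs on its own); the CTE name `ranked` and the use of `COALESCE` with `STRING_AGG` are illustrative choices, not part of the original answer.

```sql
-- Sample rows from the question, inlined so the sketch is self-contained.
WITH yourData AS (
  SELECT 1 AS cust_id, 2 AS apple_type1, 0 AS apple_type2, 0 AS apple_type3,
         3 AS apple_type4, 6 AS apple_type5, 1 AS apple_type6 UNION ALL
  SELECT 2, 0, 0, 0, 1, 0, 1 UNION ALL
  SELECT 3, 4, 2, 1, 1, 0, 1 UNION ALL
  SELECT 4, 5, 5, 5, 0, 0, 0 UNION ALL
  SELECT 5, 0, 0, 0, 0, 0, 0
),
-- Hypothetical helper CTE: one row per (customer, product), ranked by qty.
ranked AS (
  SELECT
    cust_id,
    product,
    qty,
    RANK() OVER (PARTITION BY cust_id ORDER BY qty DESC) AS product_rank
  FROM
    yourData
  UNPIVOT(
    qty FOR product IN (apple_type1, apple_type2, apple_type3,
                        apple_type4, apple_type5, apple_type6)
  )
)
SELECT
  cust_id,
  -- STRING_AGG skips NULL inputs and returns NULL when every input is NULL,
  -- so a customer whose top quantity is 0 falls through to 'unknown'.
  COALESCE(
    STRING_AGG(
      IF(qty > 0, REPLACE(product, 'apple_', ''), NULL),
      ' and ' ORDER BY product
    ),
    'unknown'
  ) AS freq_apple_type_buy
FROM
  ranked
WHERE
  product_rank = 1
GROUP BY
  cust_id
ORDER BY
  cust_id
```

Note that UNPIVOT drops rows whose value is NULL by default, so a customer whose quantity columns are all NULL (rather than 0) would not appear in the result at all.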
Consider the approach below:

    select Cust_ID, if(count(1) = any_value(all_count), 'unknown', string_agg(type, ' and ')) freq_apple_type_buy
    from (
      select *, count(1) over(partition by Cust_ID) all_count
      from (
        select Cust_ID, replace(arr[offset(0)], 'apple_', '') type, cast(arr[offset(1)] as int64) value
        from data t,
        unnest(split(translate(to_json_string((select as struct * except(Cust_ID) from unnest([t]))), '{}"', ''))) kv,
        unnest([struct(split(kv, ':') as arr)])
      )
      where true qualify 1 = rank() over(partition by Cust_ID order by value desc)
    )
    group by Cust_ID

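If applied to the sample data in your question, the output matches the desired result shown there (the order of tied types within a row is not guaranteed, because string_agg is called without an ORDER BY).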
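
As a side note on how the query in this answer discovers the column names without listing them: it serialises each row to JSON, strips the braces and quotes, and splits the remainder into `name:value` pairs. The sketch below isolates just that step. The one-row, two-column `data` CTE is a cut-down stand-in for the real table, and the output aliases `column_name` and `value` are illustrative.

```sql
-- Cut-down stand-in for the real table: one customer, two quantity columns.
WITH data AS (
  SELECT 1 AS Cust_ID, 2 AS apple_type1, 0 AS apple_type2
)
SELECT
  Cust_ID,
  kv,                                              -- e.g. apple_type1:2
  SPLIT(kv, ':')[OFFSET(0)] AS column_name,        -- text before the colon
  CAST(SPLIT(kv, ':')[OFFSET(1)] AS INT64) AS value
FROM
  data t,
  -- TO_JSON_STRING gives {"apple_type1":2,"apple_type2":0};
  -- TRANSLATE removes the characters { } and ", and SPLIT cuts on commas.
  UNNEST(SPLIT(TRANSLATE(
    TO_JSON_STRING((SELECT AS STRUCT * EXCEPT(Cust_ID) FROM UNNEST([t]))),
    '{}"', ''
  ))) AS kv
```

For this input the query should return two rows, with `kv` equal to `apple_type1:2` and `apple_type2:0`. The trick relies on the values being plain numbers: a string column containing a comma, colon, or quote would be split incorrectly.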