
BigQuery: split all the metrics into n equal parts depending on the number of times a certain string value appears in a column

I am not sure how to explain the problem, so I will share the sample input and output below. Note that the number of times "job#" appears in a single row is not fixed.

Input

[image: sample input table]

Output

[image: expected output table]

Try using the regexp_extract_all function, like the following:

with sample_data as (
  SELECT 'camp1' as camp, '01/08/2022' as date, 'job#12' as job, 23 as a1, 34 as a2, 21 as a3 UNION ALL
  SELECT 'camp2', '01/08/2022', 'job#14 & job#15', 20, 30, 30 UNION ALL
  SELECT 'camp3', '01/08/2022', 'job#11 job#13 job#20', 21, 30, 21 UNION ALL
  SELECT 'camp4', '01/08/2022', 'job#21 & job#22 & job#23 & job#24', 40, 12, 8
)

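SELECT camp,
  date,
  job_ex,
  a1,
  a2,
  a3,
  -- Divide each metric by the number of jobs extracted for the camp.
  -- The window counts per camp, so this assumes one source row per camp.
  a1 / COUNT(job_ex) OVER (PARTITION BY camp) AS a1_split,
  a2 / COUNT(job_ex) OVER (PARTITION BY camp) AS a2_split,
  a3 / COUNT(job_ex) OVER (PARTITION BY camp) AS a3_split
FROM sample_data,
  -- One output row per 'job#<digits>' token found in the job column.
  UNNEST(REGEXP_EXTRACT_ALL(job, r'job\#\d+')) AS job_ex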

It produces the following results:

[image: query results]

Consider the approach below:

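select camp, date, job,
  a1/jobs_count as a1,
  a2/jobs_count as a2,
  a3/jobs_count as a3
from your_table,
-- extract all 'job#<digits>' tokens once per row
unnest([struct(regexp_extract_all(job, r'job#\d+') as jobs_arr)]),
-- number of jobs in the row, used as the divisor
unnest([array_length(jobs_arr)]) jobs_count,
-- one output row per extracted job
unnest(jobs_arr) job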

If applied to the sample data in your question, the output is:

[image: query output]
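
Since your_table in the query above is a placeholder, here is a minimal self-contained sketch of the same idea that reuses the sample rows from the first answer. The CTE name extracted, the alias job_ex, and the _split column names are my own choices for illustration and are not part of either answer. The array is built in a CTE instead of with the unnest([struct(...)]) trick, which should behave the same for this data.

```sql
WITH sample_data AS (
  SELECT 'camp1' AS camp, '01/08/2022' AS date, 'job#12' AS job, 23 AS a1, 34 AS a2, 21 AS a3 UNION ALL
  SELECT 'camp2', '01/08/2022', 'job#14 & job#15', 20, 30, 30 UNION ALL
  SELECT 'camp3', '01/08/2022', 'job#11 job#13 job#20', 21, 30, 21 UNION ALL
  SELECT 'camp4', '01/08/2022', 'job#21 & job#22 & job#23 & job#24', 40, 12, 8
),
extracted AS (
  -- Hypothetical helper CTE: keep every source column and add the array of job tokens.
  SELECT *, REGEXP_EXTRACT_ALL(job, r'job#\d+') AS jobs_arr
  FROM sample_data
)
SELECT
  camp,
  date,
  job_ex AS job,
  -- The divisor is the per-row job count, so no window function is needed.
  a1 / ARRAY_LENGTH(jobs_arr) AS a1_split,
  a2 / ARRAY_LENGTH(jobs_arr) AS a2_split,
  a3 / ARRAY_LENGTH(jobs_arr) AS a3_split
FROM extracted,
  -- One output row per extracted job token.
  UNNEST(jobs_arr) AS job_ex
ORDER BY camp, job
```

With this data, camp2's metrics (20, 30, 30) are spread over two jobs, so each of its job rows gets 10, 15 and 15. Note that a row whose job value contains no job#<digits> token yields an empty array and is dropped by the comma join. Keeping such rows would need LEFT JOIN UNNEST(jobs_arr) AS job_ex together with SAFE_DIVIDE, because plain division by a zero job count raises an error.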
