简体   繁体   中英

Create a new column with an array of column names based on condition

I need to create a new column in a Bigquery table. This new column values must be an array of the colour column names based on a condition. My table "Users" looks like:

User Red Blue Green Yellow Other columns not related
First 1 1 0 1
Second 0 0 1 0
Third 1 1 0 0

and the expected table or view must be something like this:

User Red Blue Green Yellow Array_col
First 1 1 0 1 Red, Blue, Yellow
Second 0 0 1 0 Green
Third 1 1 0 0 Red, Blue

I am relatively new to SQL and Bigquery sintax, I've tried creating a view and also with

ALTER TABLE myproject.mydataset.users`

and I get the column names with this:

SELECT column_names
FROM `myproject.mydataset`.INFORMATION_SCHEMA.COLUMNS
WHERE (table_name = 'users')

But I can't find how to create the array list and make a column with them. I'm crashing against a wall.

Consider below approach

select *, 
  regexp_extract_all(translate(to_json_string(t), '{}"', ''), r'(\w+):1') as Array_col
from mytable t       

if applied to sample data in your question - output is

在此处输入图像描述

Maybe this will be ok:

with mytable as (
  select 'First' as user, 1 as Red, 1 as Blue, 0 as Green, 1 as Yellow union all
  select 'Second' as user, 0 as Red, 0 as Blue, 1 as Green, 0 as Yellow union all
  select 'Third' as user, 1 as Red, 1 as Blue, 0 as Green, 0 as Yellow
)
select *, split(trim(if(Red=1, 'Red', '') || if(Blue=1, ' Blue', '') || if(Green=1, ' Green', '') || if(Yellow=1, ' Yellow', ''), ' '), ' ')
from mytable

Now you can try super new syntax for BigQuery! Let's do this with UNPIVOT statement:

WITH mytable AS (
  SELECT 'first' AS user, 1 AS Red, 1 AS Blue, 0 AS Green, 1 AS Yellow UNION ALL
  SELECT 'second' AS user, 0 AS Red, 0 AS Blue, 1 AS Green, 0 AS Yellow UNION ALL
  SELECT 'third' AS user, 1 AS Red, 1 AS Blue, 0 AS Green, 0 AS Yellow
)
SELECT t.*, Array_col FROM (
    SELECT user, ARRAY_AGG(color_name) Array_col FROM mytable 
    UNPIVOT(check_color FOR color_name IN (Red,Blue,Green,Yellow))
    WHERE check_color = 1
    GROUP BY user    
)
JOIN mytable t USING (user) 

Yes, the column names are always the same...

In case if you want to consider only those 4 columns - check below

select t1.*, 
  array(select color from t2.colors where value = 1) Array_col
from mytable t1, 
unnest([struct([struct<value int64, color string>
  (Red, 'Red'), (Blue, 'Blue'), (Green, 'Green'), (Yellow, 'Yellow')
] as colors)]) t2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2025 STACKOOM.COM