String "contains" equivalent in Google BigQuery

Question

I have a table like the one below:

I want to create two new binary columns that indicate whether each subject took steroids and aspirin. I'd like to do this in both PostgreSQL and Google BigQuery.

I tried the following, but it doesn't work:

select subject_id
case when lower(drug) like ('%cortisol%','%cortisone%','%dexamethasone%') 
then 1 else 0 end as steroids,
case when lower(drug) like ('%peptide%','%paracetamol%') 
then 1 else 0 end as aspirin,
from db.Team01.Table_1


SELECT 
db.Team01.Table_1.drug
FROM `table_1`,
UNNEST(table_1.drug) drug
WHERE REGEXP_CONTAINS( db.Team01.Table_1.drug,r'%cortisol%','%cortisone%','%dexamethasone%')

I'd like my output to look like this:

Answer 1

Use conditional aggregation. Here is a solution that should work in most, if not all, RDBMSs:

SELECT
    subject_id,
    MAX(CASE WHEN drug IN ('cortisol', 'cortisone', 'dexamethasone') THEN 1 END) steroids,
    MAX(CASE WHEN drug IN ('peptide', 'paracetamol') THEN 1 END) aspirin
FROM db.Team01.Table_1
GROUP BY subject_id

Note: it's unclear why you are using LIKE, since you seem to have exact matches. I turned the LIKE conditions into equality checks.
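The MAX(CASE ... THEN 1 END) expressions above have no ELSE branch. A subject with no matching drug therefore gets NULL rather than 0. The question asks for binary columns, so a minimal sketch wraps each aggregate in COALESCE(..., 0). It also lowercases drug so that the exact match ignores case.

The rows in the table_1 CTE are hypothetical sample data, because the question's table is not reproduced here. As far as I know, the query uses only constructs that both PostgreSQL and BigQuery accept.

```sql
-- Hypothetical sample rows; replace the CTE with your real table
WITH table_1 AS (
  SELECT 1 AS subject_id, 'Cortisol' AS drug
  UNION ALL SELECT 1, 'Dexamethasone'
  UNION ALL SELECT 1, 'Paracetamol'
  UNION ALL SELECT 2, 'Cortisone'
  UNION ALL SELECT 3, 'Ibuprofen'
)
SELECT
  subject_id,
  -- MAX over a CASE without ELSE is NULL when no row matches; COALESCE maps that to 0
  COALESCE(MAX(CASE WHEN LOWER(drug) IN ('cortisol', 'cortisone', 'dexamethasone') THEN 1 END), 0) AS steroids,
  COALESCE(MAX(CASE WHEN LOWER(drug) IN ('peptide', 'paracetamol') THEN 1 END), 0) AS aspirin
FROM table_1
GROUP BY subject_id
ORDER BY subject_id;
```

With these sample rows, the expected results are:

- subject 1: steroids = 1, aspirin = 1
- subject 2: steroids = 1, aspirin = 0
- subject 3: steroids = 0, aspirin = 0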

Answer 2

You're missing a GROUP BY:

select subject_id,
    sum(case when lower(drug) in ('cortisol','cortisone','dexamethasone')
       then 1 else 0 end) as steroids,
    sum(case when lower(drug) in ('peptide','paracetamol') 
       then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id

Using the LIKE keyword:

select subject_id,
    sum(case when lower(drug) like '%cortisol%'
        or lower(drug) like '%cortisone%'
        or lower(drug) like '%dexamethasone%'
    then 1 else 0 end) as steroids,
    sum(case when lower(drug) like '%peptide%'
        or lower(drug) like '%paracetamol%'
    then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id
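In PostgreSQL specifically, the repeated OR conditions can be collapsed with LIKE ANY over an array of patterns. This is close to what the original attempt with a parenthesized list of patterns was aiming for.

Below is a PostgreSQL-only sketch that assumes the question's table name. It uses MAX instead of SUM, so each column is a 0/1 flag rather than a count.

```sql
-- PostgreSQL only: LIKE ANY tests the value against every pattern in the array
SELECT subject_id,
       MAX(CASE WHEN lower(drug) LIKE ANY (ARRAY['%cortisol%', '%cortisone%', '%dexamethasone%'])
                THEN 1 ELSE 0 END) AS steroids,
       MAX(CASE WHEN lower(drug) LIKE ANY (ARRAY['%peptide%', '%paracetamol%'])
                THEN 1 ELSE 0 END) AS aspirin
FROM db.Team01.Table_1
GROUP BY subject_id;
```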

Answer 3

In Postgres, I'd recommend the FILTER clause:

select subject_id,
       count(*) filter (where lower(drug) ~ 'cortisol|cortisone|dexamethasone') as steroids,
       count(*) filter (where lower(drug) ~ 'peptide|paracetamol') as aspirin
from db.Team01.Table_1
group by subject_id;

In BigQuery, I'd recommend COUNTIF():

select subject_id,
       countif(regexp_contains(lower(drug), 'cortisol|cortisone|dexamethasone')) as steroids,
       countif(regexp_contains(lower(drug), 'peptide|paracetamol')) as aspirin
from db.Team01.Table_1
group by subject_id;

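You could use sum(case when ... end) as a more generic approach. However, each database has a more "native" way to express this logic. Incidentally, the FILTER clause is standard SQL; it just hasn't been widely adopted.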
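The FILTER and COUNTIF versions return per-subject counts rather than a strict 0/1. If binary columns are needed in PostgreSQL, one option is to compare each FILTER count with zero and cast the resulting boolean to an integer. This is a sketch that assumes the same table.

```sql
-- PostgreSQL: (count(*) FILTER (...) > 0) is a boolean; ::int maps true/false to 1/0
SELECT subject_id,
       (count(*) FILTER (WHERE lower(drug) ~ 'cortisol|cortisone|dexamethasone') > 0)::int AS steroids,
       (count(*) FILTER (WHERE lower(drug) ~ 'peptide|paracetamol') > 0)::int AS aspirin
FROM db.Team01.Table_1
GROUP BY subject_id;
```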

Answer 4

The following is for BigQuery Standard SQL:

#standardSQL
SELECT 
  subject_id,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone') THEN 1 ELSE 0 END) AS steroids,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id

Applied to the sample data from your question, the result is:

Row subject_id  steroids    aspirin  
1   1           3           1    
2   2           1           1

Note: instead of plain LIKE, which ends up verbose and redundant, I use LIKE on steroids, i.e. REGEXP_CONTAINS.
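As the result shows, SUM produces counts: subject 1 gets 3 for steroids. For a strict 0/1 flag in BigQuery, one option is LOGICAL_OR, which is TRUE when any row in the group matches, cast to INT64. This is a sketch that assumes the same table.

```sql
#standardSQL
-- LOGICAL_OR is TRUE if the pattern matches any of the subject's rows;
-- CAST(... AS INT64) turns TRUE/FALSE into 1/0
SELECT
  subject_id,
  CAST(LOGICAL_OR(REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone')) AS INT64) AS steroids,
  CAST(LOGICAL_OR(REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol')) AS INT64) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id
```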

Answer 5

Another, possibly more intuitive, solution is BigQuery's CONTAINS_SUBSTR, which returns a BOOL result.
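Here is a sketch of the CONTAINS_SUBSTR approach, assuming the same table:

- CONTAINS_SUBSTR matches case-insensitively, so LOWER is not needed.
- It takes a single search literal per call, so each drug name gets its own call, joined with OR.
- IF(COUNTIF(...) > 0, 1, 0) turns the per-subject count into a binary flag.

```sql
#standardSQL
-- CONTAINS_SUBSTR is case-insensitive and takes one search literal per call
SELECT
  subject_id,
  IF(COUNTIF(CONTAINS_SUBSTR(drug, 'cortisol')
             OR CONTAINS_SUBSTR(drug, 'cortisone')
             OR CONTAINS_SUBSTR(drug, 'dexamethasone')) > 0, 1, 0) AS steroids,
  IF(COUNTIF(CONTAINS_SUBSTR(drug, 'peptide')
             OR CONTAINS_SUBSTR(drug, 'paracetamol')) > 0, 1, 0) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id
```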

Question description

5 answers

Answer 1
1 2019-10-02 08:29:14

Answer 2
1 2019-10-02 08:31:14

Answer 3
1 accepted 2019-10-02 11:34:22

Answer 4
1 2019-10-02 13:04:59

Answer 5
1 2022-02-22 17:55:55
