
Equivalent of string contains in Google BigQuery

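I have a table like the one shown below:

[image: input table]

I want to create two new binary columns indicating whether each subject took steroids and aspirin. I would like to do this in both PostgreSQL and Google BigQuery.

I tried the following, but it does not work: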

select subject_id
case when lower(drug) like ('%cortisol%','%cortisone%','%dexamethasone%') 
then 1 else 0 end as steroids,
case when lower(drug) like ('%peptide%','%paracetamol%') 
then 1 else 0 end as aspirin,
from db.Team01.Table_1


SELECT 
db.Team01.Table_1.drug
FROM `table_1`,
UNNEST(table_1.drug) drug
WHERE REGEXP_CONTAINS( db.Team01.Table_1.drug,r'%cortisol%','%cortisone%','%dexamethasone%')

I would like my output to look like this:

[image: expected output]

Use conditional aggregation. Here is a solution that works in most (if not all) RDBMS:

SELECT
    subject_id,
    MAX(CASE WHEN drug IN ('cortisol', 'cortisone', 'dexamethasone') THEN 1 ELSE 0 END) AS steroids,
    MAX(CASE WHEN drug IN ('peptide', 'paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM db.Team01.Table_1
GROUP BY subject_id

Note: it is not clear why you are using LIKE, since you appear to have exact matches; I turned the LIKE conditions into equality checks.
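
For trying the conditional-aggregation idea locally, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres or BigQuery. The table name `table_1` and the sample rows are hypothetical, since the question's data is only available as an image. `ELSE 0` is included so that a subject with no matching drug gets 0 rather than NULL.

```python
import sqlite3

# Hypothetical sample rows (subject_id, drug); not the asker's actual data.
ROWS = [
    (1, "cortisol"),
    (1, "cortisone"),
    (1, "paracetamol"),
    (2, "dexamethasone"),
    (3, "ibuprofen"),
]

# MAX over a 0/1 CASE expression yields a per-subject binary flag.
QUERY = """
SELECT
    subject_id,
    MAX(CASE WHEN drug IN ('cortisol', 'cortisone', 'dexamethasone') THEN 1 ELSE 0 END) AS steroids,
    MAX(CASE WHEN drug IN ('peptide', 'paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM table_1
GROUP BY subject_id
ORDER BY subject_id
"""


def binary_flags(rows):
    """Return one (subject_id, steroids, aspirin) tuple per subject, each flag 0 or 1."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table_1 (subject_id INTEGER, drug TEXT)")
        conn.executemany("INSERT INTO table_1 VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in binary_flags(ROWS):
        print(row)
```

With these rows, subject 1 is flagged for both groups, subject 2 only for steroids, and subject 3 for neither.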

You are missing the GROUP BY:

select subject_id,
    sum(case when lower(drug) in ('cortisol','cortisone','dexamethasone')
       then 1 else 0 end) as steroids,
    sum(case when lower(drug) in ('peptide','paracetamol') 
       then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id

Using the LIKE keyword:

select subject_id,
 sum(case when lower(drug) like '%cortisol%'
        or lower(drug) like '%cortisone%'
        or lower(drug) like '%dexamethasone%'   
    then 1 else 0 end) as steroids,
    sum(case when lower(drug) like '%peptide%'
        or lower(drug) like '%paracetamol%'
    then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id
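
The choice between the IN version and the LIKE version matters when `drug` holds free text rather than a bare drug name. The sketch below compares the two, again using Python's sqlite3 as a local stand-in; the table `table_1` and its rows are hypothetical and chosen only to show the difference.

```python
import sqlite3

# Hypothetical rows: some drug values carry extra text around the drug name.
ROWS = [
    (1, "Cortisol 5mg"),
    (1, "cortisone"),
    (2, "dexamethasone injection"),
]

# Exact matching: the whole (lowercased) value must equal one of the names.
EXACT = """
SELECT subject_id,
       SUM(CASE WHEN lower(drug) IN ('cortisol', 'cortisone', 'dexamethasone')
           THEN 1 ELSE 0 END) AS steroids
FROM table_1 GROUP BY subject_id ORDER BY subject_id
"""

# Substring matching: the name may appear anywhere inside the value.
SUBSTRING = """
SELECT subject_id,
       SUM(CASE WHEN lower(drug) LIKE '%cortisol%'
                  OR lower(drug) LIKE '%cortisone%'
                  OR lower(drug) LIKE '%dexamethasone%'
           THEN 1 ELSE 0 END) AS steroids
FROM table_1 GROUP BY subject_id ORDER BY subject_id
"""


def run(query, rows):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table_1 (subject_id INTEGER, drug TEXT)")
        conn.executemany("INSERT INTO table_1 VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print("exact:    ", run(EXACT, ROWS))
    print("substring:", run(SUBSTRING, ROWS))
```

Note that SUM counts matching rows, so a subject with two matching rows gets 2; MAX would be the choice if a strict 0/1 flag is wanted.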

In Postgres, I would recommend the FILTER clause:

select subject_id,
       count(*) filter (where lower(drug) ~ 'cortisol|cortisone|dexamethasone') as steroids,
       count(*) filter (where lower(drug) ~ 'peptide|paracetamol') as aspirin
from db.Team01.Table_1
group by subject_id;

In BigQuery, I would recommend countif():

select subject_id,
       countif(regexp_contains(drug, r'cortisol|cortisone|dexamethasone')) as steroids,
       countif(regexp_contains(drug, r'peptide|paracetamol')) as aspirin
from db.Team01.Table_1
group by subject_id;

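You can use sum(case when ... end) as a more generic approach. However, each database has a more "native" way of expressing this logic. By the way, the FILTER clause is standard SQL; it is just not widely adopted.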

Below is for BigQuery Standard SQL:

#standardSQL
SELECT 
  subject_id,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone') THEN 1 ELSE 0 END) AS steroids,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id   

If applied to the sample data from your question, the result is:

Row subject_id  steroids    aspirin  
1   1           3           1    
2   2           1           1     

Note: instead of a simple LIKE, which ends up lengthy and redundant, I am using "LIKE on steroids", which is REGEXP_CONTAINS.

Another, possibly more intuitive, solution is to use BigQuery's CONTAINS_SUBSTR, which returns a boolean result.

