Equivalent of string contains in Google BigQuery
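I have a table like the one shown below.
I would like to create two new binary columns that indicate whether a subject took steroids and aspirin. I want to do this in PostgreSQL and Google BigQuery.
I tried the following, but it does not work: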
select subject_id
case when lower(drug) like ('%cortisol%','%cortisone%','%dexamethasone%')
then 1 else 0 end as steroids,
case when lower(drug) like ('%peptide%','%paracetamol%')
then 1 else 0 end as aspirin,
from db.Team01.Table_1
SELECT
db.Team01.Table_1.drug
FROM `table_1`,
UNNEST(table_1.drug) drug
WHERE REGEXP_CONTAINS( db.Team01.Table_1.drug,r'%cortisol%','%cortisone%','%dexamethasone%')
I expect my output to look like this:
Use conditional aggregation. This is a solution that works in most (if not all) RDBMSs:
SELECT
subject_id,
MAX(CASE WHEN drug IN ('cortisol', 'cortisone', 'dexamethasone') THEN 1 ELSE 0 END) AS steroids,
MAX(CASE WHEN drug IN ('peptide', 'paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM db.Team01.Table_1
GROUP BY subject_id
Note: it is not clear why you are using LIKE, since you seem to have exact matches; I changed the LIKE conditions to equality checks.
You are missing the GROUP BY:
select subject_id,
sum(case when lower(drug) in ('cortisol','cortisone','dexamethasone')
then 1 else 0 end) as steroids,
sum(case when lower(drug) in ('peptide','paracetamol')
then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id
Using the LIKE keyword:
select subject_id,
sum(case when lower(drug) like '%cortisol%'
or lower(drug) like '%cortisone%'
or lower(drug) like '%dexamethasone%'
then 1 else 0 end) as steroids,
sum(case when lower(drug) like '%peptide%'
or lower(drug) like '%paracetamol%'
then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id
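The SUM-based queries above return the number of matching rows per subject, which can exceed 1. If strict 0/1 flags are wanted, as the question asks, one option is to swap SUM for MAX. The sketch below is self-contained: the `sample_rows` CTE and its rows are hypothetical, made up only for illustration, since the question's real table is not reproduced here. It uses only syntax that should be accepted by both PostgreSQL and BigQuery.

```sql
-- Hypothetical sample rows (not the asker's data).
WITH sample_rows AS (
  SELECT 1 AS subject_id, 'Cortisol 10mg' AS drug
  UNION ALL SELECT 1, 'dexamethasone inj'
  UNION ALL SELECT 1, 'paracetamol'
  UNION ALL SELECT 2, 'ibuprofen'
)
SELECT subject_id,
       -- MAX over 0/1 values yields 1 if any row matched, else 0
       MAX(CASE WHEN LOWER(drug) LIKE '%cortisol%'
                  OR LOWER(drug) LIKE '%cortisone%'
                  OR LOWER(drug) LIKE '%dexamethasone%'
                THEN 1 ELSE 0 END) AS steroids,
       MAX(CASE WHEN LOWER(drug) LIKE '%peptide%'
                  OR LOWER(drug) LIKE '%paracetamol%'
                THEN 1 ELSE 0 END) AS aspirin
FROM sample_rows
GROUP BY subject_id
ORDER BY subject_id;
```

With these made-up rows, subject 1 gets steroids = 1 and aspirin = 1 even though two steroid rows match, and subject 2 gets 0 in both columns.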
In Postgres, I would recommend the filter clause:
select subject_id,
count(*) filter (where lower(drug) ~ 'cortisol|cortisone|dexamethasone') as steroids,
count(*) filter (where lower(drug) ~ 'peptide|paracetamol') as aspirin
from db.Team01.Table_1
group by subject_id;
In BigQuery, I would recommend countif():
select subject_id,
countif(regexp_contains(lower(drug), r'cortisol|cortisone|dexamethasone')) as steroids,
countif(regexp_contains(lower(drug), r'peptide|paracetamol')) as aspirin
from db.Team01.Table_1
group by subject_id;
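You can use sum(case when ... end) as a more generic approach. However, each database has a more "native" way of expressing this logic. By the way, the FILTER clause is standard SQL, it just has not been widely adopted.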
Below is for BigQuery Standard SQL:
#standardSQL
SELECT
subject_id,
SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone') THEN 1 ELSE 0 END) AS steroids,
SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id
If applied to the sample data from your question, the result is:
Row subject_id steroids aspirin
1 1 3 1
2 2 1 1
Note: a plain LIKE ends up lengthy and redundant, so I am using "LIKE on steroids" instead, which is REGEXP_CONTAINS.
Another, possibly more intuitive, solution is to use BigQuery's CONTAINS_SUBSTR, which returns a boolean result.
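A minimal sketch of how that might look, reusing the table name from the question. It assumes `drug` is a STRING column and that 0/1 flags per subject are wanted, hence MAX rather than SUM. CONTAINS_SUBSTR is documented as a case-insensitive search, so no LOWER() call is added here.

```sql
#standardSQL
SELECT
  subject_id,
  -- IF turns the boolean into 0/1; a NULL drug is treated as no match
  MAX(IF(CONTAINS_SUBSTR(drug, 'cortisol')
      OR CONTAINS_SUBSTR(drug, 'cortisone')
      OR CONTAINS_SUBSTR(drug, 'dexamethasone'), 1, 0)) AS steroids,
  MAX(IF(CONTAINS_SUBSTR(drug, 'peptide')
      OR CONTAINS_SUBSTR(drug, 'paracetamol'), 1, 0)) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id
```

Because CONTAINS_SUBSTR takes one search value per call, several terms still need to be chained with OR; for long term lists the REGEXP_CONTAINS form with `|` may be more compact.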