
Equivalent of string contains in Google BigQuery

I have a table like the one shown below.

[Image: sample input table]

I would like to create two new binary columns indicating whether the subject had steroids and aspirin. I am looking to implement this in PostgreSQL and Google BigQuery.

I tried the queries below, but they don't work:

select subject_id
case when lower(drug) like ('%cortisol%','%cortisone%','%dexamethasone%') 
then 1 else 0 end as steroids,
case when lower(drug) like ('%peptide%','%paracetamol%') 
then 1 else 0 end as aspirin,
from db.Team01.Table_1


SELECT 
db.Team01.Table_1.drug
FROM `table_1`,
UNNEST(table_1.drug) drug
WHERE REGEXP_CONTAINS( db.Team01.Table_1.drug,r'%cortisol%','%cortisone%','%dexamethasone%')

I expect my output to look like the one shown below.

[Image: expected output table]

Use conditional aggregation. This is a solution that works across most (if not all) RDBMS:

SELECT
    subject_id,
    MAX(CASE WHEN drug IN ('cortisol', 'cortisone', 'dexamethasone') THEN 1 ELSE 0 END) AS steroids,
    MAX(CASE WHEN drug IN ('peptide', 'paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM db.Team01.Table_1
GROUP BY subject_id

NB: it is unclear why you are using LIKE, since it seems that you have exact matches; I turned the LIKE conditions into equalities.
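To see where an equality test and a pattern match differ, here is a minimal sketch that runs this kind of conditional aggregation against an in-memory SQLite database from Python. SQLite is only a stand-in for PostgreSQL or BigQuery, and the table name `table_1` and the sample rows are made up for illustration (the question's real data is not available here). `lower()` and `ELSE 0` are used so that the flags come out as plain 0/1 values.

```python
import sqlite3

# Hypothetical sample rows: (subject_id, drug). Not the question's real data.
rows = [
    (1, "cortisol"),
    (1, "paracetamol"),
    (2, "dexamethasone"),
    (3, "ibuprofen"),
    (4, "Cortisone 5 mg"),  # contains a steroid name, but is not an exact match
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (subject_id INTEGER, drug TEXT)")
conn.executemany("INSERT INTO table_1 VALUES (?, ?)", rows)

# One row per subject; MAX() turns "any matching row" into a 0/1 flag.
query = """
SELECT
    subject_id,
    MAX(CASE WHEN lower(drug) IN ('cortisol', 'cortisone', 'dexamethasone')
             THEN 1 ELSE 0 END) AS steroids,
    MAX(CASE WHEN lower(drug) IN ('peptide', 'paracetamol')
             THEN 1 ELSE 0 END) AS aspirin
FROM table_1
GROUP BY subject_id
ORDER BY subject_id
"""
flags = conn.execute(query).fetchall()
print(flags)
```

Subject 4 is not flagged, because `cortisone 5 mg` is not equal to any listed value. If the drug column can carry extra text like this, a `LIKE` or regular-expression condition is needed instead of `IN`.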

You are missing a GROUP BY clause:

select subject_id,
    sum(case when lower(drug) in ('cortisol','cortisone','dexamethasone')
       then 1 else 0 end) as steroids,
    sum(case when lower(drug) in ('peptide','paracetamol') 
       then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id

Using the LIKE keyword (each pattern needs its own LIKE, combined with OR):

select subject_id,
    sum(case when lower(drug) like '%cortisol%'
               or lower(drug) like '%cortisone%'
               or lower(drug) like '%dexamethasone%'
        then 1 else 0 end) as steroids,
    sum(case when lower(drug) like '%peptide%'
               or lower(drug) like '%paracetamol%'
        then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id
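Note that `sum(...)` counts matching rows per subject rather than producing a 0/1 flag. The sketch below shows both side by side, using an in-memory SQLite database as a stand-in for PostgreSQL or BigQuery. The table name `table_1`, the column aliases `steroid_rows` and `steroids_flag`, and the sample rows are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical sample rows: (subject_id, drug). Not the question's real data.
rows = [
    (1, "Cortisol"),
    (1, "cortisone acetate"),
    (1, "dexamethasone"),
    (1, "paracetamol"),
    (2, "Dexamethasone 4 mg"),
    (2, "peptide"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (subject_id INTEGER, drug TEXT)")
conn.executemany("INSERT INTO table_1 VALUES (?, ?)", rows)

# SUM counts the matching rows; MAX collapses them into a binary indicator.
query = """
SELECT
    subject_id,
    SUM(CASE WHEN lower(drug) LIKE '%cortisol%'
               OR lower(drug) LIKE '%cortisone%'
               OR lower(drug) LIKE '%dexamethasone%'
             THEN 1 ELSE 0 END) AS steroid_rows,
    MAX(CASE WHEN lower(drug) LIKE '%cortisol%'
               OR lower(drug) LIKE '%cortisone%'
               OR lower(drug) LIKE '%dexamethasone%'
             THEN 1 ELSE 0 END) AS steroids_flag
FROM table_1
GROUP BY subject_id
ORDER BY subject_id
"""
result = conn.execute(query).fetchall()
print(result)
```

If a binary column is what you need, swapping `sum` for `max` in the query above is probably the smallest change.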

In Postgres, I would recommend using the filter clause:

select subject_id,
       count(*) filter (where lower(drug) ~ 'cortisol|cortisone|dexamethasone') as steroids,
       count(*) filter (where lower(drug) ~ 'peptide|paracetamol') as aspirin
from db.Team01.Table_1
group by subject_id;

In BigQuery, I would recommend countif():

select subject_id,
       countif(regexp_contains(lower(drug), r'cortisol|cortisone|dexamethasone')) as steroids,
       countif(regexp_contains(lower(drug), r'peptide|paracetamol')) as aspirin
from db.Team01.Table_1
group by subject_id;

You can use sum(case when ... end) as a more general approach. However, each database has a more "local" way of expressing this logic. By the way, the FILTER clause is standard SQL, just not widely adopted.

Below is for BigQuery Standard SQL:

#standardSQL
SELECT 
  subject_id,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone') THEN 1 ELSE 0 END) AS steroids,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id   

If applied to the sample data from your question, the result is:

Row subject_id  steroids    aspirin  
1   1           3           1    
2   2           1           1     

Note: instead of a chain of simple LIKE conditions, which ends up as lengthy and redundant text, I am using "LIKE on steroids", which is REGEXP_CONTAINS.
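As a rough illustration of why a single alternation pattern can replace a chain of `LIKE '%...%'` conditions, the sketch below compares the two checks in Python. `re.search` is used here as an analogue of `REGEXP_CONTAINS` (both look for a match anywhere in the string), but BigQuery uses the RE2 engine, so this is an approximation rather than the same implementation. The helper names and sample drug strings are made up.

```python
import re

# Same alternation as in the query: any one of the names may appear anywhere.
STEROID_PATTERN = re.compile(r"cortisol|cortisone|dexamethasone")
STEROID_SUBSTRINGS = ("cortisol", "cortisone", "dexamethasone")


def matches_regex(drug: str) -> bool:
    """Analogue of REGEXP_CONTAINS(LOWER(drug), r'cortisol|...')."""
    return STEROID_PATTERN.search(drug.lower()) is not None


def matches_like_chain(drug: str) -> bool:
    """Analogue of LOWER(drug) LIKE '%cortisol%' OR ... LIKE '%dexamethasone%'."""
    lowered = drug.lower()
    return any(name in lowered for name in STEROID_SUBSTRINGS)


# Hypothetical drug strings, not taken from the question's table.
samples = ["Cortisol", "hydrocortisone cream", "Dexamethasone 4 mg", "paracetamol", ""]
regex_hits = [matches_regex(s) for s in samples]
like_hits = [matches_like_chain(s) for s in samples]
print(regex_hits)
```

Both checks agree on every sample, so the choice between them is mostly about readability once the list of names grows.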

Another potentially more intuitive solution would be to use BigQuery's CONTAINS_SUBSTR function, which returns a boolean result.
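A sketch of what such a query could look like is shown below. Because BigQuery is not available in a standalone example, this registers a Python function named `CONTAINS_SUBSTR` in an in-memory SQLite database as a rough stand-in. The emulation assumes the BigQuery function performs a normalized, case-insensitive substring search, and it is only an approximation of that behavior. The table name `table_1` and the sample rows are made up.

```python
import sqlite3
import unicodedata


def contains_substr(value, needle):
    """Rough stand-in for BigQuery's CONTAINS_SUBSTR on a STRING column."""
    if value is None:
        return None  # assumption: a NULL input yields NULL

    def norm(text):
        # Assumed normalization: NFKC plus case folding.
        return unicodedata.normalize("NFKC", text).casefold()

    return int(norm(needle) in norm(value))


# Hypothetical sample rows: (subject_id, drug). Not the question's real data.
rows = [
    (1, "Cortisol"),
    (1, "Paracetamol 500 mg"),
    (2, "DEXAMETHASONE"),
    (3, "ibuprofen"),
]

conn = sqlite3.connect(":memory:")
conn.create_function("CONTAINS_SUBSTR", 2, contains_substr)
conn.execute("CREATE TABLE table_1 (subject_id INTEGER, drug TEXT)")
conn.executemany("INSERT INTO table_1 VALUES (?, ?)", rows)

# No LOWER() call is needed because the search itself ignores case.
query = """
SELECT
    subject_id,
    MAX(CASE WHEN CONTAINS_SUBSTR(drug, 'cortisol')
               OR CONTAINS_SUBSTR(drug, 'cortisone')
               OR CONTAINS_SUBSTR(drug, 'dexamethasone')
             THEN 1 ELSE 0 END) AS steroids,
    MAX(CASE WHEN CONTAINS_SUBSTR(drug, 'peptide')
               OR CONTAINS_SUBSTR(drug, 'paracetamol')
             THEN 1 ELSE 0 END) AS aspirin
FROM table_1
GROUP BY subject_id
ORDER BY subject_id
"""
substr_flags = conn.execute(query).fetchall()
print(substr_flags)
```

The SQL text is written so that it should carry over to BigQuery largely unchanged, with the table reference replaced by the real one, but that has not been verified here.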

