Equivalent of string contains in Google BigQuery

Question

I have a table like the one shown below.

I would like to create two new binary columns indicating whether the subject had steroids and aspirin. I am looking to implement this in PostgreSQL and Google BigQuery.

I tried the queries below, but they don't work:

select subject_id
case when lower(drug) like ('%cortisol%','%cortisone%','%dexamethasone%') 
then 1 else 0 end as steroids,
case when lower(drug) like ('%peptide%','%paracetamol%') 
then 1 else 0 end as aspirin,
from db.Team01.Table_1


SELECT 
db.Team01.Table_1.drug
FROM `table_1`,
UNNEST(table_1.drug) drug
WHERE REGEXP_CONTAINS( db.Team01.Table_1.drug,r'%cortisol%','%cortisone%','%dexamethasone%')

I expect my output to look like the one shown below.

Answer 1

Use conditional aggregation. This is a solution that works across most (if not all) RDBMS:

SELECT
    subject_id,
    -- ELSE 0 makes subjects with no matching drug report 0 instead of NULL
    MAX(CASE WHEN drug IN ('cortisol', 'cortisone', 'dexamethasone') THEN 1 ELSE 0 END) steroids,
    MAX(CASE WHEN drug IN ('peptide', 'paracetamol') THEN 1 ELSE 0 END) aspirin
FROM db.Team01.Table_1
GROUP BY subject_id

NB: it is unclear why you are using LIKE, since it seems you have exact matches; I turned the LIKE conditions into equalities.

Answer 2

You are missing a GROUP BY:

select subject_id,
    sum(case when lower(drug) in ('cortisol','cortisone','dexamethasone')
       then 1 else 0 end) as steroids,
    sum(case when lower(drug) in ('peptide','paracetamol') 
       then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id

Using the LIKE keyword:

select subject_id,
 sum(case when lower(drug) like '%cortisol%'
        or lower(drug) like '%cortisone%'
        or lower(drug) like '%dexamethasone%'   
    then 1 else 0 end) as steroids,
    sum(case when lower(drug) like '%peptide%'
        or lower(drug) like '%paracetamol%'
    then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id

Answer 3

In Postgres, I would recommend using the filter clause:

select subject_id,
       count(*) filter (where lower(drug) ~ 'cortisol|cortisone|dexamethasone') as steroids,
       count(*) filter (where lower(drug) ~ 'peptide|paracetamol') as aspirin
from db.Team01.Table_1
group by subject_id;

In BigQuery, I would recommend countif():

select subject_id,
       countif(regexp_contains(lower(drug), r'cortisol|cortisone|dexamethasone')) as steroids,
       countif(regexp_contains(lower(drug), r'peptide|paracetamol')) as aspirin
from db.Team01.Table_1
group by subject_id;

You can use sum(case when ... end) as a more general approach. However, each database has a more "local" way of expressing this logic. By the way, the FILTER clause is standard SQL, just not widely adopted.

Answer 4

Below is for BigQuery Standard SQL:

#standardSQL
SELECT 
  subject_id,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone') THEN 1 ELSE 0 END) AS steroids,
  SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id

Applied to the sample data from your question, the result is:

Row subject_id  steroids    aspirin  
1   1           3           1    
2   2           1           1

Note: instead of a simple LIKE, which ends up lengthy and redundant, I am using "LIKE on steroids", which is REGEXP_CONTAINS.

Answer 5

Another potentially more intuitive solution would be to use BigQuery's CONTAINS_SUBSTR, which returns boolean results.
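
This answer did not include a query, so the following is only a minimal sketch of how CONTAINS_SUBSTR might be applied here, assuming the same table and column names used in the other answers. CONTAINS_SUBSTR does a case-insensitive substring check, so no LOWER() call is needed. Wrapping the result in LOGICAL_OR and casting it to INT64 is my own assumption about how to get 0/1 flags per subject; it is not part of the original answer.

```sql
#standardSQL
-- Sketch only: table and column names are taken from the question.
SELECT
  subject_id,
  -- TRUE if any of the subject's rows mentions a steroid; cast gives 1/0
  CAST(LOGICAL_OR(
    CONTAINS_SUBSTR(drug, 'cortisol')
    OR CONTAINS_SUBSTR(drug, 'cortisone')
    OR CONTAINS_SUBSTR(drug, 'dexamethasone')
  ) AS INT64) AS steroids,
  -- Same pattern for the second group of drug names
  CAST(LOGICAL_OR(
    CONTAINS_SUBSTR(drug, 'peptide')
    OR CONTAINS_SUBSTR(drug, 'paracetamol')
  ) AS INT64) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id
```

Unlike the COUNTIF and SUM variants above, this returns a flag rather than a count of matching rows, which may be closer to the binary columns the question asks for.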

5 answers

Answer 1
1 2019-10-02 08:29:14

Answer 2
1 2019-10-02 08:31:14

Answer 3
1 accepted 2019-10-02 11:34:22

Answer 4
1 2019-10-02 13:04:59

Answer 5
1 2022-02-22 17:55:55
