Equivalent of string contains in Google BigQuery
I have a table like the one shown below:
I would like to create two new binary columns indicating whether the subject had steroids and aspirin. I am looking to implement this in PostgreSQL and Google BigQuery.
I tried the following, but it doesn't work:
select subject_id
case when lower(drug) like ('%cortisol%','%cortisone%','%dexamethasone%')
then 1 else 0 end as steroids,
case when lower(drug) like ('%peptide%','%paracetamol%')
then 1 else 0 end as aspirin,
from db.Team01.Table_1
SELECT
db.Team01.Table_1.drug
FROM `table_1`,
UNNEST(table_1.drug) drug
WHERE REGEXP_CONTAINS( db.Team01.Table_1.drug,r'%cortisol%','%cortisone%','%dexamethasone%')
I expect my output to look like the one shown below:
Use conditional aggregation. This is a solution that works across most (if not all) RDBMSs:
SELECT
subject_id,
-- ELSE 0 yields 0 instead of NULL for a subject with no matching drug
MAX(CASE WHEN drug IN ('cortisol', 'cortisone', 'dexamethasone') THEN 1 ELSE 0 END) AS steroids,
MAX(CASE WHEN drug IN ('peptide', 'paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM db.Team01.Table_1
GROUP BY subject_id
NB: it is unclear why you are using LIKE, since you appear to have exact matches; I turned the LIKE conditions into equality (IN) checks.
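A minimal sketch of the conditional aggregation above, run through Python's built-in sqlite3 module as a stand-in for PostgreSQL or BigQuery. The table name table_1 and the rows are hypothetical, since the question's sample table is not reproduced here; ELSE 0 makes a subject with no matching drug get 0 rather than NULL.

```python
import sqlite3

# Hypothetical sample rows: the question's real table is not shown.
ROWS = [
    (1, "cortisol"),
    (1, "dexamethasone"),
    (1, "paracetamol"),
    (2, "cortisone"),
    (3, "peptide"),
]

QUERY = """
SELECT
  subject_id,
  MAX(CASE WHEN drug IN ('cortisol', 'cortisone', 'dexamethasone') THEN 1 ELSE 0 END) AS steroids,
  MAX(CASE WHEN drug IN ('peptide', 'paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM table_1
GROUP BY subject_id
ORDER BY subject_id
"""


def binary_flags(rows):
    """Return one (subject_id, steroids, aspirin) tuple per subject."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_1 (subject_id INTEGER, drug TEXT)")
    conn.executemany("INSERT INTO table_1 VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in binary_flags(ROWS):
        print(row)
```

Because MAX over 0/1 values becomes 1 as soon as any row for that subject matches, each column comes out as a binary flag; SUM would count the matching rows instead.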
You are missing a GROUP BY:
select subject_id,
sum(case when lower(drug) in ('cortisol','cortisone','dexamethasone')
then 1 else 0 end) as steroids,
sum(case when lower(drug) in ('peptide','paracetamol')
then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id
Using the LIKE keyword:
select subject_id,
sum(case when lower(drug) like '%cortisol%'
or lower(drug) like '%cortisone%'
or lower(drug) like '%dexamethasone%'
then 1 else 0 end) as steroids,
sum(case when lower(drug) like '%peptide%'
or lower(drug) like '%paracetamol%'
then 1 else 0 end) as aspirin
from db.Team01.Table_1
group by subject_id
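The exact IN check and the LIKE chain differ as soon as drug names are free text. In this sketch (again sqlite3 as a stand-in, with hypothetical rows), "Hydrocortisone" and "Paracetamol 500mg" are only picked up by the substring version. SQLite's LIKE is already case-insensitive for ASCII, so lower() is redundant there, but it matters in PostgreSQL, where LIKE is case-sensitive.

```python
import sqlite3

# Hypothetical rows with free-text drug names (not from the question).
ROWS = [
    (1, "Hydrocortisone"),
    (1, "Paracetamol 500mg"),
    (2, "cortisol"),
    (2, "Dexamethasone"),
]

# Exact matching: only whole values equal to a listed name count.
EXACT = """
SELECT subject_id,
       SUM(CASE WHEN lower(drug) IN ('cortisol', 'cortisone', 'dexamethasone')
                THEN 1 ELSE 0 END) AS steroids,
       SUM(CASE WHEN lower(drug) IN ('peptide', 'paracetamol')
                THEN 1 ELSE 0 END) AS aspirin
FROM table_1
GROUP BY subject_id
ORDER BY subject_id
"""

# Substring matching: any value containing a listed name counts.
SUBSTRING = """
SELECT subject_id,
       SUM(CASE WHEN lower(drug) LIKE '%cortisol%'
                  OR lower(drug) LIKE '%cortisone%'
                  OR lower(drug) LIKE '%dexamethasone%'
                THEN 1 ELSE 0 END) AS steroids,
       SUM(CASE WHEN lower(drug) LIKE '%peptide%'
                  OR lower(drug) LIKE '%paracetamol%'
                THEN 1 ELSE 0 END) AS aspirin
FROM table_1
GROUP BY subject_id
ORDER BY subject_id
"""


def run(query, rows):
    """Load rows into an in-memory table_1 and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_1 (subject_id INTEGER, drug TEXT)")
    conn.executemany("INSERT INTO table_1 VALUES (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print("exact:    ", run(EXACT, ROWS))
    print("substring:", run(SUBSTRING, ROWS))
```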
In Postgres, I would recommend using the FILTER clause:
select subject_id,
count(*) filter (where lower(drug) ~ 'cortisol|cortisone|dexamethasone') as steroids,
count(*) filter (where lower(drug) ~ 'peptide|paracetamol') as aspirin
from db.Team01.Table_1
group by subject_id;
In BigQuery, I would recommend countif():
select subject_id,
countif(regexp_contains(lower(drug), r'cortisol|cortisone|dexamethasone')) as steroids,
countif(regexp_contains(lower(drug), r'peptide|paracetamol')) as aspirin
from db.Team01.Table_1
group by subject_id;
You can use sum(case when ... end) as a more general approach. However, each database has a more "local" way of expressing this logic. By the way, the FILTER clause is standard SQL, just not widely adopted.
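A sketch of the FILTER form next to the more general SUM(CASE ...) form, using sqlite3 as a stand-in (FILTER on aggregates needs SQLite 3.30 or later). SQLite has no ~ operator and defines no regexp() function by default, so the sketch registers a hypothetical Python helper, _regexp, behind SQLite's REGEXP operator to play the role of PostgreSQL's ~. The rows are also hypothetical.

```python
import re
import sqlite3

# Hypothetical rows (not from the question).
ROWS = [
    (1, "Cortisol"),
    (1, "Paracetamol"),
    (1, "Peptide"),
    (2, "Hydrocortisone"),
    (3, "ibuprofen"),
]

# Standard-SQL FILTER clause on the aggregate.
FILTERED = """
SELECT subject_id,
       count(*) FILTER (WHERE lower(drug) REGEXP 'cortisol|cortisone|dexamethasone') AS steroids,
       count(*) FILTER (WHERE lower(drug) REGEXP 'peptide|paracetamol') AS aspirin
FROM table_1
GROUP BY subject_id
ORDER BY subject_id
"""

# The more portable SUM(CASE ...) spelling of the same logic.
CASE_SUM = """
SELECT subject_id,
       SUM(CASE WHEN lower(drug) REGEXP 'cortisol|cortisone|dexamethasone' THEN 1 ELSE 0 END) AS steroids,
       SUM(CASE WHEN lower(drug) REGEXP 'peptide|paracetamol' THEN 1 ELSE 0 END) AS aspirin
FROM table_1
GROUP BY subject_id
ORDER BY subject_id
"""


def _regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, then the value.
    return value is not None and re.search(pattern, value) is not None


def run(query, rows):
    """Load rows into an in-memory table_1 and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("regexp", 2, _regexp)
    conn.execute("CREATE TABLE table_1 (subject_id INTEGER, drug TEXT)")
    conn.executemany("INSERT INTO table_1 VALUES (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print("FILTER:  ", run(FILTERED, ROWS))
    print("SUM/CASE:", run(CASE_SUM, ROWS))
```

Both spellings return the same counts; FILTER just states the condition once per aggregate instead of wrapping it in a CASE.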
Below is for BigQuery Standard SQL:
#standardSQL
SELECT
subject_id,
SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'cortisol|cortisone|dexamethasone') THEN 1 ELSE 0 END) AS steroids,
SUM(CASE WHEN REGEXP_CONTAINS(LOWER(drug), r'peptide|paracetamol') THEN 1 ELSE 0 END) AS aspirin
FROM `db.Team01.Table_1`
GROUP BY subject_id
If applied to the sample data from your question, the result is:
Row  subject_id  steroids  aspirin
1    1           3         1
2    2           1         1
Note: instead of a plain LIKE, which ends up as lengthy and redundant text, I am using LIKE on steroids, which is REGEXP_CONTAINS.
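Why a single REGEXP_CONTAINS can replace the OR chain of LIKE '%...%' tests: an alternation such as cortisol|cortisone|dexamethasone matches a string exactly when at least one of the words occurs in it. The sketch below checks that equivalence in plain Python; like_chain and regex_contains are hypothetical helper names, and Python's re stands in for BigQuery's RE2 engine, which should behave the same for plain literal alternations like these.

```python
import re

STEROIDS = ["cortisol", "cortisone", "dexamethasone"]


def like_chain(drug, keywords):
    """Mimic lower(drug) LIKE '%k1%' OR lower(drug) LIKE '%k2%' OR ..."""
    text = drug.lower()
    return any(k in text for k in keywords)


def regex_contains(drug, keywords):
    """Mimic REGEXP_CONTAINS(LOWER(drug), r'k1|k2|...') with one alternation."""
    # re.escape keeps any regex metacharacters in a keyword literal.
    pattern = "|".join(re.escape(k) for k in keywords)
    return re.search(pattern, drug.lower()) is not None


if __name__ == "__main__":
    for name in ["Hydrocortisone", "Dexamethasone 4mg", "Paracetamol"]:
        print(name, like_chain(name, STEROIDS), regex_contains(name, STEROIDS))
```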
Another, potentially more intuitive, solution would be to use the BigQuery Contains_Substr function, which returns a boolean result.