How to filter rows by matched values using BigQuery?

I have a table in BigQuery:

SELECT 1 as big_id, 1 as temp_id, '101' as names
      UNION ALL SELECT 1,1, 'z3Awwer'
      UNION ALL SELECT 1,1, 'gA1sd03'
      UNION ALL SELECT 1,2, 'z3Awwer'
      UNION ALL SELECT 1,2, 'gA1sd03'
      UNION ALL SELECT 1,3, 'gA1sd03'
      UNION ALL SELECT 1,3, 'sAs10sdf4'
      UNION ALL SELECT 1,4, 'sAs10sdf4'
      UNION ALL SELECT 1,5, 'Adf105'
      UNION ALL SELECT 2,1, 'A1sdf02'
      UNION ALL SELECT 2,1, '345A103'
      UNION ALL SELECT 2,2, '345A103'
      UNION ALL SELECT 2,2, 'A1sd04'
      UNION ALL SELECT 2,3, 'A1sd04'
      UNION ALL SELECT 2,4, '6_0Awe105'

I want to filter out a temp_id when all of its names are included in some other temp_id within the same big_id partition. For example, within big_id = 1 I do not need the rows where temp_id = 2, because all names of temp_id = 2 are included in temp_id = 1. The rows of temp_id = 1 must be kept, because its set of names covers the names of temp_id = 2.
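To make the rule concrete, here is a minimal Python sketch (not BigQuery code) that treats the names of each temp_id as a set and checks whether some other temp_id in the same big_id contains all of them. The helper name `is_covered` is made up for this illustration, and the data covers only big_id = 1 from the sample table.

```python
# Names per temp_id for big_id = 1, copied from the sample table.
groups = {
    1: {"101", "z3Awwer", "gA1sd03"},
    2: {"z3Awwer", "gA1sd03"},
    3: {"gA1sd03", "sAs10sdf4"},
    4: {"sAs10sdf4"},
    5: {"Adf105"},
}


def is_covered(temp_id, groups):
    """Return True if every name of temp_id also appears in some other temp_id."""
    return any(
        other != temp_id and groups[temp_id] <= names  # <= is the subset test
        for other, names in groups.items()
    )


# Keep only the temp_ids that are not covered by another one.
kept = [t for t in groups if not is_covered(t, groups)]
print(kept)  # [1, 3, 5]
```

Here temp_id = 2 is dropped because it is a subset of temp_id = 1, and temp_id = 4 is dropped because it is a subset of temp_id = 3.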

So the expected output is:

SELECT 1 as big_id, 1 as temp_id, '101' as names
      UNION ALL SELECT 1,1, 'z3Awwer'
      UNION ALL SELECT 1,1, 'gA1sd03'
      UNION ALL SELECT 1,3, 'gA1sd03'
      UNION ALL SELECT 1,3, 'sAs10sdf4'
      UNION ALL SELECT 1,5, 'Adf105'
      UNION ALL SELECT 2,1, 'A1sdf02'
      UNION ALL SELECT 2,1, '345A103'
      UNION ALL SELECT 2,2, '345A103'
      UNION ALL SELECT 2,2, 'A1sd04'
      UNION ALL SELECT 2,4, '6_0Awe105'

How can I do this using BigQuery?

Below is for BigQuery Standard SQL:

#standardsql
-- collect the names of each (big_id, temp_id) into an array
with temp as (
  select big_id, temp_id, array_agg(names) names
  from `project.dataset.table`
  group by big_id, temp_id
)
select big_id, temp_id, names
from (
  select big_id, temp_id, any_value(names) names
  from (
    select t1.*,
      -- flag is true when every name of t1 also appears in a different temp_id t2
      ( select count(1)
        from t1.names name
        join t2.names name
        using(name)
        where t1.temp_id != t2.temp_id
      ) = array_length(t1.names) as flag
    from temp t1
    join temp t2
    using (big_id)
  )
  group by big_id, temp_id
  -- keep only the temp_ids that are not covered by any other temp_id
  having countif(flag) = 0
), unnest(names) names

If you apply the above to the sample data from your question, the output is:

[image: query result, not preserved]
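Since the result image is not available, the logic can be cross-checked outside BigQuery. The Python sketch below is only an illustration of the same three steps (aggregate names per group, compare each group with the other groups of the same big_id, keep groups that are never fully matched). It is not the BigQuery implementation, and the function name `flagged_count` is hypothetical.

```python
from collections import defaultdict

# Sample rows (big_id, temp_id, names) from the question.
rows = [
    (1, 1, "101"), (1, 1, "z3Awwer"), (1, 1, "gA1sd03"),
    (1, 2, "z3Awwer"), (1, 2, "gA1sd03"),
    (1, 3, "gA1sd03"), (1, 3, "sAs10sdf4"),
    (1, 4, "sAs10sdf4"),
    (1, 5, "Adf105"),
    (2, 1, "A1sdf02"), (2, 1, "345A103"),
    (2, 2, "345A103"), (2, 2, "A1sd04"),
    (2, 3, "A1sd04"),
    (2, 4, "6_0Awe105"),
]

# Step 1: like the `temp` CTE, collect the names of each (big_id, temp_id).
temp = defaultdict(list)
for big_id, temp_id, name in rows:
    temp[(big_id, temp_id)].append(name)


def flagged_count(key):
    """Count the other temp_ids (same big_id) that contain all names of `key`."""
    big_id, temp_id = key
    names = temp[key]
    count = 0
    for (other_big, other_temp), other_names in temp.items():
        if other_big != big_id or other_temp == temp_id:
            continue  # only compare with different temp_ids of the same big_id
        # Step 2: like the inner join on name, count matching pairs.
        matches = sum(1 for a in names for b in other_names if a == b)
        if matches == len(names):
            count += 1
    return count


# Step 3: like `having countif(flag) = 0` followed by the final unnest.
result = [
    (big_id, temp_id, name)
    for (big_id, temp_id), names in temp.items()
    if flagged_count((big_id, temp_id)) == 0
    for name in names
]

for row in result:
    print(row)
```

On the sample data this keeps temp_id 1, 3 and 5 for big_id = 1 and temp_id 1, 2 and 4 for big_id = 2, which matches the expected output in the question. One caveat of this comparison: if two temp_ids had exactly the same names, each would be flagged against the other and both would be dropped.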

