How to filter rows by matching values using BigQuery?

Question

I have a table in BigQuery:

SELECT 1 as big_id, 1 as temp_id, '101' as names
      UNION ALL SELECT 1,1, 'z3Awwer'
      UNION ALL SELECT 1,1, 'gA1sd03'
      UNION ALL SELECT 1,2, 'z3Awwer'
      UNION ALL SELECT 1,2, 'gA1sd03'
      UNION ALL SELECT 1,3, 'gA1sd03'
      UNION ALL SELECT 1,3, 'sAs10sdf4'
      UNION ALL SELECT 1,4, 'sAs10sdf4'
      UNION ALL SELECT 1,5, 'Adf105'
      UNION ALL SELECT 2,1, 'A1sdf02'
      UNION ALL SELECT 2,1, '345A103'
      UNION ALL SELECT 2,2, '345A103'
      UNION ALL SELECT 2,2, 'A1sd04'
      UNION ALL SELECT 2,3, 'A1sd04'
      UNION ALL SELECT 2,4, '6_0Awe105'

I want to filter it by temp_id: if all names of one temp_id are included in another temp_id within the same big_id partition, that temp_id should be dropped. For example, I do not need the rows where temp_id = 2, because all names of temp_id = 2 are included in temp_id = 1. I do need to keep all rows of temp_id = 1, because its set of names covers the names of temp_id = 2.

So the expected output is:

SELECT 1 as big_id, 1 as temp_id, '101' as names
      UNION ALL SELECT 1,1, 'z3Awwer'
      UNION ALL SELECT 1,1, 'gA1sd03'
      UNION ALL SELECT 1,3, 'gA1sd03'
      UNION ALL SELECT 1,3, 'sAs10sdf4'
      UNION ALL SELECT 1,5, 'Adf105'
      UNION ALL SELECT 2,1, 'A1sdf02'
      UNION ALL SELECT 2,1, '345A103'
      UNION ALL SELECT 2,2, '345A103'
      UNION ALL SELECT 2,2, 'A1sd04'
      UNION ALL SELECT 2,4, '6_0Awe105'

How can I do this using BigQuery?

Answer 1

Below is for BigQuery Standard SQL:

#standardsql
-- collect the names of each (big_id, temp_id) into one array
with temp as (
  select big_id, temp_id, array_agg(names) names
  from `project.dataset.table`
  group by big_id, temp_id
)
select big_id, temp_id, names 
from (
  select big_id, temp_id, any_value(names) names 
  from (
    select t1.*,
      -- flag is true when every name of t1 also appears in a different temp_id t2
      ( select count(1)
        from t1.names name
        join t2.names name
        using(name)
        where t1.temp_id != t2.temp_id
      ) = array_length(t1.names) as flag
    from temp t1 
    join temp t2
    using (big_id)
  )
  group by big_id, temp_id
  -- keep only the temp_ids that no other temp_id covers
  having countif(flag) = 0
), unnest(names) names  -- flatten the arrays back to one row per name

If you apply the above to the sample data from your question, the output is: [result image not preserved]
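
As a cross-check, the same subset rule can be sketched as a self-contained query with the sample rows inlined, so it runs without a table. This is an alternative formulation under assumptions, not the answerer's query: the CTE names `input_rows`, `grouped` and `covered` and the column `name_list` are made up for illustration.

```sql
#standardSQL
-- Sketch only: input_rows, grouped, covered and name_list are illustrative names.
with input_rows as (
  select 1 as big_id, 1 as temp_id, '101' as names
  union all select 1, 1, 'z3Awwer'
  union all select 1, 1, 'gA1sd03'
  union all select 1, 2, 'z3Awwer'
  union all select 1, 2, 'gA1sd03'
  union all select 1, 3, 'gA1sd03'
  union all select 1, 3, 'sAs10sdf4'
  union all select 1, 4, 'sAs10sdf4'
  union all select 1, 5, 'Adf105'
  union all select 2, 1, 'A1sdf02'
  union all select 2, 1, '345A103'
  union all select 2, 2, '345A103'
  union all select 2, 2, 'A1sd04'
  union all select 2, 3, 'A1sd04'
  union all select 2, 4, '6_0Awe105'
),
-- one row per (big_id, temp_id) holding its distinct names
grouped as (
  select big_id, temp_id, array_agg(distinct names) as name_list
  from input_rows
  group by big_id, temp_id
),
-- temp_ids whose names all appear in some other temp_id of the same big_id
covered as (
  select distinct a.big_id, a.temp_id
  from grouped a
  join grouped b
    on a.big_id = b.big_id and a.temp_id != b.temp_id
  where (
    -- number of a's names that are also present in b
    select count(1)
    from unnest(a.name_list) as n
    where n in unnest(b.name_list)
  ) = array_length(a.name_list)
)
-- anti-join: keep only the rows whose temp_id is not covered
select s.big_id, s.temp_id, s.names
from input_rows s
left join covered c
  on s.big_id = c.big_id and s.temp_id = c.temp_id
where c.temp_id is null
order by s.big_id, s.temp_id
```

On the sample data this should drop temp_id 2 and 4 under big_id 1 and temp_id 3 under big_id 2, which matches the expected output in the question. Note that two temp_ids with identical name sets would cover each other and both be dropped; the sample contains no such pair.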

Solution 1
2 accepted 2020-10-21 21:39:55
