How can I return the differences between string values in the same column using a grouped string comparison in BigQuery SQL?

Question

I have a product table that contains many products, for example:

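product	brand
colgate smile 250gr	colgate
colgate fresh breath 250gr	colgate
colgate mint 250gr	colgate
relx pod pro mango - 1pod	relx
relx pod pro lychee - 1pod	relx
soju jinro chamisul green grape 360ml	jinro
soju jinro chamisul strawberry 360ml	jinro
soju jinro chamisul apple grape 360ml	jinro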

I want to turn it into:

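product	brand	word
colgate smile 250gr	colgate	smile
colgate fresh breath 250gr	colgate	fresh breath
colgate mint 250gr	colgate	mint
relx pod pro mango - 1pod	relx	mango
relx pod pro lychee - 1pod	relx	lychee
soju jinro chamisul green grape 360ml	jinro	green grape
soju jinro chamisul strawberry 360ml	jinro	strawberry
soju jinro chamisul apple 360ml	jinro	apple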

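I want to group by brand, get the part of each string that differs from the others, and return it as a new column. How can I do this transformation? Should I check regexp_contains(str_1, str_2_split) = false and return the value?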

Answer 1

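Consider the naive approach below:

Split each product into separate words
Identify the words that are repeated in all rows of the same brand
Join back to the original table and remove (replace with an empty string) all such words
Trim whatever is left and [optionally] replace runs of multiple spaces with a single space

So the query looks like this: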

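with common_words as (
  -- one row per brand: an alternation (word1|word2|...) of the words
  -- that appear in every product of that brand
  select brand, 
    r'' || array_to_string(array(
      select word
      from t.words word
      group by word
      having count(*) = cnt  -- word occurs in all cnt products of the brand
    ), '|') words
  from (
    -- cnt = products per brand; words = all per-product word lists concatenated
    select brand, count(*) cnt, array_concat_agg(words) words
    from (
      -- distinct words of each product, so a word counts once per product
      select brand, array(
          select distinct word
          from unnest(split(product, ' ')) word
        ) words
      from your_table
    )
    group by brand
  ) t
)
-- strip the common words, trim, then collapse runs of whitespace
select product, brand, 
  regexp_replace(trim(regexp_replace(product, words, '')), r'\s+', ' ') as diff
from your_table
join common_words
using (brand)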
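If applied to the sample data in your question, the output has the columns product, brand, and diff, where diff holds the words of each product that are not shared by every product of its brand.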
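
One caveat of building a regular expression from the common words is that the alternation matches substrings as well as whole words (a common word such as pod would also be stripped out of a longer word that contains it), and any regex metacharacters inside a word are interpreted as pattern syntax. The following is a minimal sketch of a variant of the same idea that filters words instead of using a regex. It is not part of the original answer: the CTE names brand_stats and word_counts are hypothetical, and your_table is inlined here with a few rows from the question only so that the query can be run on its own.

```sql
with your_table as (
  -- a few sample rows from the question, inlined for illustration only
  select 'colgate smile 250gr' as product, 'colgate' as brand union all
  select 'colgate fresh breath 250gr', 'colgate' union all
  select 'colgate mint 250gr', 'colgate' union all
  select 'relx pod pro mango - 1pod', 'relx' union all
  select 'relx pod pro lychee - 1pod', 'relx'
),
brand_stats as (
  -- number of distinct products per brand
  select brand, count(distinct product) as product_cnt
  from your_table
  group by brand
),
word_counts as (
  -- for each brand, the number of distinct products each word appears in
  select brand, word, count(distinct product) as word_cnt
  from your_table, unnest(split(product, ' ')) as word
  group by brand, word
),
common_words as (
  -- words that appear in every product of the brand
  select brand, array_agg(word) as words
  from word_counts
  join brand_stats using (brand)
  where word_cnt = product_cnt
  group by brand
)
select product, brand,
  array_to_string(array(
    -- keep the non-common words in their original order
    select word
    from unnest(split(product, ' ')) as word with offset as pos
    where word != ''
      and word not in unnest(ifnull(words, array<string>[]))
    order by pos
  ), ' ') as diff
from your_table
-- left join so that brands with no common words are still returned
left join common_words using (brand)
```

Because whole words are compared for equality, no trimming or whitespace cleanup is needed afterwards. The trade-off is that the product is rebuilt from its words, so any irregular spacing in the original string is normalized to single spaces.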
