How to return the difference between string values in the same column by comparing strings within a group in BigQuery SQL?

Question

I have a product table that contains many products, for example:

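product	brand
colgate smile 250gr	colgate
colgate fresh breath 250gr	colgate
colgate mint 250gr	colgate
relx pod pro mango - 1pod	relx
relx pod pro lychee - 1pod	relx
soju jinro chamisul green grape 360ml	jinro
soju jinro chamisul strawberry 360ml	jinro
soju jinro chamisul apple grape 360ml	jinro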

and I want to turn it into:

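product	brand	word
colgate smile 250gr	colgate	smile
colgate fresh breath 250gr	colgate	fresh breath
colgate mint 250gr	colgate	mint
relx pod pro mango - 1pod	relx	mango
relx pod pro lychee - 1pod	relx	lychee
soju jinro chamisul green grape 360ml	jinro	green grape
soju jinro chamisul strawberry 360ml	jinro	strawberry
soju jinro chamisul apple 360ml	jinro	apple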

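I want to group by brand, get the difference between the strings, and return it as a new column. How can I do this transformation? And how can I check regexp_contains(str_1, str_2_split) = false and return the value?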

Answer 1

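Consider the naive approach below:

split each product into separate words
identify the words that repeat in all rows of the same brand
join back to the original table and remove (replace with an empty string) all such words
trim what is left and [optionally] replace runs of multiple spaces with a single space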
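To see what the last two steps do to a single product, the expression used in the final select can be wrapped in a temporary function. This is a small sketch: strip_common is a hypothetical name, and the patterns are typed in by hand (assumed to match what common_words should build for these brands) rather than computed from a table.

```sql
-- hypothetical helper: removes every match of re_pattern from product,
-- then trims and collapses repeated whitespace (same expression as the answer's final select)
create temp function strip_common(product string, re_pattern string) as (
  regexp_replace(trim(regexp_replace(product, re_pattern, '')), r'\s+', ' ')
);

select
  strip_common('colgate fresh breath 250gr', r'colgate|250gr') as colgate_diff,
  strip_common('relx pod pro mango - 1pod', r'relx|pod|pro|-|1pod') as relx_diff,
  -- naive substring matching: 'pod' also matches inside 'ipod'
  strip_common('relx ipod', r'relx|pod') as substring_caveat;
```

The first two columns should come back as fresh breath and mango. The last column shows why the approach is called naive: the alternation matches substrings, so the common word pod also removes the end of ipod and leaves just i.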

Put together, the query looks like this:

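with common_words as (
  -- per brand: a regex alternation (w1|w2|...) of the words found in every product
  -- note: the words are not regex-escaped, and they match as substrings
  select brand,
    array_to_string(array(
      select word
      from unnest(t.words) word
      group by word
      having count(*) = cnt  -- the word occurs in all cnt products of the brand
    ), '|') words
  from (
    -- cnt = number of products of the brand; words = their word lists concatenated
    select brand, count(*) cnt, array_concat_agg(words) words
    from (
      -- distinct words of each product (split on single spaces)
      select brand, array(
          select distinct word
          from unnest(split(product, ' ')) word
        ) words
      from your_table
    )
    group by brand
  ) t
)
select product, brand,
  -- remove the common words, then trim and collapse repeated whitespace
  regexp_replace(trim(regexp_replace(product, words, '')), r'\s+', ' ') as diff
from your_table
join common_words
using (brand)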

If applied to the sample data in your question, the output is:
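If whole-word matching is preferred, the same idea can be expressed without building a regex: keep each word of a product unless it is in the brand's list of common words (a word-level version of the regexp_contains(...) = false check from the question). This is a rough sketch, not the answer's query. It assumes words are separated by single spaces, uses a hypothetical inline table sample_products in place of your_table, and stores the result in a hypothetical temp table product_diff, so it has to run as a multi-statement query (then select * from product_diff).

```sql
create temp table product_diff as
with sample_products as (
  -- hypothetical inline copy of the question's sample data; use your_table instead
  select * from unnest([
    struct('colgate smile 250gr' as product, 'colgate' as brand),
    ('colgate fresh breath 250gr', 'colgate'),
    ('colgate mint 250gr', 'colgate'),
    ('relx pod pro mango - 1pod', 'relx'),
    ('relx pod pro lychee - 1pod', 'relx'),
    ('soju jinro chamisul green grape 360ml', 'jinro'),
    ('soju jinro chamisul strawberry 360ml', 'jinro'),
    ('soju jinro chamisul apple grape 360ml', 'jinro')
  ])
),
product_words as (
  -- one row per distinct (brand, product, word)
  select distinct brand, product, word
  from sample_products, unnest(split(product, ' ')) as word
),
brand_counts as (
  select brand, count(distinct product) as n_products
  from sample_products
  group by brand
),
common_by_brand as (
  -- words found in every product of a brand, collected into one array per brand
  select brand, array_agg(word) as common
  from (
    select pw.brand, pw.word
    from product_words as pw
    join brand_counts as bc on pw.brand = bc.brand
    group by pw.brand, pw.word, bc.n_products
    having count(*) = bc.n_products
  )
  group by brand
)
select
  p.product,
  p.brand,
  -- keep the non-common words in their original order (exact word match, no regex);
  -- brands with no common word get no row in common_by_brand, hence the ifnull
  array_to_string(array(
    select word
    from unnest(split(p.product, ' ')) as word with offset as pos
    where word not in unnest(ifnull(c.common, array<string>[]))
    order by pos
  ), ' ') as diff
from sample_products as p
left join common_by_brand as c
  on p.brand = c.brand;
```

On this sample data the diff column should read smile, fresh breath, mint, mango, lychee, green grape, strawberry and apple grape; grape is kept because it is not in every jinro product. Since words are compared by equality with not in unnest(...), a common word cannot remove part of a longer word, and regex metacharacters in product names need no escaping.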


Solution 1
1 Accepted 2022-08-24 21:11:31
