BigQuery check if string values contain elements of an array or subquery
I'm struggling a bit to reason about how best to go about this. I have a table with one column (b) of values I'd like to check against. The main table I am querying has a column (a) with string values, and I want to check whether they contain elements from the table with column b.
Right now I'm only checking for literal matches:
SELECT CASE WHEN lower(table1.a) IN (SELECT lower(table2.b) FROM table2) THEN 1 ELSE 0 END
FROM table1
However, the values in column a can contain more words than just an exact match. I thought I could use SPLIT to check each word in the string, but I'm not sure that's the best or only way to go, especially since the number of words in the string varies per row.
Something like:
DECLARE match_against ARRAY<STRING>;
SET match_against = ARRAY(SELECT lower(table2.b) FROM table2);
-- SAFE_ORDINAL is 1-based and returns NULL when the index is out of range
SELECT CASE WHEN lower(split(table1.a, ' ')[SAFE_ORDINAL(1)]) IN UNNEST(match_against)
OR lower(split(table1.a, ' ')[SAFE_ORDINAL(2)]) IN UNNEST(match_against)
OR lower(split(table1.a, ' ')[SAFE_ORDINAL(3)]) IN UNNEST(match_against)
THEN 1 ELSE 0 END
FROM table1;
Especially if column b in table2 contains strings with multiple words, this becomes unworkable (I would need to check for any match between a word from the string in column a and any word in the strings from column b).
Any suggestions on how to achieve this?
Some example data: table1 column a values: 'abc', 'aBc dEf', 'abc def ghij'; table2 column b values: 'def'
Expected result is 0,1,1
A left join is a good way to combine two tables. CONTAINS_SUBSTR does not accept a column as its search parameter, so we use a UDF:
create temp function contains_str(a string,b string ) returns bool language js as
"""
return a.includes(b);
""";
with tbl_a as (select a from unnest([ 'abc', 'aBc dEf', 'abc def ghij',"noabc","cats","cat"])a),
tbl_b as (select b from unnest([ 'dEf', 'abc',"cat"])b)
select *,
contains_str(a,b)
from tbl_a
left join tbl_b
on contains_str(concat(" ",lower(a)," "),concat(" ",lower(b)," "))
We add spaces around each search term. Thus noabc is not matched by abc, and cats is not matched by cat.
Each entry of table A is repeated for each match in table B.
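As a possible alternative to the JavaScript UDF, the same space-padded comparison can be sketched in plain SQL with STRPOS, which accepts column values for both arguments. This is a minimal sketch and not the method of the answer above: it uses the sample data from the question, the has_match column name is made up for illustration, and the CROSS JOIN with LOGICAL_OR collapses the result back to one row per distinct value of a (so it assumes table2 is not empty).

```sql
WITH table1 AS (
  SELECT a FROM UNNEST(['abc', 'aBc dEf', 'abc def ghij']) AS a
),
table2 AS (
  SELECT b FROM UNNEST(['def']) AS b
)
SELECT
  t1.a,
  -- 1 if at least one b occurs in a as a whole space-delimited term, else 0
  IF(LOGICAL_OR(
       STRPOS(CONCAT(' ', LOWER(t1.a), ' '), CONCAT(' ', LOWER(t2.b), ' ')) > 0
     ), 1, 0) AS has_match  -- has_match is a hypothetical column name
FROM table1 AS t1
CROSS JOIN table2 AS t2
GROUP BY t1.a;
```

For the sample data this gives 0 for 'abc' and 1 for the other two values, which corresponds to the expected 0,1,1 (row order is not guaranteed without an ORDER BY).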
Another way is the following. It returns only the entries of table A that have a match in table B, but it cannot return additional columns from table B. SOUNDEX compares the phonetic codes of the strings rather than their exact spelling, which helps to find approximate matches. Please read the documentation.
create temp function contains_str(a string,b string ) returns bool language js as
"""
return a.includes(b);
""";
with tbl_a as (select a from unnest([ 'abc', 'aBc dEf', 'abc def ghij',"cats","cat"])a),
tbl_b as (select b from unnest([ 'dEf', 'abc',"cat"])b)
select *
from tbl_a
-- EXISTS is true as soon as the subquery returns any row, so the condition must filter rows inside it
where exists (select 1 from tbl_b where contains_str(concat(" ",soundex(a)," "),concat(" ",soundex(b)," ")))
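The question also asks about matching any word in column a against any word in column b when both can hold several words. One way this could be sketched, again in plain SQL and not taken from the answer above, is to split both columns into words with SPLIT and UNNEST and join the words on equality. The CTE names words_a and words_b and the has_match column are hypothetical, and words are assumed to be separated by single spaces.

```sql
WITH table1 AS (
  SELECT a FROM UNNEST(['abc', 'aBc dEf', 'abc def ghij']) AS a
),
table2 AS (
  SELECT b FROM UNNEST(['def']) AS b
),
-- one row per (a, lower-cased word of a)
words_a AS (
  SELECT a, word
  FROM table1, UNNEST(SPLIT(LOWER(a), ' ')) AS word
),
-- distinct lower-cased words found anywhere in column b
words_b AS (
  SELECT DISTINCT word
  FROM table2, UNNEST(SPLIT(LOWER(b), ' ')) AS word
)
SELECT
  wa.a,
  -- COUNT ignores the NULLs produced by unmatched left-join rows
  IF(COUNT(wb.word) > 0, 1, 0) AS has_match
FROM words_a AS wa
LEFT JOIN words_b AS wb
  ON wa.word = wb.word
GROUP BY wa.a;
```

Because the join is a plain equality on single words, it needs no UDF, and rows of table1 without any match are kept with has_match = 0. Identical values of a are grouped into one output row, so a row identifier would be needed in the GROUP BY if duplicates must be preserved.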