BigQuery check if string values contain elements of an array or subquery

I'm struggling a bit to reason about how best to go about this. I have a table with one column (b) of values I'd like to check against. The main table I am querying has a column (a) with string values, and I want to check whether they contain elements from the table with column b.

Right now I'm only checking for literal matches:

SELECT CASE WHEN lower(table1.a) IN (SELECT lower(table2.b) FROM table2) THEN 1 ELSE 0 END
FROM table1

However, the values in column a can contain more words than just an exact match. I thought I could use SPLIT to check each word in the string, but I'm not sure that's the best or only way to go, especially since the number of words in the string varies per row.

Something like:

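-- Collect the lowercased lookup values into an array variable.
DECLARE match_against ARRAY<STRING>;
SET match_against = (SELECT ARRAY_AGG(lower(table2.b)) FROM table2);
-- SAFE_ORDINAL is 1-based and returns NULL past the end of the array.
SELECT CASE WHEN lower(split(table1.a, ' ')[SAFE_ORDINAL(1)]) IN UNNEST(match_against)
OR lower(split(table1.a, ' ')[SAFE_ORDINAL(2)]) IN UNNEST(match_against)
OR lower(split(table1.a, ' ')[SAFE_ORDINAL(3)]) IN UNNEST(match_against)
THEN 1 ELSE 0 END
FROM table1;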

This becomes unworkable, especially if column b in table2 contains strings with multiple words (that is, checking for any match between a word from the string in column a and any word in the strings from column b).

Any suggestions on how to achieve this?

Some example data:
table1 column a values: 'abc', 'aBc dEf', 'abc def ghij'
table2 column b values: 'def'

Expected result is 0,1,1

A left join is a good way to combine two tables. CONTAINS_SUBSTR does not accept a column as its search parameter, so we use a UDF:

create temp function contains_str(a string,b string ) returns bool language js as 
"""
return a.includes(b);
""";

with tbl_a as (select a from unnest([ 'abc', 'aBc dEf', 'abc def ghij',"noabc","cats","cat"])a),
tbl_b as (select b from unnest([ 'dEf', 'abc',"cat"])b)
select *,
contains_str(a,b)
 from tbl_a
left join tbl_b
on contains_str(concat(" ",lower(a)," "),concat(" ",lower(b)," "))

We add spaces around both the searched string and the search term. Thus noabc is not matched by abc, and cats is not matched by cat.

Each entry of table A is repeated once for each match in table B.
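The space-padding idea above can also be sketched without a JavaScript UDF, assuming the built-in STRPOS function is acceptable for the comparison. The sample values and the output column name whole_word_match are illustrative only, and the sketch assumes words are separated by single spaces:

```sql
-- Sketch only: STRPOS returns the 1-based position of the match, or 0 if absent.
WITH tbl_a AS (SELECT a FROM UNNEST(['abc def', 'noabc', 'cats', 'cat']) AS a),
tbl_b AS (SELECT b FROM UNNEST(['abc', 'cat']) AS b)
SELECT
  a,
  b,
  -- Pad both sides with spaces so only whole words can match.
  STRPOS(CONCAT(' ', LOWER(a), ' '), CONCAT(' ', LOWER(b), ' ')) > 0 AS whole_word_match
FROM tbl_a
CROSS JOIN tbl_b;
```

With this padding, 'noabc' paired with 'abc' and 'cats' paired with 'cat' both come out false, while 'abc def' paired with 'abc' and 'cat' paired with 'cat' come out true.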

Another way is the following. It returns only the entries of table A that have a match in table B, but it cannot return additional columns from table B. SOUNDEX helps to find matches; please read the documentation.

create temp function contains_str(a string,b string ) returns bool language js as 
"""
return a.includes(b);
""";

with tbl_a as (select a from unnest([ 'abc', 'aBc dEf', 'abc def ghij',"cats","cat"])a),
tbl_b as (select b from unnest([ 'dEf', 'abc',"cat"])b)
select *,
 from tbl_a
 -- exists() is true for any returned row, even a NULL one, so the filter goes inside the subquery
 where exists (select 1 from tbl_b where contains_str(concat(" ",soundex(a)," "),concat(" ",soundex(b)," ")))
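
For the word-by-word check described in the question (any word in column a against any word in column b), a minimal sketch with SPLIT and UNNEST might look like the following. It assumes words are separated by single spaces; the CTE name needles and the column name has_match are hypothetical, and the inline data simply mirrors the question's example:

```sql
-- Sketch only: table and column names mirror the question's example data.
WITH table1 AS (
  SELECT a FROM UNNEST(['abc', 'aBc dEf', 'abc def ghij']) AS a
),
table2 AS (
  SELECT b FROM UNNEST(['def']) AS b
),
-- "needles" is a hypothetical helper: every lowercased word found in column b.
needles AS (
  SELECT ARRAY_AGG(DISTINCT LOWER(word)) AS words
  FROM table2, UNNEST(SPLIT(b, ' ')) AS word
)
SELECT
  t.a,
  -- 1 when at least one word of a appears among the words of b, else 0.
  IF(EXISTS (
    SELECT 1
    FROM UNNEST(SPLIT(LOWER(t.a), ' ')) AS word
    WHERE word IN UNNEST(n.words)
  ), 1, 0) AS has_match
FROM table1 AS t
CROSS JOIN needles AS n;
```

For the three sample rows this should give 0, 1 and 1, matching the expected result in the question. Because the words of b are collected into a single array first, the per-row check only touches arrays and works regardless of how many words each string has.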

