简体   繁体   中英

Unable to join using wildcards in BigQuery

I am trying to join two tables in big query, Table1 contains an ID column, and Table2 contains a column which has the same ID or multiple ID's in the form of a long string separated by commas, like "id123,id456,id678"

I can join the tables together if Table1.ID = Table2.ID but this ignores all the rows where Table1.ID is one of the multiple IDs in Table2.ID. I have looked at similar post that tell me to use wildcards like

on concat('%',Table1.ID,'%') = Table2.ID

but this does not work, because it seems to create a string that contains the '%' character and doesn't actually use it as a wildcard.

I'm using standard sql in BigQuery, any help would be appreciated

Below example is for BigQuery Standard SQL

#standardSQL
WITH `project.dataset.table1` AS (
  SELECT 123 id, 'a' test UNION ALL
  SELECT 456, 'b' UNION ALL
  SELECT 678, 'c'
), `project.dataset.table2` AS (
  SELECT 'id123,id456' id UNION ALL
  SELECT 'id678'
)
SELECT t2.id, test
FROM `project.dataset.table2` t2, UNNEST(SPLIT(id)) id2
JOIN `project.dataset.table1` t1
ON CONCAT('id', CAST(t1.id AS STRING)) = id2  

result is as below

Row id              test     
1   id123,id456     a    
2   id123,id456     b    
3   id678           c    

It is doubtful that you have values in the table that start and end with percentage signs. = does not recognize wildcards; like does:

on Table2.ID like concat('%', Table1.ID, '%')

As a warning. Such a construct is usually a performance killer. You would be better off trying to have columns in Table1 and Table2 that match exactly.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM