I have two tables available in BigQuery:
my-project.my-database.what-to-query:
+---------+-----------+
| id_what | name_what |
+---------+-----------+
| 1       | C++       |
+---------+-----------+
| 2       | Foo       |
+---------+-----------+
| 3       | Ca$h      |
+---------+-----------+
my-project.my-database.where-to-query:
+----------+----------------------+
| id_where | name_where           |
+----------+----------------------+
| 4        | C++ and Ca$h         |
+----------+----------------------+
| 5        | Foo Fighters is nice |
+----------+----------------------+
| 6        | I know C# and C++    |
+----------+----------------------+
| 7        | Football is cool     |
+----------+----------------------+
| 8        | Don't have anything  |
+----------+----------------------+
I would like to use name_what as a regex search keyword to obtain all the matches in name_where, while keeping all the columns. The result should look like:
+---------+-----------+----------+----------------------+
| id_what | name_what | id_where | name_where           |
+---------+-----------+----------+----------------------+
| 1       | C++       | 4        | C++ and Ca$h         |
+---------+-----------+----------+----------------------+
| 1       | C++       | 6        | I know C# and C++    |
+---------+-----------+----------+----------------------+
| 2       | Foo       | 5        | Foo Fighters is nice |
+---------+-----------+----------+----------------------+
| 2       | Foo       | 7        | Football is cool     |
+---------+-----------+----------+----------------------+
| 3       | Ca$h      | 4        | C++ and Ca$h         |
+---------+-----------+----------+----------------------+
Notice how C++ should be escaped, something like:
SELECT *
FROM `my-project.my-database.where-to-query`
WHERE REGEXP_CONTAINS(name_where, r"C\+\+")
But the column name_what could hold many other strings (in real life, both tables contain hundreds of thousands of rows; this is only a toy sample), and those strings could contain other regex special characters. In Python, for instance, you have re.escape to deal with this specific problem, but I have found nothing similar in SQL / BigQuery.
Following the suggestions in the comments, I have tried the following updated code:
CREATE TEMP FUNCTION ENCODE_WITH_ESCAPE(x STRING) RETURNS STRING AS (
REPLACE(
REPLACE(x, "+", "\\\\+"), "$", "\\\\$"
) -- For the time being, only "+" & "$" have been dealt with, there could be more
);
WITH what AS (
SELECT 1 AS id_what, 'c++' AS name_what UNION ALL
SELECT 2 AS id_what, 'foo' AS name_what UNION ALL
SELECT 3 AS id_what, 'ca$h' AS name_what
),
andwhere AS (
SELECT 4 AS id_where, 'C++ and Ca$h' AS name_where UNION ALL
SELECT 5 AS id_where, 'Foo Fighters is nice' AS name_where UNION ALL
SELECT 6 AS id_where, 'I know C# and C++' AS name_where UNION ALL
SELECT 7 AS id_where, 'Football is cool' AS name_where UNION ALL
SELECT 8 AS id_where, "Don't have anything" AS name_where
)
SELECT *
FROM what JOIN andwhere
ON REGEXP_CONTAINS(ENCODE_WITH_ESCAPE(andwhere.name_where), ENCODE_WITH_ESCAPE(what.name_what))
The previous code runs, but the output is: There is no data to display.
How can I combine all the requirements?
PS.: BigQuery's "Legacy SQL" can NOT be an answer.
See if this helps:
create temp function encode_with_escape(x STRING) returns string as (
-- In a non-raw string literal, "\\+" is the two characters \+ (a regex-escaped literal plus)
replace(x, "+", "\\+")
);
WITH what AS (
SELECT 1 as id_what, 'c++' as name_what union all
SELECT 2 as id_what, 'foo' as name_what
),
andwhere as (
SELECT 3 as id_where, 'c++ is great' as name_where union all
SELECT 5 as id_where, 'c++ was after c' as name_where union all
SELECT 4 as id_where, 'food was good' as name_where
)
SELECT *
FROM what join andwhere
-- Escape only the pattern; the text being searched must stay unescaped
on regexp_contains(andwhere.name_where, encode_with_escape(what.name_what))
Gives back:
Consider the option below:
create temp function escapeRegExp(x string)
returns string language js
as r"return x.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');";
with what as (
select 1 as id_what, 'c++' as name_what union all
select 2 as id_what, 'foo' as name_what union all
select 3 as id_what, 'ca$h' as name_what
), andwhere as (
select 4 as id_where, 'C++ and Ca$h' as name_where union all
select 5 as id_where, 'Foo Fighters is nice' as name_where union all
select 6 as id_where, 'I know C# and C++' as name_where union all
select 7 as id_where, 'Football is cool' as name_where union all
select 8 as id_where, "Don't have anything" as name_where
)
select *
from what join andwhere
on regexp_contains(lower(name_where), escapeRegExp(lower(name_what)))
with output
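To see what the UDF body does without running BigQuery, here is a minimal plain-JavaScript sketch. It reuses the same replace expression and the sample rows from the query above; the joinByRegex helper is a hypothetical name introduced only for illustration, and JavaScript's RegExp stands in for BigQuery's RE2 engine (an assumption that should hold for fully escaped literal patterns like these, though the two engines are not identical in general).

```javascript
// Same escaping expression as the body of the escapeRegExp UDF above.
function escapeRegExp(x) {
  // '\\$&' is a backslash followed by the matched special character.
  return x.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Sample rows copied from the `what` and `andwhere` CTEs.
const what = [
  { id_what: 1, name_what: 'c++' },
  { id_what: 2, name_what: 'foo' },
  { id_what: 3, name_what: 'ca$h' },
];
const andwhere = [
  { id_where: 4, name_where: 'C++ and Ca$h' },
  { id_where: 5, name_where: 'Foo Fighters is nice' },
  { id_where: 6, name_where: 'I know C# and C++' },
  { id_where: 7, name_where: 'Football is cool' },
  { id_where: 8, name_where: "Don't have anything" },
];

// Hypothetical helper that emulates the join condition
// regexp_contains(lower(name_where), escapeRegExp(lower(name_what))).
function joinByRegex(whatRows, whereRows) {
  const out = [];
  for (const w of whatRows) {
    // Only the pattern is escaped; the searched text is left untouched.
    const pattern = new RegExp(escapeRegExp(w.name_what.toLowerCase()));
    for (const r of whereRows) {
      if (pattern.test(r.name_where.toLowerCase())) {
        out.push({ ...w, ...r });
      }
    }
  }
  return out;
}

const matches = joinByRegex(what, andwhere);
// Print the matched (id_what, id_where) pairs.
console.log(matches.map((m) => `${m.id_what}-${m.id_where}`).join(', '));
// prints: 1-4, 1-6, 2-5, 2-7, 3-4
```

Two details matter here: the escaping is applied to the keyword only, never to the text being searched, and both sides are lowercased because the sample keywords ('c++') and the searched text ('C++ and Ca$h') differ in case.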