I'm trying to filter on a repeated string field with a "does not contain" (regex) condition.
This is the query, where nickname is an array (repeated) of strings:
SELECT
name
FROM
[mytable]
WHERE
(NOT REGEXP_MATCH (nickname, '(query)'))
The problem is that when a user has two or more values in nickname, the row is still returned by the NOT query as long as any one of those values doesn't match.
For example, with NOT REGEXP_MATCH(nickname, '(jonny)') and this data:
name nickname
john [johnny, jonny]
jon [jonny]
the query returns john, which it shouldn't.
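To make the difference concrete, here is a small Python model (not BigQuery code) of the two behaviours. The row data, the `ROWS` constant and both function names are hypothetical and chosen for illustration. The model assumes the legacy query tests the predicate per array element, so a row survives if any nickname fails to match. The intended logic is that a row survives only if no nickname matches. It models only which names come back, not how many output rows legacy SQL would emit.

```python
import re

# Hypothetical rows mirroring the question: (name, repeated nickname field).
ROWS = [
    ("john", ["johnny", "jonny"]),
    ("jon", ["jonny"]),
]


def per_element_not_match(rows, pattern):
    """Assumed legacy behaviour: the NOT predicate is checked element by
    element, so a row is kept if ANY nickname fails to match."""
    return [name for name, nicks in rows
            if any(not re.search(pattern, n) for n in nicks)]


def not_exists_match(rows, pattern):
    """Intended logic (NOT EXISTS): keep a row only if NO nickname matches."""
    return [name for name, nicks in rows
            if not any(re.search(pattern, n) for n in nicks)]


print(per_element_not_match(ROWS, "jonny"))  # ['john']
print(not_exists_match(ROWS, "jonny"))       # []
```

Under the per-element model, john comes back because "johnny" does not match "jonny". Under the NOT EXISTS model, john is excluded because one of his nicknames does match.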
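It's easier to express this kind of logic with NOT EXISTS or an ARRAY subquery in standard SQL, because both look at the array as a whole rather than one element at a time. The examples use LIKE for a substring match. If you need a regular expression, REGEXP_CONTAINS(nickname, r'jonny') can take its place. For example: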
#standardSQL
WITH Names AS (
SELECT 'john' AS name, ['johnny', 'jonny'] AS nicknames UNION ALL
SELECT 'jon' AS name, ['jonny'] AS nicknames
)
SELECT
name
FROM Names
WHERE NOT EXISTS (
SELECT 1 FROM UNNEST(nicknames) AS nickname
WHERE nickname LIKE '%johnny%'
);
As another example, you may want to keep only the nicknames that don't match the substring, and drop rows with no nicknames left:
#standardSQL
WITH Names AS (
SELECT 'john' AS name, ['johnny', 'jonny'] AS nicknames UNION ALL
SELECT 'jon' AS name, ['jonny'] AS nicknames
)
SELECT *
FROM (
SELECT
name,
ARRAY(SELECT nickname FROM UNNEST(nicknames) AS nickname
WHERE nickname NOT LIKE '%johnny%') AS nicknames
FROM Names
)
WHERE ARRAY_LENGTH(nicknames) > 0;
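The ARRAY-subquery variant can be modelled the same way. The sketch below is a Python approximation, not BigQuery code, and `NAMES` and `drop_matching_nicknames` are hypothetical names. Case-sensitive substring containment stands in for `NOT LIKE '%...%'`. Each nickname array is filtered, and rows whose filtered array is empty are then dropped, mirroring the `ARRAY_LENGTH(nicknames) > 0` condition.

```python
# Hypothetical in-memory rows matching the WITH Names clause.
NAMES = [
    ("john", ["johnny", "jonny"]),
    ("jon", ["jonny"]),
]


def drop_matching_nicknames(rows, substring):
    """Keep only nicknames that do not contain `substring`, then drop rows
    whose filtered nickname list is empty."""
    result = []
    for name, nicks in rows:
        kept = [n for n in nicks if substring not in n]
        if len(kept) > 0:
            result.append((name, kept))
    return result


print(drop_matching_nicknames(NAMES, "johnny"))
# [('john', ['jonny']), ('jon', ['jonny'])]
```

Unlike the NOT EXISTS version, this keeps john, because only his matching nickname is removed.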
If you are still bound to BigQuery legacy SQL, here is the equivalent solution. It counts the matching nicknames within each record and keeps only records with zero matches:
#legacySQL
SELECT name FROM (
SELECT
name, SUM(nickname LIKE '%johnny%') WITHIN RECORD AS matches
FROM [mytable]
)
WHERE matches = 0