Query repeated fields negation NOT include

I'm trying to query a repeated string field with a negated regex match (NOT contains).

This is the query, where nickname is an array (repeated) of strings:

SELECT
  name
FROM
  [mytable]
WHERE
  NOT REGEXP_MATCH(nickname, '(query)')

The problem is that users with at least two values under nickname are still returned when I query using NOT.

For: NOT REGEXP_MATCH (nickname, '(jonny)')

name    nickname
john    [johnny, jonny]
jon     [jonny]

This returns john, but it shouldn't, because one of john's nicknames matches the pattern.

It's easier to express this kind of logic using NOT EXISTS or an ARRAY subquery in standard SQL. For example:

#standardSQL
WITH Names AS (
  SELECT 'john' AS name, ['johnny', 'jonny'] AS nicknames UNION ALL
  SELECT 'jon' AS name, ['jonny'] AS nicknames
)
SELECT
  name
FROM Names
WHERE NOT EXISTS (
  SELECT 1 FROM UNNEST(nicknames) AS nickname
  WHERE nickname LIKE '%johnny%'
);
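
Since the question uses a regex rather than a substring, the same NOT EXISTS shape can be sketched with REGEXP_CONTAINS, the standard SQL counterpart of REGEXP_MATCH. This is only an illustration on the same sample data, using the pattern from the question; the Names CTE stands in for the real table:

```sql
#standardSQL
WITH Names AS (
  SELECT 'john' AS name, ['johnny', 'jonny'] AS nicknames UNION ALL
  SELECT 'jon' AS name, ['jonny'] AS nicknames
)
SELECT
  name
FROM Names
-- Keep a row only if none of its nicknames matches the pattern.
WHERE NOT EXISTS (
  SELECT 1 FROM UNNEST(nicknames) AS nickname
  WHERE REGEXP_CONTAINS(nickname, r'jonny')
);
```

With this sample data both rows contain 'jonny', so the query returns no rows, and john is no longer returned.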

As another example, you may want to return only the nicknames that do not match the substring, dropping rows where none remain:

#standardSQL
WITH Names AS (
  SELECT 'john' AS name, ['johnny', 'jonny'] AS nicknames UNION ALL
  SELECT 'jon' AS name, ['jonny'] AS nicknames
)
SELECT *
FROM (
  SELECT
    name,
    ARRAY(SELECT nickname FROM UNNEST(nicknames) AS nickname
          WHERE nickname NOT LIKE '%johnny%') AS nicknames
  FROM Names
)
WHERE ARRAY_LENGTH(nicknames) > 0;

If you are still bound to BigQuery Legacy SQL, here is the equivalent solution:

#legacySQL
SELECT name FROM (
  SELECT
    name, SUM(nicknames LIKE '%johnny%') WITHIN RECORD AS matches
  FROM [mytable]
)
WHERE matches = 0
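
As for why the original query returns john: as far as I understand it, the condition is evaluated against each nickname value separately, so the row survives if any single value fails to match. A rough standard SQL sketch of that per-value behavior, assuming the same sample data (an illustration, not the exact legacy execution model):

```sql
#standardSQL
WITH Names AS (
  SELECT 'john' AS name, ['johnny', 'jonny'] AS nicknames UNION ALL
  SELECT 'jon' AS name, ['jonny'] AS nicknames
)
SELECT
  name, nickname
-- The comma join with UNNEST yields one row per (name, nickname) pair.
FROM Names, UNNEST(nicknames) AS nickname
-- The filter is applied to each pair on its own, not to the whole array.
WHERE NOT REGEXP_CONTAINS(nickname, r'jonny');
```

Here the pair (john, johnny) passes the filter, which is why john shows up even though another of his nicknames matches.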
