简体   繁体   中英

Count matches between multiple columns and words in a nested array

My earlier question was resolved. Now I need to develop a related, but more complex query.

I have a table like this:

id     description          additional_info
-------------------------------------------
123    games                XYD
124    Festivals sport      swim

And I need to count matches to arrays like this:

array_content varchar[] := {"Festivals,games","sport,swim"}

If either of the columns description and additional_info contains any of the tags separated by a comma, we count that as 1. So each array element (consisting of multiple words) can only contribute 1 to the total count.

The result for the above example should be:

id    RID    Matches
1     123    1
2     124    2

The answer isn't simple, but figuring out what you are asking was harder:

SELECT row_number() OVER (ORDER BY t.id) AS id
     , t.id AS "RID"
     , count(DISTINCT a.ord) AS "Matches"
FROM   tbl t
LEFT   JOIN (
   unnest(array_content) WITH ORDINALITY x(elem, ord)
   CROSS JOIN LATERAL
   unnest(string_to_array(elem, ',')) txt
   ) a ON t.description ~ a.txt
       OR t.additional_info ~ a.txt
GROUP  BY t.id;

Produces your desired result exactly.
array_content is your array of search terms.

How does this work?

Each array element of the outer array in your search term is a comma-separated list. Decompose the odd construct by unnesting twice (after transforming each element of the outer array into another array). Example:

SELECT *
FROM   unnest('{"Festivals,games","sport,swim"}'::varchar[]) WITH ORDINALITY x(elem, ord)
CROSS  JOIN LATERAL
       unnest(string_to_array(elem, ',')) txt;

Result:

 elem            | ord |  txt
-----------------+-----+------------
 Festivals,games | 1   | Festivals
 Festivals,games | 1   | games
 sport,swim      | 2   | sport
 sport,swim      | 2   | swim

Since you want to count matches for each outer array element once , we generate a unique number on the fly with WITH ORDINALITY . Details:

Now we can LEFT JOIN to this derived table on the condition of a desired match:

   ... ON t.description ~ a.txt
       OR t.additional_info ~ a.txt

.. and get the count with count(DISTINCT a.ord) , counting each array only once even if multiple search terms match.

Finally, I added the mysterious id in your result with row_number() OVER (ORDER BY t.id) AS id - assuming it's supposed to be a serial number. Voilá.

The same considerations for regular expression matches ( ~ ) as in your previous question apply:

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM