简体   繁体   中英

MODE aggregation function with redshift

I have the following tables:

customers

customer_id name
----------------
1           bob
2           alice
3           tim

purchases

id customer_id item_bought
--------------------------
1  1           hat
2  1           shoes
3  2           glasses
3  2           glasses
4  2           book
5  3           shoes
6  1           hat

And I want to following result:

customer_name item_bought_most_often
------------------------------------
bob           hat
alice         glasses
tim           shoes

I would do this like this (not actually tried, just the idea):

SELECT customer.name as customer_name,
  MODE(item_bought) as item_bought_most_ofen
FROM customers
INNER JOIN purchases USING (customer_id)
GROUP_BY customer_id

However, the MODE aggregation function doesn't exist in Redshift.

It seems that Redshift user defined functions are just regular scalar functions, not aggregations functions. So I don't think I can define it myself.

Any workaround ?

You can mimic mode() by using row_number() :

select name, item_bought
from (select c.name, p.item_bought, count(*) as cnt,
             row_number() over (partition by c.name order by count(*) desc) as seqnum
      from customers c join
           purchases p
           using (customer_id)
      group by c.name, p.item_bought
     ) cp
where seqnum = 1;

You could first COUNT each person purchases and then use RANK() windowed function:

SELECT name AS customer_name, item_bought AS item_bought_most_often
FROM(SELECT name,item_bought,RANK() OVER(PARTITION BY name ORDER BY cnt DESC) rnk
     FROM (SELECT c.name, p.item_bought, COUNT(*) AS cnt
           FROM customers c
           JOIN purchases p
             ON p.customer_id = c.customer_id
           GROUP BY c.name, p.item_bought) AS s1) AS s2
WHERE rnk = 1;

LiveDemo

Output:

╔═══════════════╦════════════════════════╗
║ customer_name ║ item_bought_most_often ║
╠═══════════════╬════════════════════════╣
║ alice         ║ glasses                ║
║ bob           ║ hat                    ║
║ tim           ║ shoes                  ║
║ zoe           ║ pencil                 ║
║ zoe           ║ book                   ║
╚═══════════════╩════════════════════════╝

Note:

RANK will handle multiple most common values.


EDIT:

As Lukas Eder mentioned it could be further simplified:

SELECT name AS customer_name, item_bought AS  item_bought_most_often
FROM(SELECT name,item_bought,
            RANK() OVER(PARTITION BY name ORDER BY COUNT(*) DESC) rnk
     FROM customers c
     JOIN purchases p
       ON p.customer_id = c.customer_id
     GROUP BY c.name, p.item_bought)  AS s2
WHERE rnk = 1;

db<>fiddle demo

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM