How to find most frequent pair in SQL?

I am trying to write a query in MySQL that will output the most frequently occurring pair of values. I have the following table:

[Image: Original Dataset]

This table contains users' music streaming activity on a given day. I want to find out which pair of artists was played by the most users on a specific day. The answer should be (Pink Floyd, Queen) because 3 users listened to both artists on the same day. How can I achieve this?

I've started by joining the table to itself with this query:

WITH temp AS (
    SELECT person_id, artist_name, COUNT(*) AS times_played
    FROM users
    WHERE date_played = '2020-10-01'
    GROUP BY 1, 2
)
SELECT a.person_id, a.artist_name, b.artist_name
FROM temp a
JOIN temp b
    ON a.person_id = b.person_id AND a.artist_name != b.artist_name;

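The result has one row per user for each ordered pair of distinct artists that user played, so every pair appears twice, e.g. as (Pink Floyd, Queen) and (Queen, Pink Floyd).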

I am not sure how to proceed from this point, so any help would be appreciated!

Below is the code to create the table in MySQL:

create table users
(
  person_id       int,
  artist_name     varchar(255),
  date_played     date
);

insert into users
  (person_id, artist_name, date_played)
values
  (1, 'Pink Floyd', '2020-10-01'),
  (1, 'Led Zeppelin', '2020-10-01'),
  (1, 'Queen', '2020-10-01'),
  (1, 'Pink Floyd', '2020-10-01'),
  (2, 'Journey', '2020-10-01'),
  (2, 'Pink Floyd', '2020-10-01'),
  (2, 'Queen', '2020-10-01'),
  (2, 'Pink Floyd', '2020-10-01'),
  (3, 'Pink Floyd', '2020-10-01'),
  (3, 'Aerosmith', '2020-10-01'),
  (3, 'Queen', '2020-10-01'),
  (4, 'Pink Floyd', '2020-10-01'),
  (4, 'Led Zeppelin', '2020-10-01');

Here's how I solved it, thanks to the trick I found in the code Tim Biegeleisen provided in this post (u1.artist_name < u2.artist_name):

WITH temp AS (
    -- one row per user and artist for the chosen day
    SELECT
        person_id,
        artist_name
    FROM users
    WHERE date_played = '2020-10-01'
    GROUP BY 1, 2
)
SELECT *
FROM (
    SELECT
        u1.artist_name AS artist1,
        u2.artist_name AS artist2,
        COUNT(*) AS times_played,  -- number of users who played both artists
        RANK() OVER (ORDER BY COUNT(*) DESC) AS Rnk
    FROM temp u1
    JOIN temp u2
        ON u1.artist_name < u2.artist_name AND u1.person_id = u2.person_id
    GROUP BY 1, 2
) sub
WHERE Rnk = 1;
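
To see why the < comparison helps, it can be useful to list the pairs for a single user. This is only a sketch: it assumes the users table from the question and MySQL 8.0 or later (for the CTE), and the CTE name plays is made up for illustration.

```sql
-- One row per (user, artist) for the day in question.
WITH plays AS (
    SELECT DISTINCT person_id, artist_name
    FROM users
    WHERE date_played = '2020-10-01'
)
SELECT
    a.artist_name AS artist1,
    b.artist_name AS artist2
FROM plays a
JOIN plays b
    ON  a.person_id   = b.person_id
    AND a.artist_name < b.artist_name   -- keep only one ordering of each pair
WHERE a.person_id = 3
ORDER BY artist1, artist2;
```

For user 3 this returns three rows: (Aerosmith, Pink Floyd), (Aerosmith, Queen) and (Pink Floyd, Queen). With != in place of < the same join returns six rows, because every pair also appears in reverse order and would be counted as a separate group.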

We can try handling this requirement using a self join along with the RANK() analytic function:

WITH cte AS (
    SELECT
        u1.artist_name AS artist1,
        u2.artist_name AS artist2,
        RANK() OVER (ORDER BY COUNT(*) DESC) rnk
    FROM users u1
    INNER JOIN users u2
        ON u1.artist_name < u2.artist_name AND u1.person_id = u2.person_id
    WHERE
        u1.date_played = u2.date_played
    GROUP BY
        u1.artist_name,
        u2.artist_name
)

SELECT
    artist1,
    artist2
FROM cte
WHERE rnk = 1;
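
One caveat when joining the raw users table: a user who played the same artist more than once on a day contributes several rows to the same pair, and without a filter on date_played the counts cover every day in the table. A possible variant that counts distinct users for one day is sketched below. It assumes MySQL 8.0 or later and the date from the question; the names pair_counts, ranked and listeners are illustrative only.

```sql
WITH pair_counts AS (
    SELECT
        u1.artist_name AS artist1,
        u2.artist_name AS artist2,
        -- count each user once per pair, however often they played the artists
        COUNT(DISTINCT u1.person_id) AS listeners
    FROM users u1
    INNER JOIN users u2
        ON  u1.person_id   = u2.person_id
        AND u1.date_played = u2.date_played
        AND u1.artist_name < u2.artist_name
    WHERE u1.date_played = '2020-10-01'   -- assumed target day
    GROUP BY
        u1.artist_name,
        u2.artist_name
),
ranked AS (
    SELECT
        artist1,
        artist2,
        listeners,
        RANK() OVER (ORDER BY listeners DESC) AS rnk
    FROM pair_counts
)
SELECT
    artist1,
    artist2,
    listeners
FROM ranked
WHERE rnk = 1;
```

On the sample data this returns the single row (Pink Floyd, Queen, 3). Because RANK() is used rather than LIMIT 1, pairs tied for first place would all be returned.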

