SQL Server: select random rows with distinct id from table where id is not distinct

I have a simple table named Tickets with the following columns:

ticketId, userId 

where ticketId is the primary key and userId is not unique.

A user can therefore have several tickets, each with a unique ticketId.
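To try the queries in this thread, a small test table along these lines can be used. This is only a sketch: the column types and every row below are assumptions made for illustration, not part of the real schema.

```sql
-- Hypothetical test data; column types and values are assumptions.
CREATE TABLE Tickets
(
    ticketId INT NOT NULL PRIMARY KEY,  -- unique per ticket
    userId   INT NOT NULL               -- repeats when a user holds several tickets
);

INSERT INTO Tickets (ticketId, userId)
VALUES (10, 1), (25, 1), (31, 2), (42, 2), (56, 3),
       (60, 4), (71, 5), (83, 6), (97, 1);
```

With this data, user 1 holds three of the nine tickets, so in a fair lottery that user should win the first pick about a third of the time.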

I'm struggling to find a solution to my problem: I need to select 5 random tickets belonging to 5 unique userIds.

I know how to select the random tickets by using the following query:

SELECT TOP 5 *
FROM Tickets
ORDER BY RAND(CHECKSUM(*) * RAND())

Which returns something like:

Ticket id:         UserId:
--------------------------
10                 1
25                 1
31                 2
42                 2
56                 3

My question is: what do I need to add to the query so that it selects random rows across distinct userIds, returning no more than one ticket per user?

Note that I need the best-performing correct solution, since the table could eventually hold millions of rows.

Thanks in advance, Christian

Edit: The more tickets a user has, the higher their chance of being selected. However, the selection should still be random, not simply pick the users with the most tickets. Just like in a lottery.

In other words, it should select 5 random rows from all rows, but ensure that the 5 rows have unique userIds.
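To make the lottery requirement concrete, the query below is a rough sketch that lists each user's ticket count and the share of all tickets they hold, which is the probability that the user wins the first pick in a fair draw. The aliases `tickets` and `first_pick_share` are made up for illustration.

```sql
-- Each user's chance of winning the first pick is proportional to their ticket count.
SELECT
    userId,
    COUNT(*) AS tickets,
    COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS first_pick_share  -- fraction of all tickets
FROM Tickets
GROUP BY userId
ORDER BY tickets DESC;
```

A correct solution should reproduce these shares for the first winner; a solution that first reduces the table to one row per user gives every user the same chance instead.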

Try it like this, using NEWID():

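-- Pick 5 random ticket rows, then one random ticket for each of their users.
-- Note: the inner TOP 5 is taken over tickets, so the same UserId can appear more than once.
SELECT k.UserId, u.TicketId
FROM
(
    SELECT TOP 5 UserId
    FROM Tickets
    ORDER BY NEWID()
) k
CROSS APPLY
(
    SELECT TOP 1 TicketId
    FROM Tickets T
    WHERE T.UserId = k.UserId
    ORDER BY NEWID()
) u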

Edit: As pointed out in the comments, this solution doesn't properly weight the users by number of tickets (so a user with 1000 tickets incorrectly has the same chance of winning as a user with 1 ticket). This was particularly dumb of me since I pointed out this problem on other answers.

Given that Steve now has his solution working, I think that is the better answer.

Original answer:

I think something like the following works:

SELECT top 5 ticketid, userid
FROM
   (
     SELECT ticketid, userid, ROW_NUMBER() OVER (PARTITION BY userid ORDER BY NEWID()) as nid
     FROM tickets
    ) a
WHERE  nid = 1
ORDER BY NEWID()

Here's an sql fiddle to play around with it.

Credit where credit is due: I based this on Steve's solution, which I don't think works correctly as written.

Something like the following, I think.

Please note that this code is untested, so please excuse any small syntax errors.

WITH randomised_tickets AS
(
    SELECT 
        *
        ,ROW_NUMBER() OVER (ORDER BY NEWID() ASC) AS random_order

    FROM Tickets
)

,ordered_winning_tickets AS
(
    SELECT
        *
        ,ROW_NUMBER() OVER (PARTITION BY userId ORDER BY random_order ASC) AS user_win_order

    FROM randomised_tickets
)

SELECT TOP 5
    *

FROM 
    ordered_winning_tickets

WHERE
    user_win_order = 1 --eliminate 2nd wins from the list

ORDER BY
    random_order

You could try something like this, using ignore_dup_key on a temp table to eliminate duplicates for a user:

 drop table if exists #WinningTickets
 create table #WinningTickets(PickId int identity primary key, TicketId int, UserId int)
 create unique index ix_unique_user on #WinningTickets(UserId) with (ignore_dup_key=on)

 while ( select count(*) from #WinningTickets ) < 5
 begin
   insert into #WinningTickets
   select top 10 TicketId, UserId
   from Tickets
   order by newid()
 end

 select top 5 * 
 from #WinningTickets
 order by PickId
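One thing to watch with the loop above: if Tickets contains fewer than 5 distinct users, the count never reaches 5 and the loop does not terminate. A possible guard, sketched below, caps the target at the number of distinct users. The variables `@wanted` and `@available` are hypothetical names introduced here, and the sketch is untested against a real table.

```sql
-- Same approach as above, with a cap so the loop can always finish.
DROP TABLE IF EXISTS #WinningTickets;  -- requires SQL Server 2016 or later
CREATE TABLE #WinningTickets (PickId INT IDENTITY PRIMARY KEY, TicketId INT, UserId INT);
CREATE UNIQUE INDEX ix_unique_user ON #WinningTickets (UserId) WITH (IGNORE_DUP_KEY = ON);

DECLARE @wanted INT = 5;
DECLARE @available INT = (SELECT COUNT(DISTINCT userId) FROM Tickets);

-- Never ask for more winners than there are users.
IF @available < @wanted SET @wanted = @available;

WHILE (SELECT COUNT(*) FROM #WinningTickets) < @wanted
BEGIN
    -- Rows for users already in the table are silently discarded by IGNORE_DUP_KEY.
    INSERT INTO #WinningTickets (TicketId, UserId)
    SELECT TOP (10) ticketId, userId
    FROM Tickets
    ORDER BY NEWID();
END;

SELECT TOP (5) TicketId, UserId
FROM #WinningTickets
ORDER BY PickId;
```

Note that each pass still sorts the whole table by NEWID(), and counting distinct users is an extra scan, so on a table with millions of rows this is likely to be slower than a single-pass query.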
