简体   繁体   中英

MSSQL Select Random in Large Data

I have a table that has more than 1 million records and I want to select random rows from this table, but not among all records - only select Random rows from results matching certain conditions.

Performance is very important , so I can NOT use ordering by NEWID and then select first item.

The table structure is some thing like this:

 ID    BIGINT
 Title NVARCHAR(100)
 Level INT
 Point INT

Now, I wrote a query like:

with 
    tmp_one as
    (
        SELECT
                R.Id as RID 
                FROM    [User] as U
                            Inner Join
                        [Item] as R
                            On  R.UserId = U.Id

                WHERE       ([R].[Level] BETWEEN @MinLevel AND @MaxLevel) 
                        AND ((ABS((BINARY_CHECKSUM(NEWID(),R.Id,NEWID())))% 10000)/100 ) > @RangeOne
    ),
    tmp_two as
    (
        Select  tmp_one.RID as RID
            From    tmp_one
            Where   ((ABS((BINARY_CHECKSUM(NEWID(),RID,NEWID())))% 10000)/100 ) > @RangeTwo
    ),
    tmp_three as
    (
        Select  RID as RID 
            From    tmp_two
            Where   ((ABS((BINARY_CHECKSUM(NEWID(),NEWID())))% 10000)/100 ) < @RangeThree
    )
    Select  top 10 RID
        From    tmp_three

I tried to select 10 item randomly, and then select one of them, but I have an amazing problem!!!

Sometimes the output is ordered by item level! And I don't want it (it's not really random ). I really don't know how result was ordered by level.

Please suggest some solution that help me to select random record in high performance and random selected in high range of iteration is not duplicate.

Based from MSDN's Selecting Rows Randomly from a Large Table , instead of the one you avoid:

select top 10 * from TableName order by newid()

It suggests this:

select top 10 * from TableName where (abs(cast((binary_checksum(*) * rand()) as int)) % 100) < 10

It has only much smaller logical read an much better performance.

Try something like this. It will randomly grab 10 rows from your table.

This is pseudo code, so you might need to fix a few column names to match your real tables.

DECLARE @Random int
DECLARE @Result table
(ID BIGINT,
Title varchar(100),
Level int,
Point int)

declare @TotalRows int
set @TotalRows = (select COUNT(*) From [User] U inner join [Item] R on R.UserID = U.ID)

while (select COUNT(*) from @Result)<10
begin
set @Random = (select floor(RAND() * @TotalRows+1))

insert into @Result
select T1.ID, T1.Title, T1.Level, T1.Point from
(select top (@Random) * From [User] U inner join [Item] R on R.UserID = U.ID) T1
left outer join (select top (@Random) * From [User] U inner join [Item] R on R.UserID = U.ID) T2 on T2.ID = T1.ID
where T2.ID is null


end

select * from @Result

Here is how it works.

Select a random number.   For example 47. 
We want to select the 47th row of the table. 
Select the top 47 rows, call it T1. 
Join it to the top 46 rows called T2. 
The row where T2 is null is the 47th row. 
Insert that into a temporary table. 
Do it until there are 10 rows. 
Done.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM