
MSSQL Select Random in Large Data

I have a table with more than 1 million records and I want to select random rows from it, not from all records but only from the rows that match certain conditions.

Performance is very important, so I can NOT order by NEWID and then select the first item.

The table structure is something like this:

 ID    BIGINT
 Title NVARCHAR(100)
 Level INT
 Point INT

Now, I wrote a query like:

with 
    tmp_one as
    (
        SELECT
                R.Id as RID 
                FROM    [User] as U
                            Inner Join
                        [Item] as R
                            On  R.UserId = U.Id

                WHERE       ([R].[Level] BETWEEN @MinLevel AND @MaxLevel) 
                        AND ((ABS((BINARY_CHECKSUM(NEWID(),R.Id,NEWID())))% 10000)/100 ) > @RangeOne
    ),
    tmp_two as
    (
        Select  tmp_one.RID as RID
            From    tmp_one
            Where   ((ABS((BINARY_CHECKSUM(NEWID(),RID,NEWID())))% 10000)/100 ) > @RangeTwo
    ),
    tmp_three as
    (
        Select  RID as RID 
            From    tmp_two
            Where   ((ABS((BINARY_CHECKSUM(NEWID(),NEWID())))% 10000)/100 ) < @RangeThree
    )
    Select  top 10 RID
        From    tmp_three

I tried to select 10 items randomly and then pick one of them, but I ran into a surprising problem!

Sometimes the output is ordered by item level! I don't want that (it's not really random). I really don't know how the result came to be ordered by level.

Please suggest a solution that lets me select a random record with high performance, and that does not return duplicates over a large number of iterations.

Based on MSDN's Selecting Rows Randomly from a Large Table, instead of the one you want to avoid:

select top 10 * from TableName order by newid()

It suggests this:

select top 10 * from TableName where (abs(cast((binary_checksum(*) * rand()) as int)) % 100) < 10

It needs far fewer logical reads and performs much better.
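As a rough sketch of how this idea might be adapted to the question, the query below filters `[Item]` by level, keeps roughly 1% of the matching rows using a per-row random value, and then shuffles only that small sample to pick one row. The parameter values and the `@Fraction` variable are assumptions made for illustration; the `CHECKSUM(NEWID(), ...)` form is used here because `NEWID()` is evaluated for every row, whereas `RAND()` is evaluated once per query.

```sql
-- Hypothetical parameter values; [Item], Id and Level come from the question.
DECLARE @MinLevel int = 1, @MaxLevel int = 50;
DECLARE @Fraction float = 0.01;   -- assumption: keep roughly 1% of the matching rows

SELECT TOP (1) S.Id
FROM (
    SELECT R.Id
    FROM [Item] AS R
    WHERE R.[Level] BETWEEN @MinLevel AND @MaxLevel
      -- Per-row pseudo-random value between 0 and 1.
      AND CAST(CHECKSUM(NEWID(), R.Id) & 0x7fffffff AS float)
          / CAST(0x7fffffff AS int) <= @Fraction
) AS S
ORDER BY NEWID();   -- sorts only the small sample, not the whole table
```

If the sample happens to be empty, the query returns no row, so the caller may need to retry or raise `@Fraction`. The fraction should be tuned to the number of rows that usually match the level range.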

Try something like this. It will randomly grab 10 rows from your table.

This is pseudo code, so you might need to fix a few column names to match your real tables.

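DECLARE @Random int, @TotalRows int;
DECLARE @Result table
(
    ID    bigint,
    Title nvarchar(100),
    Level int,
    Point int
);

-- Count the candidate rows once.
SET @TotalRows = (SELECT COUNT(*)
                  FROM [User] U INNER JOIN [Item] R ON R.UserId = U.Id);

-- Cap the target at @TotalRows so the loop also ends when fewer than 10 rows exist.
WHILE (SELECT COUNT(*) FROM @Result) < CASE WHEN @TotalRows < 10 THEN @TotalRows ELSE 10 END
BEGIN
    -- Random position between 1 and @TotalRows.
    SET @Random = FLOOR(RAND() * @TotalRows) + 1;

    -- T1 holds the first @Random rows, T2 the first @Random - 1 rows;
    -- the only T1 row without a match in T2 is row number @Random.
    INSERT INTO @Result (ID, Title, Level, Point)
    SELECT T1.ID, T1.Title, T1.Level, T1.Point
    FROM (SELECT TOP (@Random) R.ID, R.Title, R.Level, R.Point
          FROM [User] U INNER JOIN [Item] R ON R.UserId = U.Id
          ORDER BY R.ID) T1
    LEFT OUTER JOIN
         (SELECT TOP (@Random - 1) R.ID
          FROM [User] U INNER JOIN [Item] R ON R.UserId = U.Id
          ORDER BY R.ID) T2 ON T2.ID = T1.ID
    WHERE T2.ID IS NULL
      AND NOT EXISTS (SELECT 1 FROM @Result X WHERE X.ID = T1.ID);  -- skip rows already picked
END

SELECT * FROM @Result;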

Here is how it works.

Select a random number.   For example 47. 
We want to select the 47th row of the table. 
Select the top 47 rows, call it T1. 
Join it to the top 46 rows called T2. 
The row where T2 is null is the 47th row. 
Insert that into a temporary table. 
Do it until there are 10 rows. 
Done.
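
The "select the Nth row" step above can also be sketched with `OFFSET ... FETCH`, assuming SQL Server 2012 or later. This is a variation on the steps, not part of the original answer; the level filter is taken from the question and the parameter values are hypothetical.

```sql
-- Hypothetical parameter values; [Item] and its columns come from the question.
DECLARE @MinLevel int = 1, @MaxLevel int = 50;
DECLARE @TotalRows int, @Random int;

-- Count only the rows that match the condition.
SELECT @TotalRows = COUNT(*)
FROM [Item] AS R
WHERE R.[Level] BETWEEN @MinLevel AND @MaxLevel;

-- Zero-based offset between 0 and @TotalRows - 1.
SET @Random = FLOOR(RAND() * @TotalRows);

-- Skip @Random rows in a stable order and return the next one.
SELECT R.Id, R.Title, R.[Level], R.Point
FROM [Item] AS R
WHERE R.[Level] BETWEEN @MinLevel AND @MaxLevel
ORDER BY R.Id
OFFSET @Random ROWS FETCH NEXT 1 ROWS ONLY;
```

The `ORDER BY` is what makes "the Nth row" well defined; without it, `TOP` or `OFFSET` may return rows in whatever order the engine happens to scan them. The query still has to read past the skipped rows, so the cost grows with the offset.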
