
MSSQL Select Random in Large Data

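I have a table with more than 1 million records, and I want to select random rows from it. The rows should not come from all records, only from the results that match certain conditions.

Performance is very important, so I cannot use ORDER BY NEWID() and then take the first item.

The table structure is like this: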

 ID    BIGINT
 Title NVARCHAR(100)
 Level INT
 Point INT

Now, I have written a query like this:

with 
    tmp_one as
    (
        SELECT
                R.Id as RID 
                FROM    [User] as U
                            Inner Join
                        [Item] as R
                            On  R.UserId = U.Id

                WHERE       ([R].[Level] BETWEEN @MinLevel AND @MaxLevel) 
                        AND ((ABS((BINARY_CHECKSUM(NEWID(),R.Id,NEWID())))% 10000)/100 ) > @RangeOne
    ),
    tmp_two as
    (
        Select  tmp_one.RID as RID
            From    tmp_one
            Where   ((ABS((BINARY_CHECKSUM(NEWID(),RID,NEWID())))% 10000)/100 ) > @RangeTwo
    ),
    tmp_three as
    (
        Select  RID as RID 
            From    tmp_two
            Where   ((ABS((BINARY_CHECKSUM(NEWID(),NEWID())))% 10000)/100 ) < @RangeThree
    )
    Select  top 10 RID
        From    tmp_three

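I try to select 10 items at random and then pick one of them, but I ran into a surprising problem!

Sometimes the output is ordered by the item level, and I don't want that (it is not random). I really don't know how the result ends up ordered by level.

Please suggest a solution that lets me select random records with high performance, and that does not return duplicates over a large number of iterations.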

According to the MSDN article Selecting Rows Randomly from a Large Table, instead of the query you want to avoid:

select top 10 * from TableName order by newid()

it suggests:

select top 10 * from TableName where (abs(cast((binary_checksum(*) * rand()) as int)) % 100) < 10

It needs far fewer logical reads and performs much better.
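A minimal sketch of how that idea might be applied to the tables in the question. Two points are worth keeping in mind. First, RAND() without a seed is evaluated once per query, so every row sees the same value, while NEWID() is evaluated per row; the sketch therefore uses CHECKSUM(NEWID(), R.Id) as the per-row random value. Second, TOP without ORDER BY gives no ordering guarantee: the rows come back in whatever order the plan reads them, which may well be an index on Level, and that would be one plausible explanation for the sorted output. The parameter @SamplePercent and all values are assumptions for illustration only.

```sql
-- Assumed parameters; the names @MinLevel/@MaxLevel come from the question,
-- @SamplePercent is hypothetical.
DECLARE @MinLevel int = 1,
        @MaxLevel int = 5,
        @SamplePercent int = 1;   -- keep roughly 1% of the matching rows

SELECT TOP (10) S.RID
FROM (
    SELECT R.Id AS RID
    FROM [Item] AS R
    INNER JOIN [User] AS U ON R.UserId = U.Id
    WHERE R.[Level] BETWEEN @MinLevel AND @MaxLevel
      -- Per-row random value in 0..99; the bit mask clears the sign bit,
      -- which avoids the overflow ABS() raises on the smallest int.
      AND (CHECKSUM(NEWID(), R.Id) & 2147483647) % 100 < @SamplePercent
) AS S
ORDER BY NEWID();   -- shuffle only the small sample, not the whole table
```

The final ORDER BY NEWID() sorts only the sampled rows, so its cost should stay small compared with sorting the full table. If the level filter is narrow, the sample may contain fewer than 10 rows, in which case @SamplePercent would need to be raised.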

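Try something like this. It will grab 10 random rows from your table.

This is pseudocode, so you may need to fix some column names to match your actual table.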

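DECLARE @Random int;
DECLARE @TotalRows int;
DECLARE @Result table
(
    ID      bigint,
    Title   nvarchar(100),
    [Level] int,
    Point   int
);

-- Count the candidate rows once, up front.
SET @TotalRows = (SELECT COUNT(*) FROM [User] U INNER JOIN [Item] R ON R.UserId = U.Id);

-- Loop until 10 distinct rows are collected (or all rows, if fewer than 10 exist).
WHILE (SELECT COUNT(*) FROM @Result) < 10
  AND (SELECT COUNT(*) FROM @Result) < @TotalRows
BEGIN
    -- Random position between 1 and @TotalRows.
    SET @Random = FLOOR(RAND() * @TotalRows) + 1;

    -- Title, Level and Point are assumed to live on [Item]; adjust as needed.
    INSERT INTO @Result (ID, Title, [Level], Point)
    SELECT T1.ID, T1.Title, T1.[Level], T1.Point
    FROM (SELECT TOP (@Random) R.Id AS ID, R.Title, R.[Level], R.Point
          FROM [User] U INNER JOIN [Item] R ON R.UserId = U.Id
          ORDER BY R.Id) T1
    LEFT OUTER JOIN
         (SELECT TOP (@Random - 1) R.Id AS ID
          FROM [User] U INNER JOIN [Item] R ON R.UserId = U.Id
          ORDER BY R.Id) T2 ON T2.ID = T1.ID
    WHERE T2.ID IS NULL   -- only the @Random-th row has no match in T2
      AND NOT EXISTS (SELECT 1 FROM @Result X WHERE X.ID = T1.ID);   -- skip rows already picked
END

SELECT * FROM @Result;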

Here is how it works.

Select a random number, for example 47.
We want to select the 47th row of the table. 
Select the top 47 rows, call it T1. 
Join it to the top 46 rows called T2. 
The row where T2 is null is the 47th row. 
Insert that into a temporary table. 
Do it until there are 10 rows. 
Done.
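The step of fetching the Nth row can also be sketched with ROW_NUMBER() instead of joining two TOP queries. This is a swapped-in technique, not the code above, and it assumes SQL Server 2005 or later and that Id is unique on [Item]; the variable names are illustrative.

```sql
DECLARE @TotalRows int, @Random int;

-- Size of the candidate set.
SELECT @TotalRows = COUNT(*)
FROM [Item] AS R
INNER JOIN [User] AS U ON R.UserId = U.Id;

-- Random position between 1 and @TotalRows.
SET @Random = FLOOR(RAND() * @TotalRows) + 1;

-- Number the rows in a stable order and keep only the chosen position.
WITH Numbered AS
(
    SELECT R.Id,
           ROW_NUMBER() OVER (ORDER BY R.Id) AS RowNum
    FROM [Item] AS R
    INNER JOIN [User] AS U ON R.UserId = U.Id
)
SELECT Id
FROM Numbered
WHERE RowNum = @Random;
```

Running this in a loop, as in the steps above, would collect several rows. Each iteration still reads up to @Random rows, so on a table with millions of rows it is unlikely to be cheaper than a sampling filter.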
