How can I insert random values into a SQL Server table?

I'm trying to randomly insert values from a list of pre-defined values into a table for testing. I tried using the solution found on this StackOverflow question:

stackoverflow.com/.../update-sql-table-with-random-value-from-other-table

When I tried this, all of the "random" values that were inserted were exactly the same for all 3000 records.

When I run the part of the query that actually selects the random row, it does select a random record every time I run it by hand, so I know the query works. My best guesses as to what is happening are:

  • SQL Server is optimizing the SELECT somehow, not allowing the subquery to be evaluated more than once
  • The random value's seed is the same on every record the query updates

I'm stuck on what my options are. Am I doing something wrong, or is there another way I should be doing this?

This is the code I'm using:

DECLARE @randomStuff TABLE ([id] INT, [val] VARCHAR(100))

INSERT INTO @randomStuff ([id], [val]) 
VALUES ( 1,  'Test Value 1' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 2,  'Test Value 2' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 3,  'Test Value 3' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 4,  'Test Value 4' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 5,  'Test Value 5' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 6,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 7,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 8,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 9,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 10, null )

UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())

When the query engine sees this...

(SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())

... it's all like, "ooooh, a cachable scalar subquery, I'm gonna cache that!"

You need to trick the query engine into thinking it's non-cachable. jfar's answer was close, but the query engine was smart enough to see the tautology of MyTable.MyColumn = MyTable.MyColumn. It ain't smart enough to see through this, though:

UPDATE MyTable
   SET MyColumn = (SELECT TOP 1 val
                     FROM @randomStuff r
                          INNER JOIN MyTable _MT
                                  ON M.Id = _MT.Id
                    ORDER BY NEWID())
 FROM MyTable M

Because the subquery now references the outer table (aliased M), the query engine assumes the subquery will need to be re-evaluated for every row. Any column would work really, but I went with the (assumed) primary key MyTable.Id, since it'd be indexed and would add very little overhead.

A cursor would probably be just as fast, but is most certainly not as fun.
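
One way to check whether the subquery really was re-evaluated per row is to count how often each value ended up in the column after the UPDATE. This is only a small sketch that reuses the MyTable and MyColumn names from the question:

```sql
-- If the subquery was cached, this returns a single group covering every row.
-- If it was re-evaluated per row, expect several groups
-- (the NULL entries in @randomStuff all land in one NULL group).
SELECT MyColumn, COUNT(*) AS RowsWithValue
FROM MyTable
GROUP BY MyColumn
ORDER BY RowsWithValue DESC;
```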

I've had a play with this, and found a rather hacky way to do it with the use of an intermediate table variable.

Once @randomStuff is set up, we do this (note that in my case @MyTable is a table variable; adjust accordingly for your normal table):

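DECLARE @randomMappings TABLE (id INT, val VARCHAR(100), sorter UNIQUEIDENTIFIER)

-- One row per (MyTable row, candidate value) pair, each with its own random sort key
INSERT INTO @randomMappings (id, val, sorter)
SELECT M.id, R.val, NEWID() AS sorter
FROM @MyTable AS M
CROSS JOIN @randomStuff AS R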

So at this point, we have an intermediate table with every combination of (MyTable id, random value), and a random sort value for each row specific to that combination. Then:

DELETE others FROM @randomMappings AS others 
INNER JOIN @randomMappings AS lower 
ON (lower.id = others.id) AND (lower.sorter < others.sorter)

This is an old trick which deletes all rows for a given MyTable.id except for the one with the lowest sort value: join the table to itself where the sort value is smaller, and delete any row for which such a join succeeded. This leaves behind only the row with the lowest value. So for each MyTable.id, we have just one (random) value left. Then we plug it back into the table:

UPDATE m
SET MyColumn = random.val
FROM @MyTable AS m
INNER JOIN @randomMappings AS random
ON (random.id = m.id)

And you're done!

I said it was hacky...
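
The "keep only the lowest sorter per id" step from the cross-join approach above could also be written with ROW_NUMBER() instead of a self-join. The following is a minimal sketch, assuming SQL Server 2005 or later; the sample rows are hypothetical stand-ins so that the snippet runs on its own:

```sql
-- Hypothetical stand-in for @randomMappings (two ids, two candidate values each).
DECLARE @randomMappings TABLE (id INT, val VARCHAR(100), sorter UNIQUEIDENTIFIER);

INSERT INTO @randomMappings (id, val, sorter) VALUES (1, 'Test Value 1', NEWID());
INSERT INTO @randomMappings (id, val, sorter) VALUES (1, 'Test Value 2', NEWID());
INSERT INTO @randomMappings (id, val, sorter) VALUES (2, 'Test Value 1', NEWID());
INSERT INTO @randomMappings (id, val, sorter) VALUES (2, 'Test Value 2', NEWID());

-- Number the rows within each id by their random sort key,
-- then delete everything except the first row per id.
;WITH ranked AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY sorter) AS rn
    FROM @randomMappings
)
DELETE FROM ranked
WHERE rn > 1;

-- One randomly chosen value is left for each id.
SELECT id, val FROM @randomMappings ORDER BY id;
```

Deleting through the CTE removes the underlying rows of @randomMappings, so the final UPDATE from the answer can be used unchanged.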

I don't have time to check this right now, but my gut tells me that if you were to create a function on the server to get the random value, SQL Server would not optimize it out.

Then you would have:

UPDATE MyTable
Set MyColumn = dbo.RANDOM_VALUE()
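
The answer does not show dbo.RANDOM_VALUE itself, so the following is only a rough, untested sketch of what it might look like. Two things get in the way: a function cannot see a table variable declared in the calling batch, and SQL Server rejects NEWID() called directly inside a user-defined function. The sketch therefore assumes a permanent table dbo.RandomStuff (a hypothetical copy of @randomStuff with ids 1 to 10) and a hypothetical helper view dbo.RandomGuid that wraps NEWID():

```sql
-- Hypothetical permanent copy of @randomStuff; fill it with the ten rows
-- from the question before using the function.
CREATE TABLE dbo.RandomStuff (id INT PRIMARY KEY, val VARCHAR(100) NULL);
GO

-- Hypothetical helper view: NEWID() is not allowed directly inside a
-- function, but selecting it through a view is a common workaround.
CREATE VIEW dbo.RandomGuid AS
SELECT NEWID() AS guid;
GO

CREATE FUNCTION dbo.RANDOM_VALUE()
RETURNS VARCHAR(100)
AS
BEGIN
    DECLARE @pick INT;
    DECLARE @result VARCHAR(100);

    -- Map a fresh GUID to an id between 1 and 10
    -- (assumes the ids in dbo.RandomStuff are exactly 1..10).
    SELECT @pick = ABS(CHECKSUM(guid) % 10) + 1 FROM dbo.RandomGuid;

    SELECT @result = val FROM dbo.RandomStuff WHERE id = @pick;
    RETURN @result;
END
GO
```

Whether the function is then actually called once per updated row is the same open question the answer raises; it has not been verified here, so it is worth checking the resulting distribution after running the UPDATE.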

There is no optimization going on here.

You're using a subquery that selects a single value, so there is nothing to optimize.

You can also try putting a column from the table you're updating into the SELECT and see if that changes anything. That may trigger an evaluation for every row in MyTable:

UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff
    WHERE MyTable.MyColumn = MyTable.MyColumn
    ORDER BY NEWID())

I came up with a solution which is a bit of a hack and very inefficient (about 10 seconds to update 3000 records). Because this is being used to generate test data, however, I don't have to be concerned about speed.

In this solution, I iterate over every row in the table and update the values one row at a time. It seems to work:

DECLARE @rows INT 
DECLARE @currentRow INT

-- Number of rows to visit; each one gets its own UPDATE statement
SELECT @rows = COUNT(*) FROM dbo.MyTable
SET @currentRow = 1

-- <= rather than <, so that the last row is updated as well
WHILE @currentRow <= @rows
BEGIN 

-- A separate statement per row, so the random subquery runs again each time
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())
WHERE MyPrimaryKey = (SELECT b.MyPrimaryKey
 FROM (SELECT a.MyPrimaryKey, ROW_NUMBER() OVER (ORDER BY a.MyPrimaryKey) AS rownumber
       FROM MyTable a) AS b
 WHERE b.rownumber = @currentRow
)

SET @currentRow = @currentRow + 1
END 
