Select random value for each row

I'm trying to select a new random value from a column in another table for each row of a table I'm updating. I'm getting the random value, but I can't get it to change for each row. Any ideas? Here's the code:

UPDATE srs1.courseedition
SET ta_id = teacherassistant.ta_id
FROM srs1.teacherassistant
WHERE (SELECT ta_id FROM srs1.teacherassistant ORDER BY RANDOM()
       LIMIT 1) = teacherassistant.ta_id

My guess is that Postgres is optimizing away the subquery (evaluating it only once), because it has no dependencies on the outer query. Have you considered simply using a subquery in the SET clause?

UPDATE srs1.courseedition
    SET ta_id = (SELECT ta.ta_id
                 FROM srs1.teacherassistant ta
                 ORDER BY RANDOM()
                 LIMIT 1
                );

I don't think this will fix the problem (smart optimizers, alas). But if you correlate the subquery to the outer query, it should run once for each row. Perhaps:

UPDATE srs1.courseedition ce
    SET ta_id = (SELECT ta.ta_id
                 FROM srs1.teacherassistant ta
                 WHERE ce.ta_id IS NULL  -- or something like that
                 ORDER BY RANDOM()
                 LIMIT 1
                );

You can replace the WHERE clause with something more nonsensical that is always true and only serves to reference the outer row, such as WHERE COALESCE(ce.ta_id, '') IS NOT NULL (this form assumes ta_id is a text column; use a literal of the matching type otherwise).
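A type-independent variant of the same idea is to correlate on a column that is never NULL. The following is only a sketch: it assumes courseedition has a primary key named courseedition_id, which the question does not confirm. The predicate is always true and is there only so that the subquery references the outer row.

```sql
UPDATE srs1.courseedition ce
SET    ta_id = (SELECT ta.ta_id
                FROM   srs1.teacherassistant ta
                WHERE  ce.courseedition_id IS NOT NULL  -- assumed PK name; always true, only correlates
                ORDER  BY random()
                LIMIT  1
               );
```

Because the subquery now depends on the outer row, it should be evaluated once per updated row, so each row gets its own independent random pick.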

The following solution should be faster by order(s) of magnitude than running a correlated subquery for every row: N random sorts over the whole table vs. 1 random sort. The result is just as random, but we get a perfectly even distribution with this method, whereas independent random picks like in Gordon's solution can (and probably will) assign some rows more often than others. There are different kinds of "random". Actual requirements for "randomness" need to be defined carefully.

This assumes that the number of rows in courseedition is bigger than in teacherassistant.

To update all rows in courseedition:

UPDATE srs1.courseedition c1
SET    ta_id = t.ta_id
FROM  (
   SELECT row_number() OVER (ORDER BY random()) - 1 AS rn  -- random order
        , count(*) OVER () AS ct                           -- total count
        , ta_id
   FROM   srs1.teacherassistant           -- smaller table
   ) t
JOIN (
   SELECT row_number() OVER () - 1 AS rn  -- arbitrary order
        , courseedition_id                -- use actual PK of courseedition
   FROM   srs1.courseedition              -- bigger table
   ) c ON c.rn%t.ct = t.rn                -- rownumber of big modulo count of small table
WHERE  c.courseedition_id = c1.courseedition_id;
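One way to check the distribution after running the update is to count assignments per assistant. This is a minimal sketch using the table and column names from the query above. If every row of courseedition was updated and ta_id is unique in teacherassistant, the counts should differ by at most one.

```sql
SELECT ta_id
     , count(*) AS assigned   -- number of course editions per assistant
FROM   srs1.courseedition
GROUP  BY ta_id
ORDER  BY assigned DESC;
```

Running the same check after an update with independent random picks would typically show a wider spread of counts.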

Notes

Match the row number of the bigger table, modulo the count of the smaller table, to the (random) row number of the smaller table.

row_number() - 1 gives a 0-based index, which allows using the modulo operator % more elegantly.
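The modulo matching can be looked at in isolation. The sketch below uses generate_series() with hypothetical sizes (7 rows in the bigger table, 3 in the smaller one) instead of the real tables, and shows which 0-based row number of the smaller table each row of the bigger table is matched to.

```sql
-- Hypothetical sizes: 7 rows in the bigger table, 3 in the smaller one
SELECT c.rn        AS big_rn
     , c.rn % t.ct AS matched_small_rn   -- cycles 0, 1, 2, 0, 1, 2, 0
FROM  (SELECT n - 1 AS rn FROM generate_series(1, 7) AS g(n)) c
CROSS  JOIN (SELECT 3 AS ct) t
ORDER  BY c.rn;
```

With a 1-based row number, the remainder 0 would never match any row of the smaller table without an extra adjustment, which is why both sides subtract 1.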

A random sort for one table is enough. The smaller table is cheaper to sort. The second can have any order (arbitrary is cheaper). The assignment after the join is random either way. Perfect randomness would only be impaired indirectly if there are regular patterns in the sort order of the bigger table. In this unlikely case, apply ORDER BY random() to the bigger table to eliminate any such effect.
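For that unlikely case, the variant might look like the sketch below. The only change from the query above is the window definition for the bigger table; courseedition_id is still a placeholder for the actual primary key of courseedition.

```sql
UPDATE srs1.courseedition c1
SET    ta_id = t.ta_id
FROM  (
   SELECT row_number() OVER (ORDER BY random()) - 1 AS rn  -- random order
        , count(*) OVER () AS ct                           -- total count
        , ta_id
   FROM   srs1.teacherassistant           -- smaller table
   ) t
JOIN  (
   SELECT row_number() OVER (ORDER BY random()) - 1 AS rn  -- random instead of arbitrary order
        , courseedition_id                -- placeholder: use actual PK of courseedition
   FROM   srs1.courseedition              -- bigger table
   ) c ON c.rn % t.ct = t.rn
WHERE  c.courseedition_id = c1.courseedition_id;
```

This adds a second random sort over the bigger table, so it is more expensive than the original form.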
