Fast random row in PostgreSQL: why is time (floor(random()*N)) + (select * from a where id = const) 100 times less than select where id = random?

Question

I need to select a random row from a PostgreSQL table quickly. I have already read "Best way to select random rows PostgreSQL" and "Quick random row selection in Postgres".

The fastest approach I have read about so far is:

CREATE EXTENSION IF NOT EXISTS tsm_system_rows;
SELECT myid  FROM mytable TABLESAMPLE SYSTEM_ROWS(1);

2 ms on average. But as pointed out in the comments, it is not "completely random".

I tried

SELECT id FROM a OFFSET floor(random()*3000000) LIMIT 1;

15-200 ms.

The simplest idea is to select by id, since my ids are consecutive. Each step on its own is fast:

select floor(random()*1000);    -- 2 ms
select * from a where id=233;   -- 2 ms (and again 2 ms for other constants)

But combined:

SELECT * FROM a where id = floor(random()*1000)::integer;  -- 300 ms!!!

Why 300 ms and not 4? Is it possible to reorder the query, use hints, etc. in some way to get it down to 4 ms?

Answer 1

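This is because random() is defined as volatile, so Postgres evaluates it again for every row. Each row is compared with a different random number, so the condition cannot be used for an index lookup and the query effectively goes through all rows. (It also means the query can return no row or several rows.)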
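The per-row evaluation is easy to observe without the OP's table. This sketch assumes generate_series() as a stand-in for a table with 1000 consecutive ids; it is an illustration, not the OP's setup.

```sql
-- random() is volatile: it is called again for every row, so the three
-- rows below will (almost certainly) show three different values.
SELECT g, random() AS r
FROM generate_series(1, 3) AS g;

-- The same effect inside WHERE: each of the 1000 candidate ids is compared
-- with its own fresh random number, so the number of matching rows varies
-- from run to run and can be 0, 1 or more.
SELECT count(*)
FROM generate_series(0, 999) AS g
WHERE g = floor(random() * 1000)::integer;
```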

If you want to prevent this, "hide" the call behind an (otherwise useless) sub-select:

SELECT * 
FROM a 
where id = (select trunc(random()*1000)::integer);
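A self-contained sketch to reproduce the difference; the table a_demo and its size (ids 0 to 99999) are hypothetical stand-ins for the OP's table a. In the EXPLAIN output, the first query is expected to show a sequential scan with the random() condition as a filter, and the second an InitPlan (the sub-select, run once) feeding an index scan on the primary key.

```sql
-- Hypothetical stand-in for the OP's table "a": consecutive ids 0..99999.
CREATE TEMP TABLE a_demo (id integer PRIMARY KEY, payload text);
INSERT INTO a_demo
SELECT g, md5(g::text) FROM generate_series(0, 99999) AS g;
ANALYZE a_demo;

-- Slow form: random() is evaluated per row, so the condition cannot be
-- used as an index key; expect a Seq Scan with a Filter.
EXPLAIN SELECT * FROM a_demo
WHERE id = floor(random() * 1000)::integer;

-- Fast form: the uncorrelated sub-select is computed once and its result
-- drives a single primary-key lookup; expect an Index Scan.
EXPLAIN SELECT * FROM a_demo
WHERE id = (SELECT trunc(random() * 1000)::integer);

-- Ids 0..999 all exist, so the fast form returns exactly one row.
SELECT * FROM a_demo
WHERE id = (SELECT trunc(random() * 1000)::integer);
```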

Answer 2

The following strictly addresses the OP's follow-up question after @a_horse_with_no_name's answer: "Strangely, it takes very long without ::integer. Why is that?"

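Because ::integer is a PostgreSQL extension, shorthand for the SQL-standard cast(number as integer). The type returned by RANDOM() is double precision, and it is still double precision after the TRUNC() function is applied; how the value is displayed depends on your client. Without the cast, id = (select trunc(random()*1000)) compares the integer column id with a double precision value, so PostgreSQL converts id to double precision for every row and cannot use the index on id, which means scanning the whole table.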
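To see the effect on the plan, here is a sketch on a throwaway table; cast_demo and its 100,000 consecutive ids are assumptions standing in for the OP's table a. Comparing the two EXPLAIN outputs, the uncast version is expected to show a sequential scan with id converted to double precision, and the cast version an index scan on the primary key.

```sql
-- Hypothetical table; any integer primary key behaves the same way.
CREATE TEMP TABLE cast_demo (id integer PRIMARY KEY);
INSERT INTO cast_demo SELECT g FROM generate_series(0, 99999) AS g;
ANALYZE cast_demo;

-- The sub-select yields double precision, so id is cast to double precision
-- for the comparison and the integer index on id cannot be used.
EXPLAIN SELECT * FROM cast_demo
WHERE id = (SELECT trunc(random() * 1000));

-- With ::integer both sides are integer and the primary key index applies.
EXPLAIN SELECT * FROM cast_demo
WHERE id = (SELECT trunc(random() * 1000)::integer);
```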

In its general form, the construct val::data_type means "convert val to the specified data_type" (provided a valid conversion exists). If val is itself an expression, the form becomes (val)::data_type.
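The parentheses matter because :: binds more tightly than arithmetic operators such as *. A small check with pg_typeof(), reusing the expression from the answer above:

```sql
-- "::" binds before "*", so only the literal 1000 is cast here
-- and the product is still double precision.
SELECT pg_typeof(random() * 1000::integer);          -- double precision

-- Parenthesising the expression casts the whole product.
SELECT pg_typeof((random() * 1000)::integer);        -- integer

-- The SQL-standard spelling of the same cast.
SELECT pg_typeof(CAST(random() * 1000 AS integer));  -- integer
```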
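Below is a step-by-step breakdown of what @a_horse_with_no_name's query does, with the data type at each step. The CTE is there only so that every step works on the same value, since each call to random() produces a different one.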

with gen as (select random() n)  -- evaluate random() once so every step sees the same value
select n, pg_typeof(n)                                            -- step 1: random value in [0, 1)
     , n*1000, pg_typeof(n*1000)                                  -- step 2: scaled into [0, 1000)
     , trunc(n*1000), pg_typeof(trunc(n*1000))                    -- step 3: whole number 0..999, still double precision
     , trunc(n*1000)::integer, pg_typeof(trunc(n*1000)::integer)  -- step 4: cast to integer, 0..999
  from gen;

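By the way, the trunc() call above is not redundant in PostgreSQL: casting double precision to integer rounds to the nearest integer instead of discarding the decimal digits, so without trunc() (or floor()) any n*1000 of 999.5 or more would become 1000, outside the intended range 0 to 999.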
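A quick check of the rounding behaviour, using literal values instead of random() so the results are repeatable (float8 is PostgreSQL's alias for double precision):

```sql
-- A plain cast from double precision to integer rounds to the nearest integer...
SELECT 999.7::float8::integer;          -- 1000
-- ...whereas trunc() drops the fractional part before the cast.
SELECT trunc(999.7::float8)::integer;   -- 999
-- floor() matches trunc() here; they differ only for negative values,
-- which random() never produces.
SELECT floor(999.7::float8)::integer;   -- 999
```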

I hope this helps you understand what is going on.

2 answers

Answer 1: 2 votes, accepted, 2020-01-30 14:46:08

Answer 2: 1 vote, 2020-01-30 22:06:06