Populate a table with 1 million records

Question

I have a table named PERSON.

CREATE TABLE PERSON(
    ID           NUMBER GENERATED BY DEFAULT AS IDENTITY,
    first_name    VARCHAR2(50),
    last_name     VARCHAR2(50),
    birth_date    DATE,
    gender        CHAR(10),
    salary        NUMBER(10, 2),
    CONSTRAINT PERSON_PK PRIMARY KEY (ID)
    );

I need to populate the PERSON table with 1 million records. The columns should be filled with random values that follow these parameters:

- "first_name" should be populated with a random name from the list of 50 names provided below:
    | Aiden         | Anika         | Ariya         | Ashanti       | Avery         |
    | Cameron       | Ceri          | Che           | Danica        | Darcy         |
    | Dion          | Eman          | Eren          | Esme          | Frankie       |
    | Gurdeep       | Haiden        | Indi          | Isa           | Jaskaran      |
    | Jaya          | Jo            | Jodie         | Kacey         | Kameron       |
    | Kayden        | Keeley        | Kenzie        | Lucca         | Macauley      |
    | Manraj        | Nur           | Oluwatobiloba | Reiss         | Riley         |
    | Rima          | Ronnie        | Ryley         | Sam           | Sana          |
    | Shola         | Sierra        | Tamika        | Taran         | Teagan        |
    | Tia           | Tiegan        | Virginia      | Zhane         | Zion          |
- "last_name" should be populated with a random name from the list of 50 names provided below:
    | Ahmad         | Andersen      | Arias         | Barlow        | Beck          |
    | Bloggs        | Bowes         | Buck          | Burris        | Cano          |
    | Chaney        | Coombes       | Correa        | Coulson       | Craig         |
    | Frye          | Hackett       | Hale          | Huber         | Hyde          |
    | Irving        | Joyce         | Kelley        | Kim           | Larson        |
    | Lynn          | Markham       | Mejia         | Miranda       | Neal          |
    | Newton        | Novak         | Ochoa         | Pate          | Paterson      |
    | Pennington    | Rubio         | Santana       | Schaefer      | Schofield     |
    | Shaffer       | Sweeney       | Talley        | Trevino       | Tucker        |
    | Velazquez     | Vu            | Wagner        | Walton        | Woodward      |        
- duplicate combinations of "first_name" and "last_name" are allowed    
- names that are not listed above can still be inserted into the table
- "birth_date" should be populated with a random date between 01-JAN-1970 and 31-DEC-2070
- "birth_date" that falls outside the provided date range can still be inserted into the table
- "gender" is a random value of MALE and FEMALE
- "salary" is a random value between 1.00 and 100000.00
- "salary" that falls outside the provided range can still be inserted into the table

Please share a query that does this.

Answer 1

If you really don't care about the exact names, you can do something like this:

select  Initcap(dbms_random.string('l',dbms_random.value(4, 10))) as first_name,
        Initcap(dbms_random.string('l',dbms_random.value(4, 10))) as last_name,
        to_date(trunc(dbms_random.value(to_char(to_date('01-01-1970','dd-mm-yyyy'),'J'),to_char(to_date('31-12-2070','dd-mm-yyyy'),'J'))),'J') as birth_date,
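        round(dbms_random.value(1,100000),2) as salary, -- two decimals, matching NUMBER(10,2)
        case when dbms_random.value < 0.5 then 'MALE' else 'FEMALE' end as gender -- even split; trunc(value(1,10)) < 5 picked MALE only 4 times in 9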
from    dual connect by level <= 1000000 --Change here to whatever you want

[Sample output: first few rows only; not reproduced here]
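The query above only selects the generated rows. To actually load them into PERSON it has to be wrapped in an INSERT. The following is a minimal sketch of that, not part of the original answer: it assumes the PERSON table from the question, writes the date arithmetic with date literals instead of Julian day numbers, and keeps two decimal places for the salary.

```sql
-- Sketch: load 1,000,000 random rows into PERSON (table definition from the question).
-- ID is omitted so the identity column generates it.
insert into person (first_name, last_name, birth_date, gender, salary)
select initcap(dbms_random.string('l', dbms_random.value(4, 10))),   -- random lowercase letters, 4 to 9 of them
       initcap(dbms_random.string('l', dbms_random.value(4, 10))),
       -- whole number of days added to the start date; the upper bound
       -- 2071-01-01 is exclusive, so 31-DEC-2070 is the latest possible date
       date '1970-01-01'
         + trunc(dbms_random.value(0, date '2071-01-01' - date '1970-01-01')),
       case when dbms_random.value < 0.5 then 'MALE' else 'FEMALE' end,
       round(dbms_random.value(1, 100000), 2)
from   dual
connect by level <= 1000000;  -- change here to whatever row count you want

commit;
```

The column list is spelled out so that the order of the selected expressions does not have to match the table definition.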

Answer 2

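Note: I will focus on performance, that is, how to generate 1 million values as quickly as possible.

Part one: how to generate random names very quickly: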

with function get_first_name(N in int) return varchar2 
    -- deterministic
    -- uncomment 'deterministic' once the bug is fixed
as 
   type t_names is table of varchar2(15) index by pls_integer;
   names t_names := t_names(
     1 => 'Aiden  ', 11 => 'Anika ', 21 =>'Ariya        ', 31 => 'Ashanti', 41 =>'Avery   ',
     2 => 'Cameron', 12 => 'Ceri  ', 22 =>'Che          ', 32 => 'Danica ', 42 =>'Darcy   ',
     3 => 'Dion   ', 13 => 'Eman  ', 23 =>'Eren         ', 33 => 'Esme   ', 43 =>'Frankie ',
     4 => 'Gurdeep', 14 => 'Haiden', 24 =>'Indi         ', 34 => 'Isa    ', 44 =>'Jaskaran',
     5 => 'Jaya   ', 15 => 'Jo    ', 25 =>'Jodie        ', 35 => 'Kacey  ', 45 =>'Kameron ',
     6 => 'Kayden ', 16 => 'Keeley', 26 =>'Kenzie       ', 36 => 'Lucca  ', 46 =>'Macauley',
     7 => 'Manraj ', 17 => 'Nur   ', 27 =>'Oluwatobiloba', 37 => 'Reiss  ', 47 =>'Riley   ',
     8 => 'Rima   ', 18 => 'Ronnie', 28 =>'Ryley        ', 38 => 'Sam    ', 48 =>'Sana    ',
     9 => 'Shola  ', 19 => 'Sierra', 29 =>'Tamika       ', 39 => 'Taran  ', 49 =>'Teagan  ',
     10=> 'Tia    ', 20 => 'Tiegan', 30 =>'Virginia     ', 40 => 'Zhane  ', 50 =>'Zion    '
   );
begin
   return trim(names(n));
end;
-- dbms_random.value(1,51) is >= 1 and < 51, so trunc() gives 1..50 with equal probability
select get_first_name(trunc(dbms_random.value(1,51))) first_name
from dual 
connect by level<=10;

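As you can see, I used an inline PL/SQL function together with an associative array filled with the names. An associative array is the fastest way to get a value from a list. Inline PL/SQL functions run much faster than ordinary PL/SQL functions (even ones declared with PRAGMA UDF). DBMS_RANDOM.VALUE, truncated to an integer, supplies a random index from 1 to 50. DBMS_RANDOM is the slowest function here.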
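The index expression can be checked on its own. DBMS_RANDOM.VALUE(low, high) returns a number greater than or equal to low and less than high, so truncating VALUE(1, 51) should give each integer from 1 to 50 the same chance, whereas an upper bound of 50.99 makes 50 slightly less likely than the rest. A minimal sketch that tallies the generated indexes (the sample size of 100,000 and the column aliases are arbitrary choices, not from the original answer):

```sql
-- Sketch: count how often each index 1..50 is produced.
select idx, count(*) as cnt
from (
  select trunc(dbms_random.value(1, 51)) as idx  -- value is >= 1 and < 51
  from   dual
  connect by level <= 100000
)
group by idx
order by idx;
```

With a uniform index, each of the 50 counts should land near 2,000. The exact numbers differ from run to run.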

So, the final solution:

insert/*+ with_plsql */  into person(first_name,last_name,birth_date,gender,salary)
with 
-- functions:
function get_first_name return varchar2 
    -- deterministic
    -- uncomment 'deterministic' once the bug is fixed
as 
   type t_names is table of varchar2(15) index by pls_integer;
   names t_names := t_names(
     1 => 'Aiden  ', 11 => 'Anika ', 21 =>'Ariya        ', 31 => 'Ashanti', 41 =>'Avery   ',
     2 => 'Cameron', 12 => 'Ceri  ', 22 =>'Che          ', 32 => 'Danica ', 42 =>'Darcy   ',
     3 => 'Dion   ', 13 => 'Eman  ', 23 =>'Eren         ', 33 => 'Esme   ', 43 =>'Frankie ',
     4 => 'Gurdeep', 14 => 'Haiden', 24 =>'Indi         ', 34 => 'Isa    ', 44 =>'Jaskaran',
     5 => 'Jaya   ', 15 => 'Jo    ', 25 =>'Jodie        ', 35 => 'Kacey  ', 45 =>'Kameron ',
     6 => 'Kayden ', 16 => 'Keeley', 26 =>'Kenzie       ', 36 => 'Lucca  ', 46 =>'Macauley',
     7 => 'Manraj ', 17 => 'Nur   ', 27 =>'Oluwatobiloba', 37 => 'Reiss  ', 47 =>'Riley   ',
     8 => 'Rima   ', 18 => 'Ronnie', 28 =>'Ryley        ', 38 => 'Sam    ', 48 =>'Sana    ',
     9 => 'Shola  ', 19 => 'Sierra', 29 =>'Tamika       ', 39 => 'Taran  ', 49 =>'Teagan  ',
     10=> 'Tia    ', 20 => 'Tiegan', 30 =>'Virginia     ', 40 => 'Zhane  ', 50 =>'Zion    '
   );
begin
   return trim(names(trunc(dbms_random.value(1,51)))); -- value is < 51, so indexes 1..50 are equally likely
end get_first_name;

function get_last_name return varchar2 
    -- deterministic
    -- uncomment 'deterministic' once the bug is fixed
as 
   type t_names is table of varchar2(15) index by pls_integer;
   names t_names := t_names(
     1 => 'Ahmad     ', 11 => 'Andersen', 21 =>'Arias  ', 31 => 'Barlow  ', 41 =>'Beck     ',
     2 => 'Bloggs    ', 12 => 'Bowes   ', 22 =>'Buck   ', 32 => 'Burris  ', 42 =>'Cano     ',
     3 => 'Chaney    ', 13 => 'Coombes ', 23 =>'Correa ', 33 => 'Coulson ', 43 =>'Craig    ',
     4 => 'Frye      ', 14 => 'Hackett ', 24 =>'Hale   ', 34 => 'Huber   ', 44 =>'Hyde     ',
     5 => 'Irving    ', 15 => 'Joyce   ', 25 =>'Kelley ', 35 => 'Kim     ', 45 =>'Larson   ',
     6 => 'Lynn      ', 16 => 'Markham ', 26 =>'Mejia  ', 36 => 'Miranda ', 46 =>'Neal     ',
     7 => 'Newton    ', 17 => 'Novak   ', 27 =>'Ochoa  ', 37 => 'Pate    ', 47 =>'Paterson ',
     8 => 'Pennington', 18 => 'Rubio   ', 28 =>'Santana', 38 => 'Schaefer', 48 =>'Schofield',
     9 => 'Shaffer   ', 19 => 'Sweeney ', 29 =>'Talley ', 39 => 'Trevino ', 49 =>'Tucker   ',
     10=> 'Velazquez ', 20 => 'Vu      ', 30 =>'Wagner ', 40 => 'Walton  ', 50 =>'Woodward '
   );
begin
   return trim(names(trunc(dbms_random.value(1,51)))); -- value is < 51, so indexes 1..50 are equally likely
end get_last_name;

  -- inline views:
  t1000(x) as (select level from dual connect by level<=1000)

-- main part:
select 
   get_first_name() first_name
  ,get_last_name () last_name
  ,trunc(date'1970-01-01' + dbms_random.value(0, date'2070-12-31'-date'1970-01-01'+1)) as birth_date -- +1 so 31-DEC-2070 can occur; trunc drops the time part
  ,decode(round(dbms_random.value()),0, 'MALE', 'FEMALE') gender
  ,dbms_random.value(1.00, 100000.00) as salary
from t1000, t1000;

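This query builds its rows from the pre-generated CTE t1000, cross-joined with itself (1000 × 1000 = 1,000,000 rows), which makes it faster (you can read about this technique in Jonathan Lewis's articles). The slowest parts of this solution are sequence generation in SQL and dbms_random. DBMS_RANDOM is a PL/SQL function, so calling it from SQL requires a context switch.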
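The row generator can be tried in isolation. The sketch below uses the same t1000 CTE but joins it to itself with explicit aliases (a and b, chosen here only for readability) and counts the result:

```sql
-- Sketch: a 1000-row CTE cross-joined with itself yields 1000 * 1000 rows.
with t1000(x) as (
  select level from dual connect by level <= 1000
)
select count(*) as generated_rows
from   t1000 a
cross join t1000 b;
```

The count comes back as 1000000, while each CONNECT BY only ever has to produce 1000 levels.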

P.S. I have already posted some examples of how to speed up fetching random table rows here: https://stackoverflow.com/a/62892390/429100
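After either load has run, a quick sanity check against the requirements in the question can be useful. This is a sketch only, with column aliases invented for illustration. It assumes the PERSON table from the question, where gender is CHAR(10) and is therefore stored blank-padded, hence the rtrim.

```sql
-- Sketch: summarize the loaded data in one pass over PERSON.
select count(*)                                             as total_rows,
       count(distinct first_name)                           as distinct_first_names,
       count(distinct last_name)                            as distinct_last_names,
       min(birth_date)                                      as earliest_birth_date,
       max(birth_date)                                      as latest_birth_date,
       min(salary)                                          as lowest_salary,
       max(salary)                                          as highest_salary,
       count(case when rtrim(gender) = 'MALE'   then 1 end) as male_rows,
       count(case when rtrim(gender) = 'FEMALE' then 1 end) as female_rows
from   person;
```

For the name-list solution, both distinct counts should be 50 once a million rows are loaded, and the two gender counts should be close to each other.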

Answer 1: 2020-09-17 10:49:57
Answer 2: 2020-09-17 10:59:16