Populate table with 1 million records
I have a table PERSON:
CREATE TABLE PERSON(
ID NUMBER GENERATED BY DEFAULT AS IDENTITY,
first_name VARCHAR2(50),
last_name VARCHAR2(50),
birth_date DATE,
gender CHAR(10),
salary NUMBER(10, 2),
CONSTRAINT PERSON_PK PRIMARY KEY (ID)
);
I need to populate the PERSON table with 1 million records. The columns should be filled with random values that follow these rules:
- "first_name" should be populated with a random name from the list of 50 names provided below:
| Aiden | Anika | Ariya | Ashanti | Avery |
| Cameron | Ceri | Che | Danica | Darcy |
| Dion | Eman | Eren | Esme | Frankie |
| Gurdeep | Haiden | Indi | Isa | Jaskaran |
| Jaya | Jo | Jodie | Kacey | Kameron |
| Kayden | Keeley | Kenzie | Lucca | Macauley |
| Manraj | Nur | Oluwatobiloba | Reiss | Riley |
| Rima | Ronnie | Ryley | Sam | Sana |
| Shola | Sierra | Tamika | Taran | Teagan |
| Tia | Tiegan | Virginia | Zhane | Zion |
- "last_name" should be populated with a random name from the list of 50 names provided below:
| Ahmad | Andersen | Arias | Barlow | Beck |
| Bloggs | Bowes | Buck | Burris | Cano |
| Chaney | Coombes | Correa | Coulson | Craig |
| Frye | Hackett | Hale | Huber | Hyde |
| Irving | Joyce | Kelley | Kim | Larson |
| Lynn | Markham | Mejia | Miranda | Neal |
| Newton | Novak | Ochoa | Pate | Paterson |
| Pennington | Rubio | Santana | Schaefer | Schofield |
| Shaffer | Sweeney | Talley | Trevino | Tucker |
| Velazquez | Vu | Wagner | Walton | Woodward |
- duplicate combinations of "first_name" and "last_name" are allowed
- names that are not listed above can still be inserted into the table
- "birth_date" should be populated with a random date between 01-JAN-1970 and 31-DEC-2070
- "birth_date" that falls outside the provided date range can still be inserted into the table
- "gender" is randomly either MALE or FEMALE
- "salary" is a random value between 1.00 and 100000.00
- "salary" that falls outside the provided range can still be inserted into the table
Please share the query.
If you don't really care about the exact names, you can do something like this:
select Initcap(dbms_random.string('l',dbms_random.value(4, 10))) as first_name,
Initcap(dbms_random.string('l',dbms_random.value(4, 10))) as last_name,
to_date(trunc(dbms_random.value(to_char(to_date('01-01-1970','dd-mm-yyyy'),'J'),to_char(to_date('31-12-2070','dd-mm-yyyy'),'J'))),'J') as birth_date,
trunc(dbms_random.value(1,100000)) as sal,
case when dbms_random.value < 0.5 then 'MALE' else 'FEMALE' end as gender -- 50/50 split (trunc(value(1,10)) < 5 would give MALE only 4 times in 9)
from dual connect by level <= 1000000 --Change here to whatever you want
[Sample output (obviously not all of it, only the first few rows)]
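The birth_date expression above converts both bounds to Julian day numbers (format model 'J'), draws a random number between them and converts the result back to a DATE. Below is a minimal standalone sketch of the same idea, assuming an Oracle database. Unlike the query above, it adds 1 to the upper bound so that 31-DEC-2070 itself can be drawn, and it uses explicit TO_NUMBER/TO_CHAR conversions instead of implicit ones.

```sql
-- Draw 5 random dates between 01-JAN-1970 and 31-DEC-2070 using Julian day numbers.
-- DBMS_RANDOM.VALUE(low, high) returns low <= x < high, so 1 is added to the
-- upper bound; after TRUNC, 31-DEC-2070 can then be drawn as well.
select to_date(
         to_char(trunc(dbms_random.value(
           to_number(to_char(date '1970-01-01', 'J')),        -- first day as a Julian day number
           to_number(to_char(date '2070-12-31', 'J')) + 1))), -- last day + 1 (exclusive bound)
         'J') as birth_date
from dual
connect by level <= 5;
```

Because TRUNC is applied to the Julian day number, the resulting dates have no time component.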
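Note: I'll focus on performance, i.e. on how to generate 1 million values as quickly as possible.
Part 1: I'll show how to generate random names very quickly: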
with function get_first_name(N in int) return varchar2
-- deterministic
-- uncomment 'deterministic' once the bug is fixed
as
type t_names is table of varchar2(15) index by pls_integer;
names t_names := t_names(
1 => 'Aiden ', 11 => 'Anika ', 21 =>'Ariya ', 31 => 'Ashanti', 41 =>'Avery ',
2 => 'Cameron', 12 => 'Ceri ', 22 =>'Che ', 32 => 'Danica ', 42 =>'Darcy ',
3 => 'Dion ', 13 => 'Eman ', 23 =>'Eren ', 33 => 'Esme ', 43 =>'Frankie ',
4 => 'Gurdeep', 14 => 'Haiden', 24 =>'Indi ', 34 => 'Isa ', 44 =>'Jaskaran',
5 => 'Jaya ', 15 => 'Jo ', 25 =>'Jodie ', 35 => 'Kacey ', 45 =>'Kameron ',
6 => 'Kayden ', 16 => 'Keeley', 26 =>'Kenzie ', 36 => 'Lucca ', 46 =>'Macauley',
7 => 'Manraj ', 17 => 'Nur ', 27 =>'Oluwatobiloba', 37 => 'Reiss ', 47 =>'Riley ',
8 => 'Rima ', 18 => 'Ronnie', 28 =>'Ryley ', 38 => 'Sam ', 48 =>'Sana ',
9 => 'Shola ', 19 => 'Sierra', 29 =>'Tamika ', 39 => 'Taran ', 49 =>'Teagan ',
10=> 'Tia ', 20 => 'Tiegan', 30 =>'Virginia ', 40 => 'Zhane ', 50 =>'Zion '
);
begin
return trim(names(n));
end;
select get_first_name(trunc(dbms_random.value(1,51))) first_name -- value(1,51) is in [1,51), so trunc() gives 1..50 with equal probability
from dual
connect by level<=10;
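As you can see, I used an inline PL/SQL function (declared in the WITH clause) together with an associative array populated with the names. An associative array is the fastest way to get a value from a list. Inline PL/SQL functions work much faster than regular stored PL/SQL functions (even those declared with PRAGMA UDF). TRUNC(DBMS_RANDOM.VALUE(...)) produces a random index from 1 to 50. DBMS_RANDOM is the slowest function here.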
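The index expression used to pick a name can be checked on its own. DBMS_RANDOM.VALUE(low, high) returns a number x with low <= x < high, so TRUNC(DBMS_RANDOM.VALUE(1, 51)) gives every index from 1 to 50 the same probability; with an upper bound of 50.99, index 50 would come up about 1% less often than the others. A sketch that counts the indexes over an arbitrarily chosen sample of 100000 draws:

```sql
-- Count how often each array index 1..50 is drawn in 100000 samples.
-- With a uniform distribution, each index should appear roughly 2000 times.
select idx, count(*) as hits
from (
  select trunc(dbms_random.value(1, 51)) as idx
  from dual
  connect by level <= 100000
)
group by idx
order by idx;
```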
So, the final solution:
insert/*+ with_plsql */ into person(first_name,last_name,birth_date,gender,salary)
with
-- functions:
function get_first_name return varchar2
-- deterministic
-- uncomment 'deterministic' once the bug is fixed
as
type t_names is table of varchar2(15) index by pls_integer;
names t_names := t_names(
1 => 'Aiden ', 11 => 'Anika ', 21 =>'Ariya ', 31 => 'Ashanti', 41 =>'Avery ',
2 => 'Cameron', 12 => 'Ceri ', 22 =>'Che ', 32 => 'Danica ', 42 =>'Darcy ',
3 => 'Dion ', 13 => 'Eman ', 23 =>'Eren ', 33 => 'Esme ', 43 =>'Frankie ',
4 => 'Gurdeep', 14 => 'Haiden', 24 =>'Indi ', 34 => 'Isa ', 44 =>'Jaskaran',
5 => 'Jaya ', 15 => 'Jo ', 25 =>'Jodie ', 35 => 'Kacey ', 45 =>'Kameron ',
6 => 'Kayden ', 16 => 'Keeley', 26 =>'Kenzie ', 36 => 'Lucca ', 46 =>'Macauley',
7 => 'Manraj ', 17 => 'Nur ', 27 =>'Oluwatobiloba', 37 => 'Reiss ', 47 =>'Riley ',
8 => 'Rima ', 18 => 'Ronnie', 28 =>'Ryley ', 38 => 'Sam ', 48 =>'Sana ',
9 => 'Shola ', 19 => 'Sierra', 29 =>'Tamika ', 39 => 'Taran ', 49 =>'Teagan ',
10=> 'Tia ', 20 => 'Tiegan', 30 =>'Virginia ', 40 => 'Zhane ', 50 =>'Zion '
);
begin
return trim(names(trunc(dbms_random.value(1,51)))); -- random index 1..50, uniform
end get_first_name;
function get_last_name return varchar2
-- deterministic
-- uncomment 'deterministic' once the bug is fixed
as
type t_names is table of varchar2(15) index by pls_integer;
names t_names := t_names(
1 => 'Ahmad ', 11 => 'Andersen', 21 =>'Arias ', 31 => 'Barlow ', 41 =>'Beck ',
2 => 'Bloggs ', 12 => 'Bowes ', 22 =>'Buck ', 32 => 'Burris ', 42 =>'Cano ',
3 => 'Chaney ', 13 => 'Coombes ', 23 =>'Correa ', 33 => 'Coulson ', 43 =>'Craig ',
4 => 'Frye ', 14 => 'Hackett ', 24 =>'Hale ', 34 => 'Huber ', 44 =>'Hyde ',
5 => 'Irving ', 15 => 'Joyce ', 25 =>'Kelley ', 35 => 'Kim ', 45 =>'Larson ',
6 => 'Lynn ', 16 => 'Markham ', 26 =>'Mejia ', 36 => 'Miranda ', 46 =>'Neal ',
7 => 'Newton ', 17 => 'Novak ', 27 =>'Ochoa ', 37 => 'Pate ', 47 =>'Paterson ',
8 => 'Pennington', 18 => 'Rubio ', 28 =>'Santana', 38 => 'Schaefer', 48 =>'Schofield',
9 => 'Shaffer ', 19 => 'Sweeney ', 29 =>'Talley ', 39 => 'Trevino ', 49 =>'Tucker ',
10=> 'Velazquez ', 20 => 'Vu ', 30 =>'Wagner ', 40 => 'Walton ', 50 =>'Woodward '
);
begin
return trim(names(trunc(dbms_random.value(1,51)))); -- random index 1..50, uniform
end get_last_name;
-- inline views:
t1000(x) as (select level from dual connect by level<=1000)
-- main part:
select
get_first_name() first_name
,get_last_name () last_name
,date'1970-01-01' + trunc(dbms_random.value(0, date'2070-12-31' - date'1970-01-01' + 1)) as birth_date -- whole days only; 31-DEC-2070 inclusive
,decode(round(dbms_random.value()),0, 'MALE', 'FEMALE') gender
,dbms_random.value(1.00, 100000.00) as salary
from t1000 a, t1000 b; -- 1000 x 1000 = 1,000,000 rows
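This query uses the pre-generated CTE t1000 to make it faster (you can read about this technique in Jonathan Lewis's articles). The slowest parts of this solution are the sequence generation and the calls to DBMS_RANDOM from SQL: DBMS_RANDOM functions are PL/SQL, so every call from SQL requires a context switch.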
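The speed-up from t1000 comes from the row generator: CONNECT BY only has to produce 1000 rows, and the cross join of t1000 with itself multiplies them into 1000 * 1000 = 1,000,000 rows. A minimal sketch that only counts the generated rows (the CTE name t1000 is taken from the final query; the aliases a and b are added for illustration):

```sql
-- Generate 1000 rows once, then cross join the CTE with itself.
with t1000(x) as (
  select level from dual connect by level <= 1000
)
select count(*) as row_count  -- 1000 * 1000 rows
from t1000 a, t1000 b;
```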
P.S. I've posted some examples of how to speed up fetching random table rows here: https://stackoverflow.com/a/62892390/429100
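Whichever of the queries above is used to fill PERSON, the result can be compared with the rules from the question using a simple summary query. This is only a sketch that reads the PERSON table as defined in the question; the expected total of 1000000 rows assumes the table was empty before the insert.

```sql
-- Summarize the generated rows so they can be compared with the requirements.
select count(*)                                            as total_rows,  -- 1000000 if PERSON was empty before
       min(birth_date)                                     as min_birth_date,
       max(birth_date)                                     as max_birth_date,
       min(salary)                                         as min_salary,
       max(salary)                                         as max_salary,
       count(case when trim(gender) = 'MALE'   then 1 end) as males,       -- gender is CHAR(10), so trim the padding
       count(case when trim(gender) = 'FEMALE' then 1 end) as females,
       count(distinct first_name)                          as distinct_first_names,
       count(distinct last_name)                           as distinct_last_names
from person;
```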