Populate table with 1 million records

I have one table, PERSON:

CREATE TABLE PERSON(
    ID           NUMBER GENERATED BY DEFAULT AS IDENTITY,
    first_name    VARCHAR2(50),
    last_name     VARCHAR2(50),
    birth_date    DATE,
    gender        CHAR(10),
    salary        NUMBER(10, 2),
    CONSTRAINT PERSON_PK PRIMARY KEY (ID)
    );

I need to populate the PERSON table with 1 million records. The columns should be populated with random values that follow the parameters below:

- "first_name" should be populated with a random name from the list of 50 names provided below:
    | Aiden         | Anika         | Ariya         | Ashanti       | Avery         |
    | Cameron       | Ceri          | Che           | Danica        | Darcy         |
    | Dion          | Eman          | Eren          | Esme          | Frankie       |
    | Gurdeep       | Haiden        | Indi          | Isa           | Jaskaran      |
    | Jaya          | Jo            | Jodie         | Kacey         | Kameron       |
    | Kayden        | Keeley        | Kenzie        | Lucca         | Macauley      |
    | Manraj        | Nur           | Oluwatobiloba | Reiss         | Riley         |
    | Rima          | Ronnie        | Ryley         | Sam           | Sana          |
    | Shola         | Sierra        | Tamika        | Taran         | Teagan        |
    | Tia           | Tiegan        | Virginia      | Zhane         | Zion          |
- "last_name" should be populated with a random name from the list of 50 names provided below:
    | Ahmad         | Andersen      | Arias         | Barlow        | Beck          |
    | Bloggs        | Bowes         | Buck          | Burris        | Cano          |
    | Chaney        | Coombes       | Correa        | Coulson       | Craig         |
    | Frye          | Hackett       | Hale          | Huber         | Hyde          |
    | Irving        | Joyce         | Kelley        | Kim           | Larson        |
    | Lynn          | Markham       | Mejia         | Miranda       | Neal          |
    | Newton        | Novak         | Ochoa         | Pate          | Paterson      |
    | Pennington    | Rubio         | Santana       | Schaefer      | Schofield     |
    | Shaffer       | Sweeney       | Talley        | Trevino       | Tucker        |
    | Velazquez     | Vu            | Wagner        | Walton        | Woodward      |
- duplicate combinations of "first_name" and "last_name" are allowed
- names that are not listed above can still be inserted into the table
- "birth_date" should be populated with a random date between 01-JAN-1970 and 31-DEC-2070
- "birth_date" that falls outside the provided date range can still be inserted into the table
- "gender" is a random value of either MALE or FEMALE
- "salary" is a random value between 1.00 and 100000.00
- "salary" that falls outside the provided range can still be inserted into the table

Please share the query with me.

If you really don't care about the exact names, then you can do something like this:

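select  -- random lowercase string with its first letter capitalised
        Initcap(dbms_random.string('l',dbms_random.value(4, 10))) as first_name,
        Initcap(dbms_random.string('l',dbms_random.value(4, 10))) as last_name,
        -- pick a random Julian day number within the range and convert it back to a date
        to_date(trunc(dbms_random.value(to_char(to_date('01-01-1970','dd-mm-yyyy'),'J'),to_char(to_date('31-12-2070','dd-mm-yyyy'),'J'))),'J') as birth_date,
        -- keep two decimal places to match NUMBER(10, 2)
        round(dbms_random.value(1,100000), 2) as salary,
        -- dbms_random.value with no arguments returns a number in [0, 1), so this is an even split
        case when dbms_random.value < 0.5 then 'MALE' else 'FEMALE' end as gender
from    dual connect by level <= 1000000 --Change here to whatever you want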
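
The query above only selects the generated rows. A minimal sketch of how it could be turned into an actual load is to wrap it in an INSERT ... SELECT. The fixed name length of 8 and the batch size of 1,000 rows are arbitrary choices made for this illustration and are not part of the original answer; raise the LEVEL limit once a small batch looks right.

```sql
-- Sketch only: load a small batch of random rows into PERSON.
-- ID is left out of the column list so the identity column fills it in.
insert into person (first_name, last_name, birth_date, salary, gender)
select  initcap(dbms_random.string('l', 8)),            -- assumed fixed length of 8
        initcap(dbms_random.string('l', 8)),
        -- random whole-day offset from 1970-01-01, up to and including 2070-12-31
        date '1970-01-01'
          + trunc(dbms_random.value(0, date '2070-12-31' - date '1970-01-01' + 1)),
        round(dbms_random.value(1, 100000), 2),
        case when dbms_random.value < 0.5 then 'MALE' else 'FEMALE' end
from    dual
connect by level <= 1000;                               -- assumed trial batch size

commit;
```

Because gender is declared as CHAR(10), the stored values are blank-padded to 10 characters, so comparisons against them may need TRIM or a padded literal.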

Sample output (only the first few rows): [image]

NB: I'll focus on performance, i.e. how to generate 1 million values as fast as possible:

First part: I'll show how to generate random names very fast:

with function get_first_name(N in int) return varchar2 
    -- deterministic
    -- (uncomment 'deterministic' once the bug is fixed)
as 
   type t_names is table of varchar2(15) index by pls_integer;
   names t_names := t_names(
     1 => 'Aiden  ', 11 => 'Anika ', 21 =>'Ariya        ', 31 => 'Ashanti', 41 =>'Avery   ',
     2 => 'Cameron', 12 => 'Ceri  ', 22 =>'Che          ', 32 => 'Danica ', 42 =>'Darcy   ',
     3 => 'Dion   ', 13 => 'Eman  ', 23 =>'Eren         ', 33 => 'Esme   ', 43 =>'Frankie ',
     4 => 'Gurdeep', 14 => 'Haiden', 24 =>'Indi         ', 34 => 'Isa    ', 44 =>'Jaskaran',
     5 => 'Jaya   ', 15 => 'Jo    ', 25 =>'Jodie        ', 35 => 'Kacey  ', 45 =>'Kameron ',
     6 => 'Kayden ', 16 => 'Keeley', 26 =>'Kenzie       ', 36 => 'Lucca  ', 46 =>'Macauley',
     7 => 'Manraj ', 17 => 'Nur   ', 27 =>'Oluwatobiloba', 37 => 'Reiss  ', 47 =>'Riley   ',
     8 => 'Rima   ', 18 => 'Ronnie', 28 =>'Ryley        ', 38 => 'Sam    ', 48 =>'Sana    ',
     9 => 'Shola  ', 19 => 'Sierra', 29 =>'Tamika       ', 39 => 'Taran  ', 49 =>'Teagan  ',
     10=> 'Tia    ', 20 => 'Tiegan', 30 =>'Virginia     ', 40 => 'Zhane  ', 50 =>'Zion    '
   );
begin
   return trim(names(n));
end;
select get_first_name(trunc(dbms_random.value(1,50.99))) first_name
from dual 
connect by level<=10;

As you can see, I've used an inline PL/SQL function with an associative array filled with the names. An associative array is the fastest way to get a value from the list. Inline PL/SQL functions work much faster than ordinary PL/SQL functions (even ones declared with PRAGMA UDF). TRUNC(DBMS_RANDOM.VALUE(1, 50.99)) produces a random integer index between 1 and 50. DBMS_RANDOM is the slowest function here.
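
One way to check that the index expression stays within the bounds of the 50-element array is to tabulate it over a sample. DBMS_RANDOM.VALUE(low, high) returns a number greater than or equal to low and less than high, so truncating a value drawn from [1, 50.99) can only produce the integers 1 to 50. The sample size of 100,000 below is an arbitrary assumption.

```sql
-- Sketch: summarise the indexes produced by the expression used in get_first_name.
select  min(idx)            as min_idx,
        max(idx)            as max_idx,
        count(distinct idx) as distinct_idx
from   (select trunc(dbms_random.value(1, 50.99)) as idx
        from   dual
        connect by level <= 100000);   -- assumed sample size
```

With a sample this large you would expect all 50 indexes to appear, with a minimum of 1 and a maximum of 50. Index 50 is drawn very slightly less often than the others, because its interval [50, 50.99) is a little shorter than a full unit.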

So, the final solution:

insert/*+ with_plsql */  into person(first_name,last_name,birth_date,gender,salary)
with 
-- functions:
function get_first_name return varchar2 
    -- not deterministic: every call returns a different random name
as 
   type t_names is table of varchar2(15) index by pls_integer;
   names t_names := t_names(
     1 => 'Aiden  ', 11 => 'Anika ', 21 =>'Ariya        ', 31 => 'Ashanti', 41 =>'Avery   ',
     2 => 'Cameron', 12 => 'Ceri  ', 22 =>'Che          ', 32 => 'Danica ', 42 =>'Darcy   ',
     3 => 'Dion   ', 13 => 'Eman  ', 23 =>'Eren         ', 33 => 'Esme   ', 43 =>'Frankie ',
     4 => 'Gurdeep', 14 => 'Haiden', 24 =>'Indi         ', 34 => 'Isa    ', 44 =>'Jaskaran',
     5 => 'Jaya   ', 15 => 'Jo    ', 25 =>'Jodie        ', 35 => 'Kacey  ', 45 =>'Kameron ',
     6 => 'Kayden ', 16 => 'Keeley', 26 =>'Kenzie       ', 36 => 'Lucca  ', 46 =>'Macauley',
     7 => 'Manraj ', 17 => 'Nur   ', 27 =>'Oluwatobiloba', 37 => 'Reiss  ', 47 =>'Riley   ',
     8 => 'Rima   ', 18 => 'Ronnie', 28 =>'Ryley        ', 38 => 'Sam    ', 48 =>'Sana    ',
     9 => 'Shola  ', 19 => 'Sierra', 29 =>'Tamika       ', 39 => 'Taran  ', 49 =>'Teagan  ',
     10=> 'Tia    ', 20 => 'Tiegan', 30 =>'Virginia     ', 40 => 'Zhane  ', 50 =>'Zion    '
   );
begin
   return trim(names(trunc(dbms_random.value(1,50.99))));
end get_first_name;

function get_last_name return varchar2 
    -- not deterministic: every call returns a different random name
as 
   type t_names is table of varchar2(15) index by pls_integer;
   names t_names := t_names(
     1 => 'Ahmad     ', 11 => 'Andersen', 21 =>'Arias  ', 31 => 'Barlow  ', 41 =>'Beck     ',
     2 => 'Bloggs    ', 12 => 'Bowes   ', 22 =>'Buck   ', 32 => 'Burris  ', 42 =>'Cano     ',
     3 => 'Chaney    ', 13 => 'Coombes ', 23 =>'Correa ', 33 => 'Coulson ', 43 =>'Craig    ',
     4 => 'Frye      ', 14 => 'Hackett ', 24 =>'Hale   ', 34 => 'Huber   ', 44 =>'Hyde     ',
     5 => 'Irving    ', 15 => 'Joyce   ', 25 =>'Kelley ', 35 => 'Kim     ', 45 =>'Larson   ',
     6 => 'Lynn      ', 16 => 'Markham ', 26 =>'Mejia  ', 36 => 'Miranda ', 46 =>'Neal     ',
     7 => 'Newton    ', 17 => 'Novak   ', 27 =>'Ochoa  ', 37 => 'Pate    ', 47 =>'Paterson ',
     8 => 'Pennington', 18 => 'Rubio   ', 28 =>'Santana', 38 => 'Schaefer', 48 =>'Schofield',
     9 => 'Shaffer   ', 19 => 'Sweeney ', 29 =>'Talley ', 39 => 'Trevino ', 49 =>'Tucker   ',
     10=> 'Velazquez ', 20 => 'Vu      ', 30 =>'Wagner ', 40 => 'Walton  ', 50 =>'Woodward '
   );
begin
   return trim(names(trunc(dbms_random.value(1,50.99))));
end get_last_name;

  -- inline views:
  t1000(x) as (select level from dual connect by level<=1000)

-- main part:
select 
   get_first_name() first_name
  ,get_last_name () last_name
  ,date'1970-01-01' + dbms_random.value(0, date'2070-12-31'-date'1970-01-01') as birth_date
  ,decode(round(dbms_random.value()),0, 'MALE', 'FEMALE') gender
  ,dbms_random.value(1.00, 100000.00) as salary
from t1000, t1000;

This query uses a pre-generated CTE, t1000, joined to itself to produce 1,000 × 1,000 = 1 million rows, which makes it faster (Jonathan Lewis has written about this in his articles). The slowest parts of this solution are sequence generation and the dbms_random calls in SQL. DBMS_RANDOM is a PL/SQL function and requires context switches.
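
The row-generator idea can be tried on its own, and the loaded data can be spot-checked afterwards. The following is a rough sketch rather than part of the original answer: the aliases a and b and the choice of aggregates are assumptions made for illustration.

```sql
-- 1) Row generator in isolation: 1,000 rows cross-joined with themselves.
with t1000(x) as (select level from dual connect by level <= 1000)
select count(*) as generated_rows
from   t1000 a, t1000 b;

-- 2) Sanity checks after the insert: row count, name variety, value ranges.
select count(*)                   as total_rows,
       count(distinct first_name) as distinct_first_names,
       count(distinct last_name)  as distinct_last_names,
       min(birth_date)            as min_birth_date,
       max(birth_date)            as max_birth_date,
       min(salary)                as min_salary,
       max(salary)                as max_salary
from   person;

-- 3) Gender split; TRIM removes the blank padding added by CHAR(10).
select trim(gender) as gender, count(*) as cnt
from   person
group  by trim(gender);
```

The first query should count 1,000,000 rows, since it only multiplies the two 1,000-row sets. The other two depend on the random data, so compare them against the ranges in the question instead of expecting exact values.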

PS. I've already posted a few examples of how to speed up getting random table rows here: https://stackoverflow.com/a/62892390/429100
