Populate table with 1 million records

I have a table, PERSON:

CREATE TABLE PERSON(
    ID           NUMBER GENERATED BY DEFAULT AS IDENTITY,
    first_name    VARCHAR2(50),
    last_name     VARCHAR2(50),
    birth_date    DATE,
    gender        CHAR(10),
    salary        NUMBER(10, 2),
    CONSTRAINT PERSON_PK PRIMARY KEY (ID)
    );

I need to populate the PERSON table with 1 million records. The columns should be filled with random values that follow these rules:

- "first_name" should be populated with a random name from the list of 50 names provided below:
    | Aiden         | Anika         | Ariya         | Ashanti       | Avery         |
    | Cameron       | Ceri          | Che           | Danica        | Darcy         |
    | Dion          | Eman          | Eren          | Esme          | Frankie       |
    | Gurdeep       | Haiden        | Indi          | Isa           | Jaskaran      |
    | Jaya          | Jo            | Jodie         | Kacey         | Kameron       |
    | Kayden        | Keeley        | Kenzie        | Lucca         | Macauley      |
    | Manraj        | Nur           | Oluwatobiloba | Reiss         | Riley         |
    | Rima          | Ronnie        | Ryley         | Sam           | Sana          |
    | Shola         | Sierra        | Tamika        | Taran         | Teagan        |
    | Tia           | Tiegan        | Virginia      | Zhane         | Zion          |
- "last_name" should be populated with a random name from the list of 50 names provided below:
    | Ahmad         | Andersen      | Arias         | Barlow        | Beck          |
    | Bloggs        | Bowes         | Buck          | Burris        | Cano          |
    | Chaney        | Coombes       | Correa        | Coulson       | Craig         |
    | Frye          | Hackett       | Hale          | Huber         | Hyde          |
    | Irving        | Joyce         | Kelley        | Kim           | Larson        |
    | Lynn          | Markham       | Mejia         | Miranda       | Neal          |
    | Newton        | Novak         | Ochoa         | Pate          | Paterson      |
    | Pennington    | Rubio         | Santana       | Schaefer      | Schofield     |
    | Shaffer       | Sweeney       | Talley        | Trevino       | Tucker        |
    | Velazquez     | Vu            | Wagner        | Walton        | Woodward      |        
- duplicate combinations of "first_name" and "last_name" are allowed    
- names that are not listed above can still be inserted into the table
- "birth_date" should be populated with a random date between 01-JAN-1970 and 31-DEC-2070
- "birth_date" that falls outside the provided date range can still be inserted into the table
- "gender" is a random value, either MALE or FEMALE
- "salary" is a random value between 1.00 and 100000.00
- "salary" that falls outside the provided range can still be inserted into the table

Please share a query that does this.

If you really don't care about the exact names, you can do something like this:

select  initcap(dbms_random.string('l', dbms_random.value(4, 10))) as first_name,
        initcap(dbms_random.string('l', dbms_random.value(4, 10))) as last_name,
        -- convert both range ends to Julian day numbers, pick one at random, convert back
        to_date(trunc(dbms_random.value(to_char(to_date('01-01-1970','dd-mm-yyyy'),'J'),to_char(to_date('31-12-2070','dd-mm-yyyy'),'J'))),'J') as birth_date,
        -- keep two decimals to match NUMBER(10, 2)
        round(dbms_random.value(1, 100000), 2) as salary,
        -- a 0.5 threshold gives both values the same probability
        case when dbms_random.value < 0.5 then 'MALE' else 'FEMALE' end as gender
from    dual connect by level <= 1000000 -- change the row count here to whatever you want

Sample output (obviously not all rows, just the first few):

[image: sample output]
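
To load the table rather than just display rows, the same kind of SELECT can feed an INSERT. The following is only a minimal sketch: it assumes ID is left to the identity column and that a single CONNECT BY to 1,000,000 levels fits in your session's memory. The date, gender and salary expressions are illustrative choices, not necessarily the ones used above.

```sql
-- Sketch: load PERSON with 1,000,000 rows of fully random data.
-- ID is omitted so that the identity column generates it.
insert into person (first_name, last_name, birth_date, gender, salary)
select initcap(dbms_random.string('l', trunc(dbms_random.value(4, 11)))),  -- 4 to 10 letters
       initcap(dbms_random.string('l', trunc(dbms_random.value(4, 11)))),
       -- whole-day offset from 01-JAN-1970, up to and including 31-DEC-2070
       date '1970-01-01'
         + trunc(dbms_random.value(0, date '2070-12-31' - date '1970-01-01' + 1)),
       case when dbms_random.value < 0.5 then 'MALE' else 'FEMALE' end,
       round(dbms_random.value(1, 100000), 2)
from   dual
connect by level <= 1000000;

commit;
```

The column list in the INSERT fixes the order of the SELECT expressions, so they need no aliases.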

Note: I'll focus on performance, i.e. how to generate 1 million values as fast as possible.

First, here is how to generate random names very quickly:

with function get_first_name(N in int) return varchar2 
    -- deterministic
    -- uncomment 'deterministic' once the bug is fixed
as 
   type t_names is table of varchar2(15) index by pls_integer;
   names t_names := t_names(
     1 => 'Aiden  ', 11 => 'Anika ', 21 =>'Ariya        ', 31 => 'Ashanti', 41 =>'Avery   ',
     2 => 'Cameron', 12 => 'Ceri  ', 22 =>'Che          ', 32 => 'Danica ', 42 =>'Darcy   ',
     3 => 'Dion   ', 13 => 'Eman  ', 23 =>'Eren         ', 33 => 'Esme   ', 43 =>'Frankie ',
     4 => 'Gurdeep', 14 => 'Haiden', 24 =>'Indi         ', 34 => 'Isa    ', 44 =>'Jaskaran',
     5 => 'Jaya   ', 15 => 'Jo    ', 25 =>'Jodie        ', 35 => 'Kacey  ', 45 =>'Kameron ',
     6 => 'Kayden ', 16 => 'Keeley', 26 =>'Kenzie       ', 36 => 'Lucca  ', 46 =>'Macauley',
     7 => 'Manraj ', 17 => 'Nur   ', 27 =>'Oluwatobiloba', 37 => 'Reiss  ', 47 =>'Riley   ',
     8 => 'Rima   ', 18 => 'Ronnie', 28 =>'Ryley        ', 38 => 'Sam    ', 48 =>'Sana    ',
     9 => 'Shola  ', 19 => 'Sierra', 29 =>'Tamika       ', 39 => 'Taran  ', 49 =>'Teagan  ',
     10=> 'Tia    ', 20 => 'Tiegan', 30 =>'Virginia     ', 40 => 'Zhane  ', 50 =>'Zion    '
   );
begin
   return trim(names(n));
end;
select get_first_name(trunc(dbms_random.value(1,50.99))) first_name
from dual 
connect by level<=10;

As you can see, I've used an inline PL/SQL function (declared in the WITH clause) with an associative array holding the names. An associative array lookup is the fastest way to fetch a value from a list by index. Inline PL/SQL functions run much faster than ordinary PL/SQL functions (even ones declared with PRAGMA UDF). TRUNC(DBMS_RANDOM.VALUE(1, 50.99)) produces a random integer index from 1 to 50. DBMS_RANDOM is the slowest part here. Note that the t_names(1 => ...) constructor syntax for associative arrays requires Oracle 18c or later.
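
Because referencing a missing element of an associative array raises NO_DATA_FOUND, it is worth confirming that the index expression never leaves the 1..50 range. The query below is a small sanity check, assuming DBMS_RANDOM.VALUE(low, high) returns a value greater than or equal to low and less than high, as documented; the sample size of 100,000 is arbitrary.

```sql
-- Draw 100,000 indexes with the same expression the function call uses
-- and summarize them: min_idx should be 1 and max_idx should be 50.
select min(idx)            as min_idx,
       max(idx)            as max_idx,
       count(distinct idx) as distinct_idx
from  (select trunc(dbms_random.value(1, 50.99)) as idx
       from   dual
       connect by level <= 100000);
```

With an upper bound of 50.99, index 50 covers a slightly narrower interval than the other indexes, so the last name in the list is drawn marginally less often. An upper bound of 51 would give every index the same width.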

So, the final solution:

insert/*+ with_plsql */  into person(first_name,last_name,birth_date,gender,salary)
with 
-- functions:
function get_first_name return varchar2 
    -- deterministic
    -- uncomment 'deterministic' once the bug is fixed
as 
   type t_names is table of varchar2(15) index by pls_integer;
   names t_names := t_names(
     1 => 'Aiden  ', 11 => 'Anika ', 21 =>'Ariya        ', 31 => 'Ashanti', 41 =>'Avery   ',
     2 => 'Cameron', 12 => 'Ceri  ', 22 =>'Che          ', 32 => 'Danica ', 42 =>'Darcy   ',
     3 => 'Dion   ', 13 => 'Eman  ', 23 =>'Eren         ', 33 => 'Esme   ', 43 =>'Frankie ',
     4 => 'Gurdeep', 14 => 'Haiden', 24 =>'Indi         ', 34 => 'Isa    ', 44 =>'Jaskaran',
     5 => 'Jaya   ', 15 => 'Jo    ', 25 =>'Jodie        ', 35 => 'Kacey  ', 45 =>'Kameron ',
     6 => 'Kayden ', 16 => 'Keeley', 26 =>'Kenzie       ', 36 => 'Lucca  ', 46 =>'Macauley',
     7 => 'Manraj ', 17 => 'Nur   ', 27 =>'Oluwatobiloba', 37 => 'Reiss  ', 47 =>'Riley   ',
     8 => 'Rima   ', 18 => 'Ronnie', 28 =>'Ryley        ', 38 => 'Sam    ', 48 =>'Sana    ',
     9 => 'Shola  ', 19 => 'Sierra', 29 =>'Tamika       ', 39 => 'Taran  ', 49 =>'Teagan  ',
     10=> 'Tia    ', 20 => 'Tiegan', 30 =>'Virginia     ', 40 => 'Zhane  ', 50 =>'Zion    '
   );
begin
   return trim(names(trunc(dbms_random.value(1,50.99))));
end get_first_name;

function get_last_name return varchar2 
    -- deterministic
    -- uncomment 'deterministic' once the bug is fixed
as 
   type t_names is table of varchar2(15) index by pls_integer;
   names t_names := t_names(
     1 => 'Ahmad     ', 11 => 'Andersen', 21 =>'Arias  ', 31 => 'Barlow  ', 41 =>'Beck     ',
     2 => 'Bloggs    ', 12 => 'Bowes   ', 22 =>'Buck   ', 32 => 'Burris  ', 42 =>'Cano     ',
     3 => 'Chaney    ', 13 => 'Coombes ', 23 =>'Correa ', 33 => 'Coulson ', 43 =>'Craig    ',
     4 => 'Frye      ', 14 => 'Hackett ', 24 =>'Hale   ', 34 => 'Huber   ', 44 =>'Hyde     ',
     5 => 'Irving    ', 15 => 'Joyce   ', 25 =>'Kelley ', 35 => 'Kim     ', 45 =>'Larson   ',
     6 => 'Lynn      ', 16 => 'Markham ', 26 =>'Mejia  ', 36 => 'Miranda ', 46 =>'Neal     ',
     7 => 'Newton    ', 17 => 'Novak   ', 27 =>'Ochoa  ', 37 => 'Pate    ', 47 =>'Paterson ',
     8 => 'Pennington', 18 => 'Rubio   ', 28 =>'Santana', 38 => 'Schaefer', 48 =>'Schofield',
     9 => 'Shaffer   ', 19 => 'Sweeney ', 29 =>'Talley ', 39 => 'Trevino ', 49 =>'Tucker   ',
     10=> 'Velazquez ', 20 => 'Vu      ', 30 =>'Wagner ', 40 => 'Walton  ', 50 =>'Woodward '
   );
begin
   return trim(names(trunc(dbms_random.value(1,50.99))));
end get_last_name;

  -- inline views:
  t1000(x) as (select level from dual connect by level<=1000)

-- main part:
select 
   get_first_name() first_name
  ,get_last_name () last_name
  ,date'1970-01-01' + dbms_random.value(0, date'2070-12-31'-date'1970-01-01') as birth_date
  ,decode(round(dbms_random.value()),0, 'MALE', 'FEMALE') gender
  ,dbms_random.value(1.00, 100000.00) as salary
from t1000, t1000;

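This query uses a pregenerated CTE, t1000, as its row source: 1,000 rows cross-joined with themselves produce the 1,000,000 rows, which is faster than a single CONNECT BY to 1,000,000 levels (you can read about this in the articles by Jonathan Lewis). The slowest parts of this solution are sequence generation for the identity column and the DBMS_RANDOM calls in SQL: DBMS_RANDOM is a PL/SQL package, so calling it from SQL requires context switches.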
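
The row-generator idea can be tried on its own. This is a minimal sketch that reuses the t1000 name from the query above; the aliases a and b are added here only for readability.

```sql
-- 1,000 generated rows cross-joined with themselves give 1,000 * 1,000 rows,
-- so CONNECT BY never has to go deeper than 1,000 levels.
with t1000(x) as (
  select level from dual connect by level <= 1000
)
select count(*) as row_count   -- returns 1000000
from   t1000 a
cross join t1000 b;
```

The same pattern scales by changing the CTE size or by adding a third copy of the CTE to the join.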

PS. I've already posted a few examples of how to speed up fetching random table rows here: https://stackoverflow.com/a/62892390/429100
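
Once the load has finished, a quick check against the requirements from the question can catch mistakes in the generator. This is a sketch only; it assumes the table contained no rows before the load and uses TRIM because gender is a blank-padded CHAR(10).

```sql
-- Summarize the loaded data: row count, value ranges, and name/gender spread.
select count(*)                                            as total_rows,
       min(birth_date)                                     as min_birth_date,
       max(birth_date)                                     as max_birth_date,
       min(salary)                                         as min_salary,
       max(salary)                                         as max_salary,
       count(case when trim(gender) = 'MALE'   then 1 end) as males,
       count(case when trim(gender) = 'FEMALE' then 1 end) as females,
       count(distinct first_name)                          as distinct_first_names,
       count(distinct last_name)                           as distinct_last_names
from   person;
```

For the name-list solution, both distinct counts should come out at 50 or fewer, and the two gender counts should be close to each other.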
