Return five rows of random DNA instead of just one

This is the code I have to create a string of DNA:

prepare dna_length(int) as
  with t1 as (
    select chr(65) as s 
      union select chr(67) 
      union select chr(71) 
      union select chr(84) )
, t2 as ( select s, row_number() over() as rn from t1)
, t3 as ( select generate_series(1,$1) as i, round(random() * 4 + 0.5) as rn )
, t4 as ( select t2.s from t2 join t3 on (t2.rn=t3.rn))
select array_to_string(array(select s from t4),'') as dna;

execute dna_length(20);

I am trying to figure out how to re-write this to give a table of 5 rows of strings of DNA of length 20 each, instead of just one row. This is for PostgreSQL.

I tried:

CREATE TABLE dna_table(g int, dna text);
INSERT INTO dna_table (1, execute dna_length(20));

But this does not work. I am an absolute beginner. How do I do this properly?

PREPARE creates a prepared statement that can only be run "as is" with EXECUTE. If your prepared statement returns one string, then you only get one string back. You can't use EXECUTE inside other statements such as INSERT.

In your case you may create a function:

create or replace function dna_length(int) returns text as
$$
with t1 as (
    select chr(65) as s
    union
    select chr(67)
    union
    select chr(71)
    union
    select chr(84))
   , t2 as (select s,
                   row_number() over () as rn
            from t1)
   , t3 as (select generate_series(1, $1)    as i,
                   round(random() * 4 + 0.5) as rn)
   , t4 as (select t2.s
            from t2
                     join t3 on (t2.rn = t3.rn))
select array_to_string(array(select s from t4), '') as dna
$$ language sql;

And use it in a way like this:

insert into dna_table(g, dna) select generate_series(1,5), dna_length(20);
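
To confirm that the insert produced five distinct strings of the expected length, a quick check along these lines may help. This is only a sketch: it assumes dna_table and the dna_length(int) function exist exactly as defined above, and the len column alias is made up for illustration.

```sql
-- Assumes dna_table(g int, dna text) was filled by the INSERT above.
-- Each row should show a different random string, normally of length 20.
SELECT g,
       dna,
       length(dna) AS len   -- hypothetical alias, for readability only
FROM   dna_table
ORDER  BY g;
```

Because the function calls random(), it is volatile and is evaluated once per generated row, which is why the five strings should differ.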

From the official documentation:

PREPARE creates a prepared statement. A prepared statement is a server-side object that can be used to optimize performance. When the PREPARE statement is executed, the specified statement is parsed, analyzed, and rewritten. When an EXECUTE command is subsequently issued, the prepared statement is planned and executed. This division of labor avoids repetitive parse analysis work, while allowing the execution plan to depend on the specific parameter values supplied.

About functions.

This can be much simpler and faster:

SELECT string_agg(CASE ceil(random() * 4)
                   WHEN 1 THEN 'A'
                   WHEN 2 THEN 'C'
                   WHEN 3 THEN 'T'
                   WHEN 4 THEN 'G'
                  END, '') AS dna
FROM   generate_series(1,100) g  -- 100 = 5 rows * 20 nucleotides
GROUP  BY g%5;

random() produces a random value in the range 0.0 <= x < 1.0. Multiply by 4 and take the mathematical ceiling with ceil() (cheaper than round()), and you get evenly distributed random numbers from 1 to 4. Convert each number to one of the letters A, C, T, G, and aggregate with GROUP BY g%5, where % is the modulo operator.
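
The two building blocks can be inspected separately. The following is a minimal sketch, not part of the original answer; the sample size of 100000 and the column aliases are arbitrary choices for illustration.

```sql
-- 1) Distribution check: ceil(random() * 4) should land on 1, 2, 3 and 4
--    roughly equally often (exact counts vary from run to run).
SELECT ceil(random() * 4) AS n,
       count(*)           AS hits
FROM   generate_series(1, 100000)
GROUP  BY 1
ORDER  BY 1;

-- 2) Bucketing check: g % 5 splits the series 1..100 into 5 groups
--    (remainders 0 to 4) of exactly 20 members each.
SELECT g % 5    AS bucket,
       count(*) AS members
FROM   generate_series(1, 100) g
GROUP  BY 1
ORDER  BY 1;
```

The second query shows why 100 generated values yield 5 rows of 20 nucleotides: every remainder collects the same number of series members.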

About string_agg():

As a prepared statement, taking:
$1 ... the number of rows
$2 ... the number of nucleotides per row

PREPARE dna_length(int, int) AS
SELECT string_agg(CASE ceil(random() * 4)
                   WHEN 1 THEN 'A'
                   WHEN 2 THEN 'C'
                   WHEN 3 THEN 'T'
                   WHEN 4 THEN 'G'
                  END, '') AS dna
FROM   generate_series(1, $1 * $2) g
GROUP  BY g%$1;

Call:

EXECUTE dna_length(5,20);

Result:

| dna                  |
| :------------------- |
| ATCTTCGACACGTCGGTACC |
| GTGGCTGCAGATGAACAGAG |
| ACAGCTTAAAACACTAAGCA |
| TCCGGACCTCTCGACCTTGA |
| CGTGCGGAGTACCCTAATTA |

If you need it a lot, consider a function instead.
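
One possible shape for such a function is sketched below. The name random_dna and the parameter names _rows and _len are hypothetical, chosen only for this example; the body is the same aggregate query as in the prepared statement. Unlike a prepared statement, a set-returning function can be used in the FROM clause of an INSERT.

```sql
-- Hypothetical wrapper around the query above (names are assumptions).
-- Named parameters in SQL functions require PostgreSQL 9.2 or later.
CREATE OR REPLACE FUNCTION random_dna(_rows int, _len int)
  RETURNS SETOF text
  LANGUAGE sql VOLATILE AS
$func$
SELECT string_agg(CASE ceil(random() * 4)
                   WHEN 1 THEN 'A'
                   WHEN 2 THEN 'C'
                   WHEN 3 THEN 'T'
                   WHEN 4 THEN 'G'
                  END, '')
FROM   generate_series(1, _rows * _len) g
GROUP  BY g % _rows;
$func$;

-- Fill the table from the question: WITH ORDINALITY (PostgreSQL 9.4+)
-- numbers the returned rows 1..5, which serves as the g column.
INSERT INTO dna_table (g, dna)
SELECT t.g, t.dna
FROM   random_dna(5, 20) WITH ORDINALITY AS t(dna, g);
```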
