简体   繁体   中英

Redshift Copy from s3 using for loop

I have many files to load in S3. And I have created manifest file at each prefix of the files.

for instance, at s3://my-bucket/unit_1 I have files like below.

chunk1.csv.gz
chunk2.csv.gz
chunk3.csv.gz
cunkk4.csv.gz 
unit.manifest

so with copy command, I can load the unit_1 files to redshift

However, I got more than 1000 units so I want to do it with loop. So I want to make loop that iterate from 1 to 1000 to change just prefix of the manifest file.

So I did like below,

create or replace procedure copy_loop()
language plpgsql
as $$
BEGIN
    FOR i in 1..1000 LOOP
    COPY mytable
    FROM 's3://my-bucket/unit_%/unit.manifest', i 
    credentials 'aws_iam_role=arn:aws:iam::myrolearn'
    MANIFEST
    REGION 'ap-northeast-2'
    REMOVEQUOTES
    IGNOREHEADER 1
    ESCAPE
    DATEFORMAT 'auto'
    TIMEFORMAT 'auto'
    GZIP
    DELIMITER '|'
    ACCEPTINVCHARS '?'
    COMPUPDATE FALSE
    STATUPDATE FALSE
    MAXERROR 0
    BLANKSASNULL
    EMPTYASNULL
    NULL AS '\N'
    EXPLICIT_IDS;
    END LOOP;
END; 
$$;

But I got this message

SQL Error [500310] [42601]: Amazon Invalid operation: syntax error at or near ",";

How can I handle this?

This is my solution.

create or replace procedure copy_loop(i1 int, i2 int)
language plpgsql
as $$
DECLARE 
    prefix TEXT := 's3://mybucket/unit_';
    manifest TEXT := '/unit.manifest' ;
    manifest_location TEXT ;
    copy_commands VARCHAR(2000) ;
    copy_options VARCHAR(2000) := 'credentials '|| quote_literal('aws_iam_role=myrolearn')
    || ' MANIFEST '
    || ' REGION ' || quote_literal('ap-northeast-2')
    || ' REMOVEQUOTES '
    || ' IGNOREHEADER 1 '
    || ' ESCAPE '
    || ' DATEFORMAT ' || quote_literal('auto')
    || ' TIMEFORMAT ' || quote_literal('auto')
    || ' GZIP '
    || ' DELIMITER ' ||  quote_literal('|')
    || ' ACCEPTINVCHARS ' || quote_literal('?')
    || ' COMPUPDATE FALSE '
    || ' STATUPDATE FALSE '
    || ' MAXERROR 0 '
    || ' BLANKSASNULL '
    || ' EMPTYASNULL '
    || ' NULL AS ' || quote_literal('\N')
    || ' EXPLICIT_IDS ';
BEGIN 
    FOR i in i1..i2 LOOP
        manifest_location := prefix || i || manifest;
        copy_commands := 'COPY mytable FROM' || quote_literal(manifest_location) || copy_options;
        execute copy_commands;
    END LOOP;
END;
$$;

using this procedure, I could copy files from more than 1000 units. also set starting number and end number of the loop helped to divide the loading jobs. Since large amount loading takes few hours, I think it is better to do load job with some chunks.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM