
Snowflake external stage location (e.g. AWS S3) has nested folders with date info in the folder names

I have an external stage organized as follows:

s3://finance/credits
    /Week_2022_0601_0607
        file01.json
        file02.json
    /Week_2022_0608_0615
        file01.json
        file02.json
        file03.json
    etc.

New folders are added each week.

Can I define my storage_location property for my external stage as:

"s3://finance/credits/./*.json"

so that my COPY INTO code will automatically traverse the nested, date-named folders and load all the files? Since new folders are added each week, I cannot hard-code each folder into the stage's storage_location path.

This applies to any path, and to COPY INTO whether or not you use a named stage.

In the Snowflake Citibike lab, you create a stage like:

create stage citibike.public.citibike_trips 
    url = 's3://snowflake-workshop-lab/citibike-trips';

a file format like:

create file format citibike.public.csv type = csv 
    FIELD_OPTIONALLY_ENCLOSED_BY = '"' 
    NULL_IF = ('\\N', '');

then load the files like:

copy into trips 
    from @citibike_trips 
    file_format = csv 
    PATTERN= '.*trips_.*csv.gz';

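An S3 object name is not a path. It is just a string that looks like a path, so when you match against it, every object whose full name matches is returned, including objects inside nested "folders". That is why you do not need a wildcard in the stage URL: point the stage at s3://finance/credits/ and filter with PATTERN in COPY INTO.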
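A small Python model of listing and filtering object keys may make this concrete. It assumes (a) a LIST returns every key that starts with the prefix, at any depth, and (b) PATTERN behaves like a whole-string regular-expression match, which is why patterns such as '.*trips_.*csv.gz' start with '.*'. The week folders and file names are the asker's; README.txt is a hypothetical extra object, and list_prefix/match_pattern are illustrative helpers, not Snowflake or AWS APIs.

```python
import re

# Hypothetical listing of the asker's bucket. S3 has no real folders:
# each object key is one flat string that merely contains "/" characters.
KEYS = [
    "credits/Week_2022_0601_0607/file01.json",
    "credits/Week_2022_0601_0607/file02.json",
    "credits/Week_2022_0608_0615/file01.json",
    "credits/Week_2022_0608_0615/file02.json",
    "credits/Week_2022_0608_0615/file03.json",
    "credits/README.txt",  # hypothetical non-JSON object
]


def list_prefix(keys, prefix):
    """Mimic a prefix LIST: every key starting with prefix, at any depth."""
    return [k for k in keys if k.startswith(prefix)]


def match_pattern(keys, pattern):
    """Approximate a PATTERN filter as a whole-string regex match (assumption)."""
    rx = re.compile(pattern)
    return [k for k in keys if rx.fullmatch(k)]


listed = list_prefix(KEYS, "credits/")
json_files = match_pattern(listed, r".*[.]json")
print(len(listed), len(json_files))  # prints: 6 5
```

Because the match covers the whole string in this model, a pattern of just '[.]json' would match nothing; the leading '.*' absorbs the "folder" part of the key, so any new week folder under the prefix is picked up automatically.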

Consider this point carefully: as your set of files builds up, you can end up with millions of files in S3, and that full list is transferred to Snowflake on each operation.

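Snowflake also keeps load metadata for the files it has loaded and does not reload a file that has not changed. That metadata is kept for 64 days for bulk COPY INTO (Snowpipe keeps 14 days of load history). By default, COPY INTO skips files whose LAST_MODIFIED date is older than that window, since their load status is unknown and they are assumed to have been loaded already.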
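The de-duplication behaviour can be pictured with a rough conceptual model. This is a hypothetical simplification, not Snowflake's implementation: StagedFile, files_to_load and the etag field are illustrative names, a key missing from load_history stands for "load status unknown", and the retention window is passed in as a parameter rather than hard-coded.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class StagedFile:
    key: str
    etag: str                # content fingerprint; changes when the file changes
    last_modified: datetime


def files_to_load(staged, load_history, now, retention):
    """Return the keys a COPY-style load would pick up under this model.

    load_history maps key -> etag recorded when the file was loaded.
    History older than `retention` is assumed dropped, i.e. absent here.
    """
    selected = []
    for f in staged:
        loaded_etag = load_history.get(f.key)
        if loaded_etag == f.etag:
            continue  # already loaded and unchanged: skip
        if loaded_etag is None and now - f.last_modified > retention:
            continue  # status unknown and file is old: assume loaded, skip
        selected.append(f.key)
    return selected


now = datetime(2022, 6, 20)
staged = [
    StagedFile("a.json", "v1", datetime(2022, 6, 10)),   # loaded, unchanged
    StagedFile("b.json", "v2", datetime(2022, 6, 15)),   # loaded as v1, changed
    StagedFile("c.json", "v1", datetime(2022, 6, 18)),   # new file
    StagedFile("old.json", "v1", datetime(2022, 1, 1)),  # no history, old
]
history = {"a.json": "v1", "b.json": "v1"}
selected = files_to_load(staged, history, now, timedelta(days=14))  # illustrative window
print(selected)  # prints: ['b.json', 'c.json']
```

Note that every staged key still has to be listed before it can be skipped, which is why the size of the listing matters even when nothing new is loaded.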

The standard advice is to track a high-water mark and lay out your folders/paths hierarchically by year/month/week or day, so you can use progressively narrower path prefixes and reduce the size of the LIST transfer.
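The high-water-mark idea can be sketched in Python. This assumes the files are re-laid out in a hypothetical YYYY/MM/DD/ hierarchy; prefixes_since, copy_statements, the table credits and the stage credits_stage are all hypothetical names, and persisting the high-water mark is left out. Whole months collapse into one YYYY/MM/ prefix and partial months fall back to per-day prefixes, so each generated COPY INTO only lists keys under a narrow prefix.

```python
from datetime import date, timedelta


def prefixes_since(high_water_mark, today):
    """Prefixes covering every day after high_water_mark up to and including today.

    Assumes a hypothetical YYYY/MM/DD/ layout. A month fully inside the range
    collapses to a single YYYY/MM/ prefix; partial months get one prefix per day.
    """
    prefixes = []
    day = high_water_mark + timedelta(days=1)
    while day <= today:
        # First day of the following month (day 28 + 4 days is always next month).
        next_month = (day.replace(day=28) + timedelta(days=4)).replace(day=1)
        month_end = next_month - timedelta(days=1)
        if day.day == 1 and month_end <= today:
            prefixes.append(f"{day:%Y/%m}/")  # whole month: one coarse prefix
            day = next_month
        else:
            prefixes.append(f"{day:%Y/%m/%d}/")  # partial month: per-day prefix
            day += timedelta(days=1)
    return prefixes


def copy_statements(prefixes, table="credits", stage="credits_stage"):
    """Build one COPY INTO per prefix (table and stage names are hypothetical)."""
    return [
        f"COPY INTO {table} FROM @{stage}/{p} PATTERN = '.*[.]json';"
        for p in prefixes
    ]


# Hypothetical high-water mark: everything up to 2022-05-29 is already loaded.
prefixes = prefixes_since(date(2022, 5, 29), date(2022, 7, 2))
stmts = copy_statements(prefixes)
for s in stmts:
    print(s)
```

After the generated loads succeed, you would advance the stored high-water mark to the last day covered, so the next run starts from there instead of re-listing the whole bucket.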
