
Snowpipe for continuous ingestion of daily files arriving irregularly

I am new to Snowflake and we are working on a POC. Our ERP system uploads around 100 .txt files to an S3 bucket overnight. We need to load these files into staging tables in Snowflake, and then into DW tables with data transformations applied. We are considering Snowpipe to load the data from S3 into the staging tables, because the ERP does not deliver the files on a schedule: they can arrive at any time within a four-hour window. The daily files are timestamped and each one contains the full data set, so the staging tables would need to be truncated every day before that day's files are ingested.

But a Snowpipe definition doesn't allow TRUNCATE or CREATE statements. A pipe can only wrap a single COPY INTO statement.
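To make that constraint concrete, here is a minimal sketch that only assembles the pipe DDL as a string. The whole body of the pipe is one COPY INTO, so there is nowhere to put a truncate step. It assumes an external stage over the S3 bucket and a named file format for the .txt files already exist. All object names (`erp_orders_pipe`, `stg_orders`, `erp_s3_stage`, `erp_txt_format`) are hypothetical. `AUTO_INGEST = TRUE` also assumes S3 event notifications are configured to send to the pipe's notification queue.

```python
def pipe_ddl(pipe: str, table: str, stage: str, file_format: str) -> str:
    """Build a CREATE PIPE statement; its body can only be one COPY INTO."""
    # Any TRUNCATE/DELETE of the staging table has to happen outside the pipe.
    return (
        f"CREATE PIPE IF NOT EXISTS {pipe}\n"
        f"  AUTO_INGEST = TRUE\n"
        f"AS\n"
        f"COPY INTO {table}\n"
        f"FROM @{stage}\n"
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}');"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(pipe_ddl("erp_orders_pipe", "stg_orders", "erp_s3_stage", "erp_txt_format"))
```

You could run the printed statement in a worksheet or through any Snowflake client.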

Please share your thoughts on this. Should we keep considering Snowpipe, or should we instead run a COPY command as a scheduled TASK at fixed intervals, say every 15 minutes?

Have you considered simply appending the data to your staging tables continuously, putting an append-only STREAM over each table, and then using tasks to load the downstream tables from the STREAM? The task could run every minute with a WHEN clause that checks whether the STREAM contains any data (SYSTEM$STREAM_HAS_DATA). That way the data is loaded and pushed downstream whenever it happens to land from your ERP.
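A minimal sketch of the stream-plus-task setup follows. Like the pipe example, it only builds the SQL strings. The names (`stg_orders`, `stg_orders_stream`, `load_dw_orders`, `etl_wh`, `dw_orders`) and the column list are hypothetical. A plain INSERT ... SELECT stands in for whatever transformation logic you actually need.

```python
def stream_and_task_ddl(staging_table: str, stream: str, task: str,
                        warehouse: str, target_table: str,
                        columns: list[str]) -> list[str]:
    """Build DDL for an append-only stream and a per-minute loading task."""
    cols = ", ".join(columns)
    return [
        # Append-only: the stream records inserts into the staging table only.
        f"CREATE STREAM IF NOT EXISTS {stream} ON TABLE {staging_table} APPEND_ONLY = TRUE;",
        (
            f"CREATE OR REPLACE TASK {task}\n"
            f"  WAREHOUSE = {warehouse}\n"
            f"  SCHEDULE = '1 MINUTE'\n"
            # The task body only runs when the stream has unconsumed rows.
            f"  WHEN SYSTEM$STREAM_HAS_DATA('{stream}')\n"
            f"AS\n"
            # Selecting explicit columns skips the METADATA$ columns; a DML
            # that reads the stream advances its offset when it commits.
            f"  INSERT INTO {target_table} ({cols})\n"
            f"  SELECT {cols} FROM {stream};"
        ),
        # Tasks are created suspended and must be resumed explicitly.
        f"ALTER TASK {task} RESUME;",
    ]


if __name__ == "__main__":
    for stmt in stream_and_task_ddl("stg_orders", "stg_orders_stream",
                                    "load_dw_orders", "etl_wh", "dw_orders",
                                    ["order_id", "amount"]):
        print(stmt)
```

The idea is that you would create one stream and one task per staging table. That way each ERP file type flows downstream independently as soon as it lands.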

Then you can have a daily task, running at any time of day, that checks that the STREAM has NO DATA in it and, if so, DELETEs everything in the underlying table. This step is only needed to save storage. Because the STREAM is append-only, the DELETE statement does not create records in your STREAM.
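The daily cleanup could look roughly like this sketch, which again only generates SQL. It assumes a cron time (noon UTC here, a hypothetical choice) that falls outside the four-hour arrival window. It also assumes your account accepts `NOT` in a task's WHEN expression. If it does not, the same check could be moved into a stored procedure that the task calls. Names are hypothetical.

```python
def cleanup_task_ddl(task: str, warehouse: str, staging_table: str,
                     stream: str, cron: str = "0 12 * * * UTC") -> list[str]:
    """Build DDL for a daily task that empties the staging table."""
    return [
        (
            f"CREATE OR REPLACE TASK {task}\n"
            f"  WAREHOUSE = {warehouse}\n"
            # Cron fields: minute hour day-of-month month day-of-week timezone.
            f"  SCHEDULE = 'USING CRON {cron}'\n"
            # Only purge once everything in the stream has been loaded downstream.
            f"  WHEN NOT SYSTEM$STREAM_HAS_DATA('{stream}')\n"
            f"AS\n"
            # On an append-only stream, this DELETE produces no stream records.
            f"  DELETE FROM {staging_table};"
        ),
        f"ALTER TASK {task} RESUME;",
    ]


if __name__ == "__main__":
    for stmt in cleanup_task_ddl("purge_stg_orders", "etl_wh",
                                 "stg_orders", "stg_orders_stream"):
        print(stmt)
```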
