
Incremental loading of files from On-prem file server to Azure Data Lake

We would like to do incremental loading of files from our on-premises file server to Azure Data Lake using Azure Data Factory v2.

Files are expected to be dropped on the on-prem file server on a daily basis. We will run the ADF v2 pipeline at regular intervals during the day, and only the new, unprocessed files in the folder should be picked up.

Our recommendation is to put the set of files for daily ingestion into /YYYY/MM/DD directories. You can refer to this example, which shows how to use the system variable @trigger().scheduledTime to read files from the corresponding directory:

https://docs.microsoft.com/en-us/azure/data-factory/how-to-read-write-partitioned-data
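
As a rough sketch of that pattern (the names DailyFilesOnPrem, OnPremFileServerLinkedService, folderDate, and the daily/ root folder are placeholders for illustration, not taken from the question), the source dataset can expose a date parameter and build its folder path from it:

{
  "name": "DailyFilesOnPrem",
  "properties": {
    "description": "Illustrative sketch only; adjust names and the linked service to your environment.",
    "type": "Binary",
    "linkedServiceName": {
      "referenceName": "OnPremFileServerLinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "folderDate": { "type": "String" }
    },
    "typeProperties": {
      "location": {
        "type": "FileServerLocation",
        "folderPath": {
          "value": "@concat('daily/', dataset().folderDate)",
          "type": "Expression"
        }
      }
    }
  }
}

In the copy activity, the dataset parameter folderDate would then be set to an expression such as @formatDateTime(pipeline().parameters.windowStart, 'yyyy/MM/dd'), with windowStart supplied by the trigger as described in the linked article.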

In the source dataset, you can apply a file filter. You can filter by time, for example (by calling datetime functions in the expression language), or by whatever else identifies a new file: https://docs.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions Then, with a scheduled trigger, you can execute the pipeline n times during the day.
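
For the scheduled-trigger part, a minimal sketch could look like the following (the 4-hour interval, the pipeline name CopyNewOnPremFiles, and the parameter name windowStart are assumptions for illustration, not from the answer above):

{
  "name": "EveryFourHoursTrigger",
  "properties": {
    "description": "Sketch only: fires the pipeline every 4 hours and passes the scheduled time as a pipeline parameter.",
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Hour",
        "interval": 4,
        "startTime": "2019-01-01T00:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CopyNewOnPremFiles",
          "type": "PipelineReference"
        },
        "parameters": {
          "windowStart": "@trigger().scheduledTime"
        }
      }
    ]
  }
}

The windowStart value passed here is what the pipeline and dataset expressions in the earlier sketch format into the yyyy/MM/dd folder path, so each run only reads the files for its own window.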
