
Azure Data Factory | Incremental data load from SFTP to Blob

I created a run-once Data Factory (V2) pipeline to load files (.lta.gz) from an SFTP server into an Azure blob to get the historical data. It worked beautifully. Every day several new files will land on the SFTP server (which cannot be manipulated or deleted), so I want to create an incremental load pipeline that checks daily for new files and, if there are any, copies them.

Does anyone have any tips on how to achieve this?

Thanks for using Data Factory!

To incrementally load newly generated files on the SFTP server, you can leverage the GetMetadata activity to retrieve the LastModifiedDate property: https://docs.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity

Essentially you author a pipeline containing the following activities (a rough sketch of such a pipeline follows the list):

  • GetMetadata (return the list of files under a given folder)
  • ForEach (iterate through each file), containing:
      • GetMetadata (return lastModifiedTime for the given file)
      • IfCondition (compare lastModifiedTime with the trigger WindowStartTime)
      • Copy (copy the file from source to destination)
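
Below is a rough sketch of what such a pipeline's JSON could look like. All names in it (IncrementalCopyFromSftp, GetFileList, SftpFolderDataset, SftpFileDataset, BlobSinkDataset, WindowStartTime, the fileName dataset parameter) are placeholders made up for illustration, and the source/sink types in the Copy activity depend on how you define your datasets, so treat it as a starting point rather than a ready-to-deploy definition:

```json
{
    "name": "IncrementalCopyFromSftp",
    "properties": {
        "parameters": {
            "WindowStartTime": { "type": "String" }
        },
        "activities": [
            {
                "name": "GetFileList",
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": { "referenceName": "SftpFolderDataset", "type": "DatasetReference" },
                    "fieldList": [ "childItems" ]
                }
            },
            {
                "name": "ForEachFile",
                "type": "ForEach",
                "dependsOn": [ { "activity": "GetFileList", "dependencyConditions": [ "Succeeded" ] } ],
                "typeProperties": {
                    "items": { "value": "@activity('GetFileList').output.childItems", "type": "Expression" },
                    "activities": [
                        {
                            "name": "GetFileModifiedDate",
                            "type": "GetMetadata",
                            "typeProperties": {
                                "dataset": {
                                    "referenceName": "SftpFileDataset",
                                    "type": "DatasetReference",
                                    "parameters": { "fileName": "@item().name" }
                                },
                                "fieldList": [ "lastModified" ]
                            }
                        },
                        {
                            "name": "IfNewFile",
                            "type": "IfCondition",
                            "dependsOn": [ { "activity": "GetFileModifiedDate", "dependencyConditions": [ "Succeeded" ] } ],
                            "typeProperties": {
                                "expression": {
                                    "value": "@greaterOrEquals(ticks(activity('GetFileModifiedDate').output.lastModified), ticks(pipeline().parameters.WindowStartTime))",
                                    "type": "Expression"
                                },
                                "ifTrueActivities": [
                                    {
                                        "name": "CopyNewFile",
                                        "type": "Copy",
                                        "inputs": [
                                            {
                                                "referenceName": "SftpFileDataset",
                                                "type": "DatasetReference",
                                                "parameters": { "fileName": "@item().name" }
                                            }
                                        ],
                                        "outputs": [ { "referenceName": "BlobSinkDataset", "type": "DatasetReference" } ],
                                        "typeProperties": {
                                            "source": { "type": "FileSystemSource", "recursive": false },
                                            "sink": { "type": "BlobSink" }
                                        }
                                    }
                                ]
                            }
                        }
                    ]
                }
            }
        ]
    }
}
```

The SftpFileDataset above is assumed to take a fileName parameter that it plugs into its file path. The WindowStartTime parameter compared in the IfCondition would typically be supplied by the trigger; with a daily tumbling window trigger (again, all names are placeholders) the mapping could look like this:

```json
{
    "name": "DailyIncrementalTrigger",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 24,
            "startTime": "2019-01-01T00:00:00Z",
            "maxConcurrency": 1
        },
        "pipeline": {
            "pipelineReference": { "referenceName": "IncrementalCopyFromSftp", "type": "PipelineReference" },
            "parameters": {
                "WindowStartTime": "@{trigger().outputs.windowStartTime}"
            }
        }
    }
}
```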

Have fun building data integration flows using Data Factory!

Since I posted my previous answer in May last year, many of you have contacted me asking for a pipeline sample to achieve the incremental file copy scenario using the getMetadata-ForEach-getMetadata-If-Copy pattern. This has been important feedback: incremental file copy is a common scenario that we want to optimize further.

Today I would like to post an updated answer - we recently released a new feature that offers a much easier and more scalable approach to achieve the same goal:

You can now set modifiedDatetimeStart and modifiedDatetimeEnd on the SFTP dataset to specify a time-range filter that extracts only the files created or modified during that period. This lets you achieve the incremental file copy with a single Copy activity: https://docs.microsoft.com/en-us/azure/data-factory/connector-sftp#dataset-properties
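
As a minimal sketch of such a dataset (the dataset and linked service names, folder path, and parameters are placeholders, and the legacy "FileShare" dataset type shown here is my assumption - check the linked connector doc for the exact schema of your ADF version):

```json
{
    "name": "SftpNewFilesDataset",
    "properties": {
        "type": "FileShare",
        "linkedServiceName": { "referenceName": "SftpLinkedService", "type": "LinkedServiceReference" },
        "parameters": {
            "windowStart": { "type": "String" },
            "windowEnd": { "type": "String" }
        },
        "typeProperties": {
            "folderPath": "daily/exports",
            "modifiedDatetimeStart": { "value": "@dataset().windowStart", "type": "Expression" },
            "modifiedDatetimeEnd": { "value": "@dataset().windowEnd", "type": "Expression" }
        }
    }
}
```

A Copy activity that uses this dataset as its source, with the trigger window passed into windowStart and windowEnd, will then pick up only the files that changed in that window.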

This feature is enabled for these file-based connectors in ADF: AWS S3, Azure Blob Storage, FTP, SFTP, ADLS Gen1, ADLS Gen2, and on-prem file system. Support for HDFS is coming very soon.

Further, to make it even easier to author an incremental copy pipeline, we have now released common pipeline patterns as solution templates. You can select one of the templates, fill in the linked service and dataset info, and click deploy – it is that simple! https://docs.microsoft.com/en-us/azure/data-factory/solution-templates-introduction

You should be able to find the incremental file copy solution in the gallery: https://docs.microsoft.com/en-us/azure/data-factory/solution-template-copy-new-files-lastmodifieddate

Once again, thank you for using ADF, and happy data integration with ADF!
