
Azure Data Factory: Storage event trigger only on new files

I have the following folder structure in Azure Blob Storage:

container/
  dataset1/
    2021-01-01/
      file_01.parquet
      file_02.parquet
    2021-01-02/
      file_01.parquet
      file_02.parquet
      file_03.parquet
  dataset2/
    2021-01-01/
      file_01.parquet
    2021-01-02/
      file_01.parquet
      file_02.parquet
  .
  .
  . etc...

I have a pipeline for each of the dataset folders. Each pipeline iterates over the files in the date folders, processes them and outputs the results elsewhere. Each pipeline's input dataset path is defined like this: container/dataset/ . This works fine: when I trigger the pipeline, it goes through all the files.

Now I would like to automate the pipeline so that it is triggered when new data is added to the dataset folder (the data will always be in a folder with a date in its name). I guess I can do this with a storage event trigger, but does it run the pipeline for each of the date folders, or only for the one that was added?

The storage event trigger is based on "Blob path begins with" and "Blob path ends with" conditions. So if your trigger has Blob path begins with set to dataset1/, then any new file uploaded under that dataset folder triggers the ADF pipeline.
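As a rough sketch (the trigger name, pipeline name and scope values here are assumptions, not taken from your setup), a BlobEventsTrigger that fires only for new parquet files under dataset1/ could look like this:

    {
      "name": "TriggerOnNewDataset1Files",
      "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
          "blobPathBeginsWith": "/container/blobs/dataset1/",
          "blobPathEndsWith": ".parquet",
          "ignoreEmptyBlobs": true,
          "events": [ "Microsoft.Storage.BlobCreated" ],
          "scope": "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>"
        },
        "pipelines": [
          {
            "pipelineReference": { "referenceName": "ProcessDataset1", "type": "PipelineReference" }
          }
        ]
      }
    }

Note that the trigger fires once per matching blob, so uploading three files into a date folder would start three pipeline runs.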

Which files the pipeline actually consumes is controlled entirely by the dataset parameters. The event trigger and the input dataset values are independent of each other, so the behaviour depends entirely on how you design them.


The workflow is as follows:


When a new item matching the storage event trigger (blob path begins with / ends with) is added to the storage account, a message is published to Event Grid and relayed to Data Factory, which starts the pipeline. If your pipeline is designed to read the data from all the folders, then yes, it would process the complete dataset.
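For example, a Copy activity source configured with wildcards like the sketch below (activity and dataset names are assumptions) would pick up every file under dataset1/, regardless of which file raised the event:

    {
      "name": "CopyWholeDataset1",
      "type": "Copy",
      "inputs": [ { "referenceName": "Dataset1Parquet", "type": "DatasetReference" } ],
      "outputs": [ { "referenceName": "Dataset1Processed", "type": "DatasetReference" } ],
      "typeProperties": {
        "source": {
          "type": "ParquetSource",
          "storeSettings": {
            "type": "AzureBlobStorageReadSettings",
            "recursive": true,
            "wildcardFolderPath": "dataset1/*",
            "wildcardFileName": "*.parquet"
          }
        },
        "sink": {
          "type": "ParquetSink",
          "storeSettings": { "type": "AzureBlobStorageWriteSettings" },
          "formatSettings": { "type": "ParquetWriteSettings" }
        }
      }
    }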

Alternatively, if you want to copy only the specific file that was added, you can configure the dataset's Copy folder and Copy file properties, as in the sketch below.
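A parameterized dataset for that case could look roughly like this (the dataset and linked-service names are assumptions); the folder and file are supplied at run time instead of being hard-coded:

    {
      "name": "ParquetSingleFile",
      "properties": {
        "type": "Parquet",
        "linkedServiceName": { "referenceName": "AzureBlobStorageLS", "type": "LinkedServiceReference" },
        "parameters": {
          "folderPath": { "type": "string" },
          "fileName": { "type": "string" }
        },
        "typeProperties": {
          "location": {
            "type": "AzureBlobStorageLocation",
            "container": "container",
            "folderPath": { "value": "@dataset().folderPath", "type": "Expression" },
            "fileName": { "value": "@dataset().fileName", "type": "Expression" }
          }
        }
      }
    }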


By default, the storage event trigger captures the folder path and file name of the blob in the properties @triggerBody().folderPath and @triggerBody().fileName.

You can map these to pipeline parameters and consume them in the dataset as described above.
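In the trigger's pipeline reference you can pass those trigger properties into pipeline parameters (the parameter names sourceFolder and sourceFile below are assumptions):

    "pipelines": [
      {
        "pipelineReference": { "referenceName": "ProcessDataset1", "type": "PipelineReference" },
        "parameters": {
          "sourceFolder": "@triggerBody().folderPath",
          "sourceFile": "@triggerBody().fileName"
        }
      }
    ]

Inside the pipeline, the activity's dataset reference then forwards them to the dataset parameters:

    "inputs": [
      {
        "referenceName": "ParquetSingleFile",
        "type": "DatasetReference",
        "parameters": {
          "folderPath": "@pipeline().parameters.sourceFolder",
          "fileName": "@pipeline().parameters.sourceFile"
        }
      }
    ]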

