
Azure Data Factory: Dynamic path value for the Storage Event Trigger

I have created an Azure Data Factory pipeline that copies data from one ADLS container to another using a Copy Data activity. The copy activity is invoked by a storage event trigger.

So whenever a new file is generated, it triggers the activity.
The source file sits in a nested directory structure with dynamic folders for year, month, and day, which vary by date.

In the trigger, I specified the path up to the fixed folder, but I don't know what value to use for the dynamic part.
Initially, I provided a path such as my/fixed/directory/*/*/*/,
but at execution time it throws the exception 'PathNotFound'.

So my question is: how can I point the storage event trigger at a path with a dynamic folder structure? The following are ADF Copy Data pipeline screenshots:
[Screenshot: pipeline]

[Screenshot: Copy Data activity source configuration]

[Screenshot: Copy Data activity target configuration]

[Screenshot: Copy Data activity source dataset configuration]

[Screenshot: Copy Data activity target dataset configuration]

[Screenshot: storage event configuration]

  • Wildcards are not supported for blob path begins with or blob path ends with in storage event triggers.
  • However, creating a storage event trigger on the fixed parent directory will trigger the pipeline for any file created or deleted in child directories as well.
  • Let's say I have the folder structure shown below, where input/folder/2022 is my fixed directory (input is the container name). Each of the folders shown also contains subfolders.

[Screenshot: folder structure under the input container]

  • Now, I have created a Copy Data activity. The folder name and file name dynamic content for the source dataset is shown below (the parameter values will be passed in from the pipeline):
folder path:  @replace(dataset().folder_name,'input/','')

file name:  @dataset().file_name

[Screenshot: source dataset dynamic content]
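For reference, a parameterized source dataset of this kind could be defined in JSON roughly as sketched below. This is a minimal sketch, not taken from the screenshots: the dataset name, linked service name, and DelimitedText format are assumed placeholders; only the two expressions match the ones above.

    {
        "name": "SourceDataset",
        "properties": {
            "linkedServiceName": {
                "referenceName": "AdlsGen2LinkedService",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "folder_name": { "type": "string" },
                "file_name": { "type": "string" }
            },
            "type": "DelimitedText",
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",
                    "fileSystem": "input",
                    "folderPath": {
                        "value": "@replace(dataset().folder_name,'input/','')",
                        "type": "Expression"
                    },
                    "fileName": {
                        "value": "@dataset().file_name",
                        "type": "Expression"
                    }
                }
            }
        }
    }

The replace() strips the leading container name because the folder path passed in from the trigger (see below) includes the container segment, while the dataset's fileSystem already names the container.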

  • The folder name and file name dynamic content for the sink dataset is shown below. This is a different container, named data:
folder path: @concat('output/',replace(dataset().folder,'input/folder/',''))

file name: @dataset().file

[Screenshot: sink dataset dynamic content]
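The sink dataset's location block would then carry the sink expressions. Again a sketch: data is the sink container from the answer; the surrounding dataset definition is assumed to mirror the source sketch above.

    "typeProperties": {
        "location": {
            "type": "AzureBlobFSLocation",
            "fileSystem": "data",
            "folderPath": {
                "value": "@concat('output/',replace(dataset().folder,'input/folder/',''))",
                "type": "Expression"
            },
            "fileName": {
                "value": "@dataset().file",
                "type": "Expression"
            }
        }
    }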

  • Once the copy activity is configured, create a storage event trigger.

[Screenshot: storage event trigger creation]
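In JSON, the trigger's type properties might look like the sketch below. The key point is that blobPathBeginsWith is a literal prefix in the /<container>/blobs/<path> form, with no wildcards; the subscription, resource group, and storage account values are placeholders.

    "typeProperties": {
        "blobPathBeginsWith": "/input/blobs/folder/2022/",
        "ignoreEmptyBlobs": true,
        "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>",
        "events": [ "Microsoft.Storage.BlobCreated" ]
    }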

  • Here, the values for the pipeline parameters folderName and fileName are set while creating the trigger, as shown below:
fileName : @triggerBody().fileName
folderName : @triggerBody().folderPath

[Screenshot: trigger parameter values]
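In the trigger definition, this parameter mapping sits in the pipelines section, roughly as sketched below (the pipeline name is a placeholder):

    "pipelines": [
        {
            "pipelineReference": {
                "referenceName": "CopyPipeline",
                "type": "PipelineReference"
            },
            "parameters": {
                "fileName": "@triggerBody().fileName",
                "folderName": "@triggerBody().folderPath"
            }
        }
    ]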

  • After you attach the trigger and publish the pipeline, whenever a file is uploaded to any folder within the fixed directory folder/2022, the pipeline is triggered.
  • I have uploaded a file to folder/2022/03/01/sample1.csv. This triggered the pipeline successfully.

[Screenshot: successful trigger run]
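Tracing that upload through the expressions above (assuming, as the replace() calls imply, that @triggerBody().folderPath includes the container name):

    @triggerBody().folderPath → input/folder/2022/03/01
    @triggerBody().fileName   → sample1.csv
    source folder path: replace('input/folder/2022/03/01','input/','') → folder/2022/03/01
    sink folder path:   concat('output/', replace('input/folder/2022/03/01','input/folder/','')) → output/2022/03/01

So the file lands in the data container at output/2022/03/01/sample1.csv, preserving the date hierarchy.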

  • The file was copied successfully as well. The following image is for reference:

[Screenshot: copied file in the sink container]

So, creating a storage event trigger on just the parent directory is sufficient to trigger the pipeline for any file uploaded to child directories as well.
