Azure Data Factory specify custom output filename when copying to Blob Storage
I'm currently using ADF to copy files from an SFTP server to Blob Storage on a scheduled basis.
The filename structure is AAAAAA_BBBBBB_CCCCCC.txt.
Is it possible to rename the file before copying to Blob Storage so that I end up with a folder-like structure like below?
AAAAAA/BBBBBB/CCCCCC.txt
Here is what worked for me:
I created 3 parameters in my Blob storage dataset, see the image below:
I specified the name of my file and added the file extension; you can put anything in the Timestamp parameter just to get around the ADF rule that a parameter can't be empty.
Next, click on the Connection tab and add the following code in the FileName box: @concat(dataset().FileName,dataset().Timestamp,dataset().FileExtension). This code concatenates all the parameters, so you end up with something like "FileName_Timestamp_FileExtension". See the image below:
Next, click on your pipeline, then select your Copy Data activity. Click on the Sink tab. Find the Timestamp parameter under Dataset properties and add this code: @pipeline().TriggerTime. See the image below:
Finally, publish your pipeline and run/debug it. If it worked for me, then I am sure it will work for you as well :)
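For reference, the dataset built in the steps above could be sketched in JSON roughly like this (the dataset name TgtBlob and the folder path "data" are placeholders; the three parameters and the fileName expression are the ones described above):

```json
{
    "name": "TgtBlob",
    "properties": {
        "type": "AzureBlob",
        "parameters": {
            "FileName": { "type": "String" },
            "Timestamp": { "type": "String" },
            "FileExtension": { "type": "String" }
        },
        "typeProperties": {
            "folderPath": "data",
            "fileName": {
                "value": "@concat(dataset().FileName,dataset().Timestamp,dataset().FileExtension)",
                "type": "Expression"
            }
        }
    }
}
```

The Copy Data activity's sink then supplies the parameter values, with @pipeline().TriggerTime passed in as Timestamp.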
With ADF V2, you could do that. First, use a Lookup activity to get all the filenames of your source. Then chain a ForEach activity to iterate over the source file names. The ForEach activity contains a Copy activity. Both the source dataset and the sink dataset of the copy activity have parameters for filename and folder path. You can use the split and replace functions to generate the sink folder path and filename from your source file names.
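As a plain-Python illustration of that split/replace idea (this just mimics what the ADF @split() and @concat() expressions would compute inside the ForEach; it is not ADF code):

```python
# Mimic the ADF expression logic: map a source file name such as
# "AAAAAA_BBBBBB_CCCCCC.txt" to a sink folder path and file name.
def sink_path(source_name: str):
    stem, ext = source_name.rsplit(".", 1)   # strip the ".txt" extension
    parts = stem.split("_")                  # ADF: @split(..., '_')
    folder_path = "/".join(parts[:-1])       # "AAAAAA/BBBBBB"
    file_name = parts[-1] + "." + ext        # "CCCCCC.txt"
    return folder_path, file_name

print(sink_path("AAAAAA_BBBBBB_CCCCCC.txt"))
# ('AAAAAA/BBBBBB', 'CCCCCC.txt')
```

In the pipeline itself, the same transformation would be written as expressions on the sink dataset's folder path and filename parameters, with item().name as the input.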
First you have to get the filenames with a Get Metadata activity. You can then use them as a parameter in a Copy activity and rename the files.
As mentioned in the previous answer, you can use the replace function to do this:
{
    "name": "TgtBooksBlob",
    "properties": {
        "linkedServiceName": {
            "referenceName": "Destination-BlobStorage-data",
            "type": "LinkedServiceReference"
        },
        "folder": {
            "name": "Target"
        },
        "type": "AzureBlob",
        "typeProperties": {
            "fileName": {
                "value": "@replace(item().name, '_', '\\')",
                "type": "Expression"
            },
            "folderPath": "data"
        }
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}
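To see what that fileName expression evaluates to, here is the same transformation in plain Python (the answer's expression uses '\\' as the separator; using '/' instead yields the folder-style path from the question):

```python
# Equivalent of ADF's @replace(item().name, '_', '/') for one source file.
name = "AAAAAA_BBBBBB_CCCCCC.txt"
print(name.replace("_", "/"))  # AAAAAA/BBBBBB/CCCCCC.txt
```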