
Azure Data Factory - Copy specific files from multiple parent folders on an FTP server

I am trying to copy .ZIP files from an FTP server to Azure Data Lake. I need to copy specific files from specific parent folders (there are 6 parent folders in total on the FTP server), and this pipeline needs to be scheduled. How should I provide the parameters so that the pipeline selects only the specific files from the different folders?

I have used the Get Metadata activity and tried creating pipelines, but I am not sure how to make the pipeline pick only the specific files!

Azure Data Factory supports compressing and decompressing data during copy. When you specify the compression property in an input dataset, the copy activity reads the compressed data from the source and decompresses it; when you specify the property in an output dataset, the copy activity compresses the data and then writes it to the sink.

For example:

Read a .zip file from the FTP server, decompress it to get the files inside, and land those files in Azure Data Lake Store. To do this, define an input FTP dataset with the compression type property set to ZipDeflate.
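The input dataset described above can be sketched as a Python dict that serializes to the dataset JSON. This is a minimal sketch, assuming the legacy FTP dataset model (`"type": "FileShare"` with `folderPath`, `fileName`, and `compression` under `typeProperties`) from the ADF FTP connector documentation; the dataset name, linked-service name, and folder below are hypothetical placeholders. My understanding is that leaving out the `format` section makes it a binary copy, so the extracted files land in the sink as-is.

```python
import json


def ftp_zip_input_dataset(name, linked_service, folder_path, file_name=None):
    """Build a legacy-model FTP input dataset that decompresses .zip files.

    Assumption: the legacy "FileShare" dataset shape used by the ADF FTP
    connector. No "format" section is included, so the copy is binary.
    """
    type_properties = {
        "folderPath": folder_path,
        # ZipDeflate on the *input* dataset asks the copy activity to unzip on read.
        "compression": {"type": "ZipDeflate"},
    }
    if file_name is not None:
        # Optional: an exact file name or a wildcard filter such as "*.zip".
        type_properties["fileName"] = file_name
    return {
        "name": name,
        "properties": {
            "type": "FileShare",
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": type_properties,
        },
    }


if __name__ == "__main__":
    # Hypothetical names: replace with your own dataset, linked service, and folder.
    ds = ftp_zip_input_dataset("FtpZipInput", "FtpLinkedService", "parent1/", "*.zip")
    print(json.dumps(ds, indent=2))
```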

For more details, please refer to: Compression support.

Here is the tutorial: Copy data from FTP server by using Azure Data Factory.

Other format dataset: to copy data from FTP in ORC/Avro/JSON/Binary format, the properties listed under Other format dataset are supported.


Tips:

  1. To copy all files under a folder, specify folderPath only.
  2. To copy a single file with a given name, specify folderPath with the folder part and fileName with the file name.
  3. To copy a subset of files under a folder, specify folderPath with the folder part and fileName with a wildcard filter.
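To make the three tips concrete, here is a local simulation of which files a given folderPath/fileName pair would select from an FTP listing. It is a sketch under stated assumptions, not ADF's actual matcher: it assumes `*` matches zero or more characters and `?` matches zero or one character (as I understand the FTP connector docs describe fileName wildcards), and it only looks at files directly under the folder (no recursion into sub-folders). The folder and file names are hypothetical.

```python
import re
from typing import List, Optional


def wildcard_to_regex(pattern: str):
    """Translate a fileName wildcard into a compiled regex.

    Assumption: '*' = zero or more characters, '?' = zero or one character;
    every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".?")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def select_files(listing: List[str], folder_path: str,
                 file_name: Optional[str] = None) -> List[str]:
    """Return the paths in `listing` that this folderPath/fileName pair would pick."""
    folder = folder_path.rstrip("/") + "/"
    selected = []
    for path in listing:
        if not path.startswith(folder):
            continue
        name = path[len(folder):]
        if "/" in name:  # file lives in a sub-folder: skipped (no recursion assumed)
            continue
        if file_name is None:  # tip 1: folderPath only -> every file in the folder
            selected.append(path)
        elif wildcard_to_regex(file_name).fullmatch(name):  # tips 2 and 3
            selected.append(path)
    return selected


listing = [
    "parent1/sales_2020.zip",
    "parent1/sales_2021.zip",
    "parent1/readme.txt",
    "parent1/archive/old.zip",
    "parent2/sales_2021.zip",
]
print(select_files(listing, "parent1"))
# ['parent1/sales_2020.zip', 'parent1/sales_2021.zip', 'parent1/readme.txt']
print(select_files(listing, "parent1", "sales_2021.zip"))
# ['parent1/sales_2021.zip']
print(select_files(listing, "parent1/", "sales_*.zip"))
# ['parent1/sales_2020.zip', 'parent1/sales_2021.zip']
```

An exact file name (tip 2) is just a pattern with no wildcards, which is why one code path covers both tip 2 and tip 3.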

Hope this helps.

You'll need to use the Filter activity to keep only the folders/files that you need. I think you'll need 2 loops:

  1. Loop 1: Get Metadata of folders -> Filter required folders -> ForEach that runs a pipeline containing loop 2
  2. Loop 2: Get Metadata of files -> Filter required files -> Copy required files
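The two-loop pattern above can be simulated locally to check the filtering logic before building it in ADF. This is a minimal sketch: `ftp_tree` is a hypothetical stand-in for what the Get Metadata activities would return (parent folders, then the files in each), and `required_folders` / `want_file` stand in for the two Filter conditions.

```python
from typing import Callable, Dict, List, Set, Tuple


def plan_copies(
    ftp_tree: Dict[str, List[str]],
    required_folders: Set[str],
    want_file: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Simulate the two loops and return the (folder, file) pairs to copy."""
    copies = []
    # Loop 1: Get Metadata (folders) -> Filter required folders -> ForEach folder.
    # sorted() only makes this simulation deterministic; ADF ForEach may run in parallel.
    for folder in sorted(ftp_tree):
        if folder not in required_folders:
            continue
        # Loop 2 (the inner pipeline): Get Metadata (files) -> Filter required files -> Copy.
        for file_name in ftp_tree[folder]:
            if want_file(folder, file_name):
                copies.append((folder, file_name))
    return copies


# Hypothetical FTP layout: three parent folders, only two of them wanted.
tree = {
    "parent1": ["a.zip", "b.csv"],
    "parent2": ["c.zip"],
    "parent3": ["d.zip"],
}
plan = plan_copies(tree, {"parent1", "parent3"}, lambda folder, name: name.endswith(".zip"))
print(plan)  # [('parent1', 'a.zip'), ('parent3', 'd.zip')]
```

In the real pipeline, `required_folders` would typically be an array pipeline parameter, which also answers how to parameterize a scheduled run. Assuming a hypothetical parameter named `folderList`, the outer Filter activity's condition over the Get Metadata `childItems` could look something like `@and(equals(item().type, 'Folder'), contains(pipeline().parameters.folderList, item().name))`; the inner Filter would use a similar condition on the file names.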


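Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.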

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM