Azure Data Factory - Copy specific files from multiple parent folders from FTP Server
I am trying to copy .ZIP files from an FTP server to Azure Data Lake. I need to copy specific files from specific parent folders (I have 6 parent folders in total on the FTP server), and this pipeline needs to be scheduled. How should I provide the parameters so that the pipeline selects only the specific files from the different folders?
I have used the Get Metadata activity and tried creating pipelines, but I am not sure how to configure the pipeline to pick only specific files.
Azure Data Factory supports compressing/decompressing data during copy. When you specify the compression property in an input dataset, the copy activity reads the compressed data from the source and decompresses it; when you specify the property in an output dataset, the copy activity compresses the data and then writes it to the sink.
For example:
Read a .zip file from the FTP server, decompress it to get the files inside, and land those files in Azure Data Lake Store. You define an input FTP dataset with the compression type property set to ZipDeflate.
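A minimal sketch of such an input dataset definition (Data Factory v2 JSON). The linked-service name, folder path, and file name below are hypothetical placeholders for your own FTP setup:

```json
{
    "name": "FtpZipDataset",
    "properties": {
        "type": "Binary",
        "linkedServiceName": {
            "referenceName": "FtpLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "FtpServerLocation",
                "folderPath": "parentfolder1",
                "fileName": "data.zip"
            },
            "compression": {
                "type": "ZipDeflate"
            }
        }
    }
}
```

With ZipDeflate set on the source, the copy activity extracts the archive contents before writing them to the sink dataset.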
For more details, please refer to: Compression support.
Here's the tutorial about Copy data from FTP server by using Azure Data Factory.
Other format datasets: to copy data from FTP in ORC/Avro/JSON/Binary format, the supported properties are described in this link: Other format dataset.
Hope this helps.
You'll need to use the Filter activity to keep only the folders/files that you need. I think you'll need 2 loops:
Loop 1: Get Metadata of folders -> Filter required folders -> ForEach executing a pipeline containing Loop 2
Loop 2: Get Metadata of files -> Filter required files -> Copy required files
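A sketch of the outer pipeline (Loop 1) as Data Factory JSON, assuming a dataset `FtpRootDataset` pointing at the FTP root, an array variable `requiredFolders` listing the 6 parent folders, and an inner pipeline `CopyFilesFromFolder` implementing Loop 2 — all hypothetical names. The ForEach invokes the inner pipeline via Execute Pipeline, since ForEach activities cannot be nested directly:

```json
{
    "name": "CopySpecificZipFiles",
    "properties": {
        "activities": [
            {
                "name": "GetFolderList",
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": { "referenceName": "FtpRootDataset", "type": "DatasetReference" },
                    "fieldList": [ "childItems" ]
                }
            },
            {
                "name": "FilterFolders",
                "type": "Filter",
                "dependsOn": [ { "activity": "GetFolderList", "dependencyConditions": [ "Succeeded" ] } ],
                "typeProperties": {
                    "items": { "value": "@activity('GetFolderList').output.childItems", "type": "Expression" },
                    "condition": { "value": "@and(equals(item().type, 'Folder'), contains(variables('requiredFolders'), item().name))", "type": "Expression" }
                }
            },
            {
                "name": "ForEachFolder",
                "type": "ForEach",
                "dependsOn": [ { "activity": "FilterFolders", "dependencyConditions": [ "Succeeded" ] } ],
                "typeProperties": {
                    "items": { "value": "@activity('FilterFolders').output.Value", "type": "Expression" },
                    "activities": [
                        {
                            "name": "RunLoop2",
                            "type": "ExecutePipeline",
                            "typeProperties": {
                                "pipeline": { "referenceName": "CopyFilesFromFolder", "type": "PipelineReference" },
                                "parameters": { "folderName": "@item().name" }
                            }
                        }
                    ]
                }
            }
        ]
    }
}
```

The inner pipeline takes `folderName` as a parameter and repeats the same Get Metadata -> Filter -> Copy pattern at the file level, e.g. filtering on `@endswith(item().name, '.zip')`.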