
Copying and Extracting Zipped XML Files from an HTTP Link Source to Azure Blob Storage Using Azure Data Factory

I am trying to set up an Azure Data Factory copy data pipeline. The source is an open HTTP linked source (URL: https://clinicaltrials.gov/AllPublicXML.zip ), i.e. a zipped folder containing many XML files. I want to unzip the archive and save the extracted XML files to Azure Blob Storage using Azure Data Factory. I tried to follow the configuration described in How to decompress a zip file in Azure Data Factory v2, but I am getting the following error:

ErrorCode=UserErrorSourceNotSeekable,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Your HttpServer source can't support random read which is requied by current copy activity setting, please create two copy activities to work around it: the first copy activity binary copy your HttpServer source to a staging file store(like Azure Blob, Azure Data Lake, File, etc.), second copy activity copy from the staged file store to your destination with current settings.,Source=Microsoft.DataTransfer.ClientLibrary,'

I am not exactly sure what is going wrong; it would be really helpful if someone could guide me through the procedure.

I broke this up into two Copy data activities in order to separate the downloading of the zip file (which is quite large) from the unpacking. You could try to do both in one step, but I think you would run into timeout issues. With my approach you also keep a copy of the original zip file, which is useful for audit-trail and debugging purposes.

I try to document my ADF patterns in a boxes-and-lines format that shows the key details for each component. Here there are two Copy activities, plus the supporting linked services and datasets. Try to follow this, and let me know how you get on:

[Image: ADF pattern diagram]
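In case the diagram does not render, the two-activity pattern can be sketched in ADF pipeline JSON roughly as follows. This is a minimal sketch, not a complete definition: the dataset and activity names (`HttpZipBinaryDataset`, `StagedZipBlobDataset`, etc.) are placeholders, and the linked services behind them are omitted. The key points are that the first activity is a plain binary copy (no compression settings on either dataset, so the zip lands in staging intact), and the second activity reads the staged blob through a dataset whose `compression` type is `ZipDeflate`, which tells ADF to expand the archive on read:

```json
{
  "name": "UnzipClinicalTrialsXml",
  "properties": {
    "activities": [
      {
        "name": "CopyZipToStaging",
        "type": "Copy",
        "typeProperties": {
          "source": { "type": "HttpSource" },
          "sink": { "type": "BlobSink" }
        },
        "inputs": [ { "referenceName": "HttpZipBinaryDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "StagedZipBlobDataset", "type": "DatasetReference" } ]
      },
      {
        "name": "UnzipStagedZipToBlob",
        "type": "Copy",
        "dependsOn": [
          { "activity": "CopyZipToStaging", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "source": { "type": "BlobSource" },
          "sink": { "type": "BlobSink" }
        },
        "inputs": [ { "referenceName": "StagedZipZipDeflateDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "ExtractedXmlBlobDataset", "type": "DatasetReference" } ]
      }
    ]
  }
}
```

Here `StagedZipZipDeflateDataset` would point at the same staged blob as `StagedZipBlobDataset` but additionally declare `"compression": { "type": "ZipDeflate" }` in its format settings. Staging the file first also avoids the `UserErrorSourceNotSeekable` error from the question: decompressing a zip requires random (seekable) reads, which the HTTP source cannot provide but blob storage can.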

NB: it took ADF quite a long time to unpack the .xml files, as there are rather a lot of them. My results, shown in Azure Storage Explorer:

[Image: extraction results in Azure Storage Explorer]


