
Appending to a file in Azure Data Lake using Data Factory

I am running into issues while appending data to a file in Azure Data Lake using Data Factory. I am trying to pull data from the MS Graph API. For a single API call I can retrieve the data and push the response to the data lake using the "Copy Data" activity, but when I make multiple calls and want to append each response to a single file, I am not sure how to do that. I don't think the "Copy Data" activity is the right tool for it.

One example. API to get all groups in the tenant:

https://graph.microsoft.com/v1.0/Groups/

API to get all members associated with a group:

https://graph.microsoft.com/v1.0/groups/"GroupID"/owners

"Group ID" comes from the top API call. “组 ID”来自顶部 API 调用。 ** **

I am able to build a loop and make the calls correctly. It is just appending the results of the second call that has me at a loss. I don't think creating a new file for each group would be the right approach.
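To illustrate, this is roughly the logic I'm after, sketched in plain Python outside of ADF (the bearer token and output path are placeholders, and paging is omitted):

```python
import json
import requests

headers = {"Authorization": "Bearer <access-token>"}  # placeholder token

# First call: all groups in the tenant.
groups = requests.get(
    "https://graph.microsoft.com/v1.0/groups", headers=headers
).json()["value"]

# Second call, once per group. I want every one of these responses
# appended to a single file rather than one file per group.
with open("owners.json", "a") as out:
    for g in groups:
        owners = requests.get(
            f"https://graph.microsoft.com/v1.0/groups/{g['id']}/owners",
            headers=headers,
        ).json()
        out.write(json.dumps(owners) + "\n")
```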

I think you have a couple of issues to contend with. First, standard block blobs don't support append operations; for that you'll need an AppendBlob. The second problem is that ADF doesn't support AppendBlob.
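For reference, here is a minimal sketch of what appending to an AppendBlob looks like outside of ADF with the Python storage SDK (the connection string, container, and blob names are placeholders):

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="graph-data", blob="owners.json")

# An AppendBlob must be created explicitly; a standard block blob
# (which is what Copy Data writes) cannot be appended to after creation.
if not blob.exists():
    blob.create_append_blob()

# Each append_block call adds data to the end of the blob.
blob.append_block(b'{"groupId": "...", "owners": []}\n')
```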

Here is a question where I discuss the Copy activity and AppendBlob.

Here is another answer (not mine) with an interesting approach that uses the native REST API to append to the blob.
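As a sketch of that approach (this is not the linked answer's code): the Blob service exposes an Append Block operation over raw REST, which in ADF could be driven from a Web activity. The account, container, blob name, and SAS token below are all placeholders, and the SAS needs write permission:

```python
import requests

base = "https://<account>.blob.core.windows.net/<container>/owners.json"
sas = "<sas-token>"

# Create the append blob: a zero-length PUT with the AppendBlob type header.
requests.put(
    f"{base}?{sas}",
    headers={"x-ms-blob-type": "AppendBlob", "x-ms-version": "2020-10-02"},
    data=b"",
)

# Append Block: each PUT with comp=appendblock adds data to the end.
requests.put(
    f"{base}?comp=appendblock&{sas}",
    headers={"x-ms-version": "2020-10-02"},
    data=b'{"groupId": "..."}\n',
)
```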

Another option would be to let the process create a new file per run. After they've all been created, you could use a Data Flow to collapse them into a single file.
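The collapse step itself is just concatenation; in ADF a Data Flow sink set to output a single file (or, if I recall correctly, a binary Copy activity with the MergeFiles copy behavior) does this natively. As a sketch of the same idea in Python, with placeholder names and an assumed "runs/" prefix for the per-run files:

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("graph-data")

# Download every per-run file and concatenate into one combined blob.
combined = b"".join(
    container.download_blob(b.name).readall()
    for b in container.list_blobs(name_starts_with="runs/")
)
container.upload_blob("combined/owners.json", combined, overwrite=True)
```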
