
Appending to a file in Azure Data Lake using Data Factory

I am running into issues while appending data to a file in Azure Data Lake using Data Factory. I am trying to pull data from the MS Graph API. For a single API call I can retrieve the data and push the response to the data lake using the "Copy Data" activity, but when I make multiple calls and want to append each response to a single file, I am not sure how to do that. I don't think the "Copy Data" activity is the right tool for it.

One example. API to get all groups in the tenant:

https://graph.microsoft.com/v1.0/Groups/

API to get all members associated with a group:

https://graph.microsoft.com/v1.0/groups/"GroupID"/owners

"Group ID" comes from the top API call. “组 ID”来自顶部 API 调用。 ** **

I am able to build a loop and make the calls correctly. It is just appending the results of the second call that has me at a loss. I don't think creating a new file for each group would be the right approach.
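To illustrate, this is roughly the logic I'm after, sketched in plain Python outside of ADF (the bearer token and output path are placeholders, and paging is omitted):

```python
import json
import requests

headers = {"Authorization": "Bearer <access-token>"}  # placeholder token

# First call: all groups in the tenant.
groups = requests.get(
    "https://graph.microsoft.com/v1.0/groups", headers=headers
).json()["value"]

# Second call, once per group. I want every one of these responses
# appended to a single file rather than one file per group.
with open("owners.json", "a") as out:
    for g in groups:
        owners = requests.get(
            f"https://graph.microsoft.com/v1.0/groups/{g['id']}/owners",
            headers=headers,
        ).json()
        out.write(json.dumps(owners) + "\n")
```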

I think you have a couple of issues to contend with. First, standard block blobs don't support append operations; for that you'll need an AppendBlob. The second problem is that ADF doesn't support AppendBlob.
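For reference, here is a minimal sketch of what appending to an AppendBlob looks like outside of ADF with the Python storage SDK (the connection string, container, and blob names are placeholders):

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="graph-data", blob="owners.json")

# An AppendBlob must be created explicitly; a standard block blob
# (which is what Copy Data writes) cannot be appended to after creation.
if not blob.exists():
    blob.create_append_blob()

# Each append_block call adds data to the end of the blob.
blob.append_block(b'{"groupId": "...", "owners": []}\n')
```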

Here is a question where I discuss the Copy activity and AppendBlob.

Here is another answer (not mine) with an interesting approach that uses the native REST API to append to the blob.
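As a sketch of that approach (this is not the linked answer's code): the Blob service exposes an Append Block operation over raw REST, which in ADF could be driven from a Web activity. The account, container, blob name, and SAS token below are all placeholders, and the SAS needs write permission:

```python
import requests

base = "https://<account>.blob.core.windows.net/<container>/owners.json"
sas = "<sas-token>"

# Create the append blob: a zero-length PUT with the AppendBlob type header.
requests.put(
    f"{base}?{sas}",
    headers={"x-ms-blob-type": "AppendBlob", "x-ms-version": "2020-10-02"},
    data=b"",
)

# Append Block: each PUT with comp=appendblock adds data to the end.
requests.put(
    f"{base}?comp=appendblock&{sas}",
    headers={"x-ms-version": "2020-10-02"},
    data=b'{"groupId": "..."}\n',
)
```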

Another option would be to let the process create a new file per run. After they've all been created, you could use a Data Flow to collapse them into a single file.
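The collapse step itself is just concatenation; in ADF a Data Flow sink set to output a single file (or, if I recall correctly, a binary Copy activity with the MergeFiles copy behavior) does this natively. As a sketch of the same idea in Python, with placeholder names and an assumed "runs/" prefix for the per-run files:

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("graph-data")

# Download every per-run file and concatenate into one combined blob.
combined = b"".join(
    container.download_blob(b.name).readall()
    for b in container.list_blobs(name_starts_with="runs/")
)
container.upload_blob("combined/owners.json", combined, overwrite=True)
```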
