
Data Factory v2 - Generate a json file per row

I'm using Data Factory v2. I have a copy activity that has an Azure SQL dataset as input and an Azure Storage Blob as output. I want to write each row in my SQL dataset as a separate blob, but I don't see how I can do this.

I see a copyBehavior setting in the copy activity, but that only works from a file-based source.

Another possible setting is the filePattern in my dataset:

Indicate the pattern of data stored in each JSON file. Allowed values are: setOfObjects and arrayOfObjects.

setOfObjects - Each file contains a single object, or line-delimited/concatenated multiple objects. When this option is chosen in an output dataset, the copy activity produces a single JSON file with each object on its own line (line-delimited).

arrayOfObjects - Each file contains an array of objects.

The description talks about "each file", so initially I thought it would be possible, but now that I've tested them it seems that setOfObjects creates a single line-delimited file, where each row is written to a new line, and arrayOfObjects creates a single file containing a JSON array, with each row added as a new element of the array.
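To illustrate, for a source table with two rows the two filePattern values produce output shaped roughly like this (field names here are made up for the example). setOfObjects yields one line-delimited file:

```json
{"id": 1, "name": "alpha"}
{"id": 2, "name": "beta"}
```

while arrayOfObjects yields one file containing a single array:

```json
[
  {"id": 1, "name": "alpha"},
  {"id": 2, "name": "beta"}
]
```

In both cases the result is one file for the whole dataset, not one file per row.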

I'm wondering if I'm missing a configuration somewhere, or is it just not possible?

What I did for now is to load the rows into a SQL table and run a ForEach over each record in the table: I use a Lookup activity to get an array to loop over in a ForEach activity, and the ForEach activity writes each row to a blob store.
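A trimmed sketch of that workaround pipeline, assuming the Lookup returns the row ids to copy (all names, the query, and the inner Copy activity details are placeholders, not a complete definition):

```json
{
  "name": "WriteRowsAsBlobs",
  "properties": {
    "activities": [
      {
        "name": "LookupRows",
        "type": "Lookup",
        "typeProperties": {
          "source": { "type": "SqlSource", "sqlReaderQuery": "SELECT id FROM dbo.MyTable" },
          "firstRowOnly": false
        }
      },
      {
        "name": "ForEachRow",
        "type": "ForEach",
        "dependsOn": [ { "activity": "LookupRows", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
          "items": { "value": "@activity('LookupRows').output.value", "type": "Expression" },
          "activities": [
            { "name": "CopyOneRow", "type": "Copy" }
          ]
        }
      }
    ]
  }
}
```

Note that firstRowOnly must be false so the Lookup returns every row, and the ForEach's items expression reads the Lookup's output array.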

For Olga's documentDb question, it would look like this: (screenshot: pipeline overview)

In the lookup, you get a list of the document ids you want to copy: (screenshot)

You use that set in your ForEach activity: (screenshot)

Then you copy the files using a Copy activity within the ForEach activity, querying a single document in your source: (screenshot)

And you can use the id to dynamically name your file in the sink (you'll have to define the parameter in your dataset too): (screenshots)
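The parameterized sink dataset might look roughly like this (a sketch; the dataset name, folder path, and the parameter name `id` are assumptions):

```json
{
  "name": "SingleBlobJson",
  "properties": {
    "type": "AzureBlob",
    "parameters": { "id": { "type": "String" } },
    "typeProperties": {
      "folderPath": "output",
      "fileName": { "value": "@concat(dataset().id, '.json')", "type": "Expression" },
      "format": { "type": "JsonFormat", "filePattern": "setOfObjects" }
    }
  }
}
```

The Copy activity inside the ForEach then passes `@item().id` as the value of the dataset's `id` parameter, so each iteration writes to its own blob.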


 