
Data Factory v2 - Generate a json file per row

I'm using Data Factory v2. I have a copy activity that has an Azure SQL dataset as input and an Azure Storage Blob as output. I want to write each row in my SQL dataset as a separate blob, but I don't see how I can do this.

I see a copyBehavior setting in the copy activity, but that only works from a file-based source.

Another possible setting is the filePattern in my dataset:

Indicate the pattern of data stored in each JSON file. Allowed values are: setOfObjects and arrayOfObjects.

setOfObjects - Each file contains a single object, or line-delimited/concatenated multiple objects. When this option is chosen in an output dataset, the copy activity produces a single JSON file with each object on its own line (line-delimited).

arrayOfObjects - Each file contains an array of objects.

The description talks about "each file", so initially I thought it would be possible, but now that I've tested them it seems that setOfObjects creates a single line-delimited file, where each row is written to a new line, and arrayOfObjects creates a single file containing a JSON array, with each row added as a new element of the array.
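To illustrate, for a source table with two rows the two filePattern values produce output shaped roughly like this (field names here are made up for the example). setOfObjects yields one line-delimited file:

```json
{"id": 1, "name": "alpha"}
{"id": 2, "name": "beta"}
```

while arrayOfObjects yields one file containing a single array:

```json
[
  {"id": 1, "name": "alpha"},
  {"id": 2, "name": "beta"}
]
```

In both cases the result is one file for the whole dataset, not one file per row.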

I'm wondering if I'm missing a configuration somewhere, or is it just not possible?

What I did for now is to load the rows into a SQL table and run a ForEach over each record in the table: I use a Lookup activity to get an array to loop over in a ForEach activity, and the ForEach activity writes each row to a blob store.
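A trimmed sketch of that workaround pipeline, assuming the Lookup returns the row ids to copy (all names, the query, and the inner Copy activity details are placeholders, not a complete definition):

```json
{
  "name": "WriteRowsAsBlobs",
  "properties": {
    "activities": [
      {
        "name": "LookupRows",
        "type": "Lookup",
        "typeProperties": {
          "source": { "type": "SqlSource", "sqlReaderQuery": "SELECT id FROM dbo.MyTable" },
          "firstRowOnly": false
        }
      },
      {
        "name": "ForEachRow",
        "type": "ForEach",
        "dependsOn": [ { "activity": "LookupRows", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
          "items": { "value": "@activity('LookupRows').output.value", "type": "Expression" },
          "activities": [
            { "name": "CopyOneRow", "type": "Copy" }
          ]
        }
      }
    ]
  }
}
```

Note that firstRowOnly must be false so the Lookup returns every row, and the ForEach's items expression reads the Lookup's output array.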

For Olga's documentDb question, it would look like this: (screenshot: pipeline overview)

In the lookup, you get a list of the document ids you want to copy: (screenshot)

You use that set in your ForEach activity: (screenshot)

Then you copy the files using a Copy activity within the ForEach activity, querying a single document in your source: (screenshot)

And you can use the id to dynamically name your file in the sink (you'll have to define the parameter in your dataset too): (screenshots)
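The parameterized sink dataset might look roughly like this (a sketch; the dataset name, folder path, and the parameter name `id` are assumptions):

```json
{
  "name": "SingleBlobJson",
  "properties": {
    "type": "AzureBlob",
    "parameters": { "id": { "type": "String" } },
    "typeProperties": {
      "folderPath": "output",
      "fileName": { "value": "@concat(dataset().id, '.json')", "type": "Expression" },
      "format": { "type": "JsonFormat", "filePattern": "setOfObjects" }
    }
  }
}
```

The Copy activity inside the ForEach then passes `@item().id` as the value of the dataset's `id` parameter, so each iteration writes to its own blob.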


 