How to get an array from JSON in Azure Data Factory?

Question

My current setup, which does not work properly, has two pipelines:

Get API data to lake: for each row in a metadata table in SQL, call the REST API and copy the reply (JSON files) to the Blob data lake.
Copy data from the lake to SQL: for each file, auto-create a table in SQL.

The result is the correct number of tables in SQL. However, the content of the tables is not what I hoped for: each contains a single column named odata.metadata and a single entry, the link to the metadata. If I manually remove the metadata from the JSON in the data lake and then run the second pipeline, the SQL table is what I want.

Have:

{
  "odata.metadata": "https://test.com",
  "value": [
    {
      "Key": "12345",
      "Title": "Name",
      "Status": "Test"
    }
  ]
}

Want:

[
  {
    "Key": "12345",
    "Title": "Name",
    "Status": "Test"
  }
]
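
The difference between the two shapes above amounts to keeping only the `value` array of the OData envelope. A minimal Python sketch of that step, assuming the reply is a JSON object with a `value` key as in the sample (the function name `extract_value_array` is hypothetical and not part of ADF):

```python
import json


def extract_value_array(raw_json: str) -> str:
    """Return the `value` array of an OData-style reply as a JSON string."""
    payload = json.loads(raw_json)
    # Drop the envelope (including "odata.metadata") and keep only the rows.
    rows = payload["value"]
    if not isinstance(rows, list):
        raise ValueError("expected 'value' to be a JSON array")
    return json.dumps(rows)


# Sample reply copied from the "Have" example.
have = (
    '{"odata.metadata": "https://test.com", '
    '"value": [{"Key": "12345", "Title": "Name", "Status": "Test"}]}'
)
want = extract_value_array(have)
print(want)
```

This is only an illustration of the intended result; it says nothing about how the step would be wired into a pipeline.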

I tried adding $.['value'] to the API call. The odata.metadata line was then gone, but the output started with {value: instead of the bare array, which caused an error when copying to SQL.

I also tried using mapping (in the sink) to SQL. That gives the desired result for the dataset whose mapping I specified manually, but it only works for datasets with the same number of columns in the array. I don't want to do the mapping manually for 170 calls...

Does anyone know how to handle this in ADF? For now I feel the only solution is to add a Python step to the pipeline, but I am hoping for a reasonably standard ADF way to do this!

Answer 1

You can add another pipeline with a data flow that removes the unwanted content from the JSON file before copying the data to SQL, using the Flatten formatter.

Before flattening the JSON file:

This is what I see when the JSON data is copied to the SQL database without flattening:

After flattening the JSON file:

I added a pipeline with a data flow that flattens the JSON file to remove the 'odata.metadata' content from the array.

Source preview:

Flatten formatter:

Select the required object from the input array.

After selecting the value object from the input array, only the values under value appear in the Flatten formatter preview.
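
As a rough picture of what unrolling by the `value` array produces, here is a small Python sketch. It is not ADF code; the helper name `unroll_by` and the second sample row are made up for illustration. It emits one row per array element and does not carry sibling keys such as `odata.metadata` into the rows.

```python
import json
from typing import Any, Dict, List


def unroll_by(document: Dict[str, Any], array_key: str) -> List[Dict[str, Any]]:
    """Emit one flat row per element of document[array_key]."""
    rows = []
    for element in document.get(array_key, []):
        # Only the element's own fields are kept; sibling keys of the
        # array (for example "odata.metadata") are not copied into the row.
        rows.append(dict(element))
    return rows


# The second element is invented to show that each element becomes its own row.
doc = json.loads("""
{
  "odata.metadata": "https://test.com",
  "value": [
    {"Key": "12345", "Title": "Name", "Status": "Test"},
    {"Key": "67890", "Title": "Other", "Status": "Test"}
  ]
}
""")
rows = unroll_by(doc, "value")
for row in rows:
    print(row)
```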

Sink preview:

File generated after flattening.

Copy the generated file as input to SQL.

Note: If your input file schema is not constant, you can enable Allow schema drift to allow schema changes.
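
To make the idea of a non-constant schema concrete, the following Python sketch aligns rows whose columns differ. It is a conceptual illustration only and does not describe how ADF implements the option; the helper names and the sample rows (including the `Owner` column) are hypothetical.

```python
from typing import Any, Dict, List


def union_columns(rows: List[Dict[str, Any]]) -> List[str]:
    """Collect every column name seen in any row, in first-seen order."""
    columns: List[str] = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return columns


def align_rows(rows: List[Dict[str, Any]], columns: List[str]) -> List[Dict[str, Any]]:
    """Give every row the same columns, filling missing ones with None."""
    return [{name: row.get(name) for name in columns} for row in rows]


# Two rows with different columns, standing in for files whose schema varies.
sample_rows = [
    {"Key": "12345", "Title": "Name", "Status": "Test"},
    {"Key": "67890", "Title": "Other", "Owner": "Someone"},
]
columns = union_columns(sample_rows)
aligned = align_rows(sample_rows, columns)
print(columns)
print(aligned)
```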

Reference: Schema drift in mapping data flow

Solution 1
0 2021-07-23 08:47:25