Azure Data Factory (ADF) - Parse/Flatten XML file - Get the content of all elements that match a wildcard and sit in different segments of the hierarchy
I have an XML file like the one below and need to get all complex elements that have different names but all end with "_KEYS". They belong to different segments of the XML. The sample below has only 3 such elements, but the actual file has hundreds. How can I do this in ADF?
```xml
<XMLInput>
<SegmentX>
.
.
<Category_KEYS ID= 'AAAAAAA'> AAAA DESCRIPTION </Category_KEYS>
.
.
</SegmentX>
<SegmentY>
.
.
.
<Status_KEYS ID= 'BBBBBBB'> BBB DESCRIPTION </Status_KEYS>
.
.
.
</SegmentY>
<SegmentZ>
.
<Department_KEYS ID= 'CCCCCC'> CCCC DESCRIPTION </Department_KEYS>
.
</SegmentZ>
</XMLInput>
```
In fact, I am looking for all the IDs and their corresponding descriptions:
| ID | VALUE |
|---|---|
| AAAAAAA | AAAA DESCRIPTION |
| BBBBBBB | BBB DESCRIPTION |
| CCCCCC | CCCC DESCRIPTION |
If you have SQL Server or Azure SQL DB, both are quite capable with XML through the `.nodes` and `.value` methods of the `xml` data type, e.g.:
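```sql
DECLARE @xml XML = '<XMLInput>
<SegmentX>
<Category_KEYS ID= "AAAAAAA"> AAAA DESCRIPTION </Category_KEYS>
</SegmentX>
<SegmentY>
<Status_KEYS ID= "BBBBBBB"> BBB DESCRIPTION </Status_KEYS>
</SegmentY>
<SegmentZ>
<Department_KEYS ID= "CCCCCC"> CCCC DESCRIPTION </Department_KEYS>
</SegmentZ>
</XMLInput>';

-- Shred every element, at any depth, whose local name ends with "_KEYS",
-- so it does not matter which segment it sits in or how many siblings it has
SELECT
    x.c.value('(@ID)[1]', 'VARCHAR(20)') AS id,
    LTRIM(RTRIM(x.c.value('.', 'NVARCHAR(200)'))) AS [description]
FROM @xml.nodes('//*[substring(local-name(), string-length(local-name()) - 4) = "_KEYS"]') x(c);
```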
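For the sample, this returns one row per `_KEYS` element, with its ID and description, matching the table in the question.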
It is possible to read XML files with Lookup activities, but it can be awkward to get the table-shaped result you are after unless you then loop with a ForEach activity. Lookup output is capped at 5,000 rows, and looping element by element is a bad idea when you have many elements to process. I would pass this off to some compute to handle, whether that is Azure SQL DB as in my example, a Databricks or Synapse notebook, or Mapping Data Flows if you need a low-code experience.
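For the notebook option, a minimal sketch in plain Python using the standard library's `xml.etree.ElementTree` could look like the following. It walks the whole tree and keeps every element whose tag ends with `_KEYS`, regardless of which segment it sits in. The function name `extract_keys` and the inline `SAMPLE` string are hypothetical; reading the real file from your lake or blob storage is left out and would depend on your environment.

```python
import xml.etree.ElementTree as ET


def extract_keys(xml_text: str, suffix: str = "_KEYS") -> list[tuple[str, str]]:
    """Return (ID, description) for every element whose tag ends with `suffix`,
    at any depth of the hierarchy (hypothetical helper for illustration)."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() walks the whole tree depth-first, so segment nesting does not matter
    for elem in root.iter():
        # Namespaced tags look like '{uri}Name'; endswith still checks the local part
        if elem.tag.endswith(suffix):
            rows.append((elem.get("ID", ""), (elem.text or "").strip()))
    return rows


# Hypothetical inline sample mirroring the question's XML
SAMPLE = """<XMLInput>
  <SegmentX><Category_KEYS ID="AAAAAAA"> AAAA DESCRIPTION </Category_KEYS></SegmentX>
  <SegmentY><Status_KEYS ID="BBBBBBB"> BBB DESCRIPTION </Status_KEYS></SegmentY>
  <SegmentZ><Department_KEYS ID="CCCCCC"> CCCC DESCRIPTION </Department_KEYS></SegmentZ>
</XMLInput>"""

if __name__ == "__main__":
    print("ID | VALUE")
    for key_id, value in extract_keys(SAMPLE):
        print(f"{key_id} | {value}")
```

`ET.fromstring` loads the whole document into memory, which should be fine for a file with hundreds of elements; for very large files, `ET.iterparse` would be the streaming alternative.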