简体   繁体   中英

Azure Data Factory (ADF) - Parse/Flatten XML file - Get content of all elements match wildcard criteria and sitting in different segments in Hierarchy

I have XML file something like below and need to get all complex elements having different names but all ends with "_KEYS" and they are part of different segments in XML. The sample below has only 3 such elements.. but actual file has in hundreds. How to get it done in ADF

<XMLInput> 
  <SegmentX> 
     .
     .  
     <Category_KEYS  ID= 'AAAAAAA'> AAAA DESCRIPTION </Category_KEYS> 
     .
     .        
  </SegmentX>
 
  <SegmentY> 
     .
     .  
     .
     <Staus_KEYS  ID= 'BBBBBBB'> BBB DESCRIPTION> </Status_KEYS> 
     .
     .  
     .
  </SegmentY>
 
  <SegmentZ> 
   .
     <Department_KEYS  ID= 'CCCCCC'> CCCC DESCRIPTION </Department_KEYS> 
   .     
  </SegmentZ> 
</XMLInput>

In fact looking for all ID 's and corresponding descriptions .

  **ID | VALUE** 
 AAAAAAA | AAAA DESCRIPTION 
 BBBBBBB | BBB DESCRIPTION 
 CCCCCC  | CCCC DESCRIPTION

If you have a SQL Server or Azure SQL DB then they are both quite capable with XML, using the .nodes and .value methods of the xml datatype eg

DECLARE @xml XML = '<XMLInput> 
  <SegmentX> 
     <Category_KEYS  ID= "AAAAAAA"> AAAA DESCRIPTION </Category_KEYS> 
  </SegmentX>
 
  <SegmentY> 
     <Status_KEYS  ID= "BBBBBBB"> BBB DESCRIPTION </Status_KEYS> 
  </SegmentY>
 
  <SegmentZ> 
     <Department_KEYS  ID= "CCCCCC"> CCCC DESCRIPTION </Department_KEYS> 
  </SegmentZ> 
</XMLInput>';


SELECT
    x.c.value('(*/@ID)[1]', 'VARCHAR(20)') id,
    x.c.value('(*)[1]', 'VARCHAR(20)') [description]
FROM @xml.nodes('XMLInput/*') x(c);

My results:

我的结果

It is possible to read XML files with Lookup activities but might be awkward to get the table-type result you are after, unless you use a For Each activity which has a limit of 5,000 loops and would be a bad idea where you have many elements to loop through. I would pass this off to some compute to handle, whether it be Azure SQL DB like in my example, a Databricks or Synapse notebook or Mapping Data Flows if you need a low-code experience.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM