I have a huge number (400,000) of big XML files (200 to 4,000 rows each, with about 40 parent-child relationships). I would like to parse them all and gather every node path that exists in them.
With an XML like
<tag1>
  <tag2>
    <tag3>Content3</tag3>
  </tag2>
  <tag2>
    <tag4>Content4</tag4>
  </tag2>
  <tag2>
    <tag4>Content4</tag4>
  </tag2>
  <tag2>
    <tag5><tag6>Content6</tag6></tag5>
  </tag2>
</tag1>
I would like to get
tag1
tag1>tag2
tag1>tag2>tag3
tag1>tag2
tag1>tag2>tag4
tag1>tag2
tag1>tag2>tag4
tag1>tag2
tag1>tag2>tag5
tag1>tag2>tag5>tag6
or at least (leaf removed):
tag1
tag1>tag2
tag1>tag2
tag1>tag2
tag1>tag2
tag1>tag2>tag5
The reduced list would be enough, because my real goal is to check the non-leaf nodes, which are modeled as tables in the target database.
Output can be a query result, a table or a file, I don't mind.
The final objective is to use this data to check whether SSIS, which is used to load the XML content into a database, has missed any node. In fact we KNOW it has missed some, so now we must find out which ones.
I have checked the XML features of SQL Server 2012, but I have two issues:
- I found no pointers on their performance with FILES. I need the fastest way to work from files, not from XML content held in a string.
- They are a bit cumbersome.
I have built a solution of my own with Qlikview, which checks whether each possible node (I have the XSD) is present in the XML and writes the result to a file. It works, but it is too slow (1 to 2 s per XML).
Thanks, guys!
I was looking for unanswered tsql/xml questions and found yours. It made me curious. I don't know whether this is still of any use today, but this is my suggestion:
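It follows each branch down to any depth. One caveat: the path is reset to the root after every leaf, so it assumes that the element after a leaf starts a new branch directly under the root, as in your sample. Sibling leaves under a deeper parent would get a wrong path.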
I must admit that I normally do not use CURSORs, but in this case I did not find another approach. If you don't mind, it would be nice if you tested its speed and posted a short reply, just out of curiosity :-)
DECLARE @x XML=
'<tag1>
<tag2>
<tag3>Content3</tag3>
</tag2>
<tag2>
<tag4>Content4</tag4>
</tag2>
<tag2>
<tag4>Content4</tag4>
</tag2>
<tag2>
<tag5>
<tag6>Content6</tag6>
</tag5>
</tag2>
</tag1>';
-- Work tables: #HelpTable holds every element in order, #FinalTags the assembled paths
CREATE TABLE #HelpTable(NodeIndex INT UNIQUE, NextNodeName VARCHAR(100), HasChildren BIT);
CREATE TABLE #FinalTags(ID INT IDENTITY, TagNames VARCHAR(1000));
WITH RootNode AS
(
    -- the root element: its name and the element itself
    SELECT RN.value('local-name(.)','varchar(100)') AS RN_Name
          ,RN.query('.') AS RN_Node
    FROM @x.nodes('*') AS The(RN)
)
,AnalyzeNodes AS
(
    -- every element at any depth, numbered in steps of 10
    -- (ORDER BY (SELECT NULL) relies on nodes() returning rows in document order,
    --  which is not formally guaranteed)
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * 10 AS NodeIndex
          ,RN_Name
          ,TheNext.Nodes.value('local-name(.)','varchar(100)') AS NextNodeName
          ,CASE WHEN TheNext.Nodes.value('count(./*)','int')=0 THEN 0 ELSE 1 END AS HasChildren
    FROM RootNode
    CROSS APPLY RN_Node.nodes('//*') AS TheNext(Nodes)
)
INSERT INTO #HelpTable
SELECT AnalyzeNodes.NodeIndex,AnalyzeNodes.NextNodeName,AnalyzeNodes.HasChildren
FROM AnalyzeNodes
UNION ALL
-- after every leaf, add a helper row that restarts the path at the root
SELECT an.NodeIndex+1,RN_Name,1
FROM AnalyzeNodes AS an
WHERE an.HasChildren=0;
DECLARE @collect VARCHAR(1000)='';  -- path built so far, each tag prefixed with '>'
DECLARE @tag VARCHAR(100);
DECLARE @children BIT;
DECLARE cur CURSOR FAST_FORWARD
FOR
    SELECT NextNodeName,HasChildren
    FROM #HelpTable
    ORDER BY NodeIndex;
OPEN cur;
FETCH NEXT FROM cur INTO @tag,@children;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- store the path of the current element
    INSERT INTO #FinalTags VALUES(@collect + '>' + @tag);
    IF @children=0
        SET @collect='';                      -- leaf: start over (the next row is the root helper row)
    ELSE
        SET @collect=@collect + '>' + @tag;   -- descend one level
    FETCH NEXT FROM cur INTO @tag,@children;
END
CLOSE cur;
DEALLOCATE cur;
-- strip the leading '>' and hide the helper rows that only repeat the root (all but the first)
SELECT SUBSTRING(TagNames,2,1000) AS TagNames
FROM #FinalTags
WHERE ID=1 OR TagNames<>(SELECT ft.TagNames FROM #FinalTags AS ft WHERE ft.ID=1)
ORDER BY ID,TagNames;
DROP TABLE #FinalTags;
DROP TABLE #HelpTable;
The result:
tag1
tag1>tag2
tag1>tag2>tag3
tag1>tag2
tag1>tag2>tag4
tag1>tag2
tag1>tag2>tag4
tag1>tag2
tag1>tag2>tag5
tag1>tag2>tag5>tag6
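Since the question asks for the fastest way when working from files, here is a minimal sketch of a streaming alternative outside SQL Server. This is not the T-SQL approach above: it is Python with the standard library's `xml.etree.ElementTree.iterparse`, and it assumes the files can be read directly from disk. The names `element_paths`, `count_paths` and `SAMPLE` are made up for illustration, and I have not timed it against 400,000 files.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from io import BytesIO

# The sample document from the question (illustrative constant).
SAMPLE = b"""<tag1>
<tag2><tag3>Content3</tag3></tag2>
<tag2><tag4>Content4</tag4></tag2>
<tag2><tag4>Content4</tag4></tag2>
<tag2><tag5><tag6>Content6</tag6></tag5></tag2>
</tag1>"""


def element_paths(source, include_leaves=True):
    """Return 'tag1>tag2>...' for every element, in document order.

    `source` is a file name or a binary file object.
    With include_leaves=False, elements without child elements are dropped.
    Tags in a namespace show up as '{uri}tag'.
    """
    entries = []  # one [path, has_children] pair per element
    stack = []    # pairs of the elements that are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if stack:
                stack[-1][1] = True  # the parent has at least one child
                path = stack[-1][0] + ">" + elem.tag
            else:
                path = elem.tag
            entry = [path, False]
            stack.append(entry)
            entries.append(entry)
        else:
            stack.pop()
            elem.clear()  # drop text and children to keep memory low
    return [p for p, has_children in entries if include_leaves or has_children]


def count_paths(sources):
    """Count how often each path occurs across many files."""
    counts = Counter()
    for source in sources:
        counts.update(element_paths(source))
    return counts


if __name__ == "__main__":
    for path in element_paths(BytesIO(SAMPLE)):
        print(path)
```

On the sample this prints the same ten lines as the result above, and `include_leaves=False` gives the shorter list from the question. For the real check, the keys of `count_paths(...)` over all files (for example the paths returned by `glob.glob`) could be compared with the list of tables that SSIS actually loaded.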