
Delete all empty nodes from XML in SQL Server

I want to remove all empty nodes from an XML document. A node counts as empty whether it is written as

<Node/>    or    <Node></Node>

In either form, the node should be deleted from the XML.

<Root type="1">
<A></A>
<B>
    <B1>
        <B12/>
        <B13/>
    </B1>
    <B2>
        123
        <B21></B21>
    </B2>
   <B3 type="3">
       <B4/>
   </B3>
</B>
<C/>
</Root>

Expected output:

<Root type="1">
<B>
    <B2>
        123
    </B2>
    <B3 type="3">
    </B3>
</B>
</Root>

B1 is deleted because all of its child nodes are empty and it has no attributes.

B2 is not deleted because it contains the value 123, but its empty child is deleted.

B3 is not deleted because it has an attribute, but its empty child is deleted.

I am doing this in SQL. If it can also be done in C#, I can call a C# script from SSIS, but SQL is preferred.

One way to do it in C# (requires using System.Linq and System.Xml.Linq):

var x = XElement.Parse(@"<Root type=""1"">
                            <A></A>
                            <B>
                                <B1>
                                    <B12/>
                                    <B13/>
                                </B1>
                                <B2>
                                    123
                                    <B21></B21>
                                </B2>
                               <B3 type=""3"">
                                   <B4/>
                               </B3>
                            </B>
                            <C/>
                            </Root>");

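// Reverse() buffers the sequence, so children are visited (and removed) before their parents.
foreach (XElement child in x.Descendants().Reverse())
{
    // Remove elements that have no child elements, no text, and no attributes.
    if (!child.HasElements && string.IsNullOrEmpty(child.Value) && !child.HasAttributes)
        child.Remove();
}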

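For simple input like this it can also be done with regular expressions (requires using System.Text.RegularExpressions), although this approach is fragile for general XML such as comments, CDATA sections, or deeper nesting of empty nodes:

string xml = @"<Root type=""1"">
                <A></A>
                <B>
                    <B1>
                        <B12/>
                        <B13/>
                    </B1>
                    <B2>
                        123
                        <B21></B21>
                    </B2>
                    <B3 type=""3"">
                        <B4/>
                    </B3>
                </B>
                <C/>
                </Root>";

// Remove self-closing elements that have no attributes.
xml = Regex.Replace(xml, @"<[^<>/\s]+\s*/>", "");
// Remove open/close pairs that have no attributes and only whitespace between them.
xml = Regex.Replace(xml, @"<([^<>/\s]+)>\s*</\1\s*>", "");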

The simplest way to do this in SQL Server is the modify() method of the xml data type. A single delete only removes the nodes that are already empty when the statement runs, so a parent such as B1 becomes empty only after its children are gone. Run the following pair repeatedly until the SELECT returns nothing:

SET @xml.modify('

delete //*[not(node()) and not(./@*)]

');

-- Lists the elements that are still empty and have no attributes after this pass.
SELECT @xml.query('//*[not(node()) and not(./@*)]')

With the same XPath expression, query() also lets me list the nodes that are deleted.
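
The repeated passes can also be written as a loop. The following is a minimal sketch, not a definitive implementation. It assumes the document is held in an xml variable named @xml, declared here with the sample data from the question, and it uses the exist() method to test whether any empty, attribute-free element remains:

```sql
-- Sample data from the question; the variable name @xml is an assumption.
DECLARE @xml xml = N'<Root type="1">
<A></A>
<B>
    <B1>
        <B12/>
        <B13/>
    </B1>
    <B2>
        123
        <B21></B21>
    </B2>
    <B3 type="3">
        <B4/>
    </B3>
</B>
<C/>
</Root>';

-- Keep deleting while at least one element has no child nodes and no attributes.
-- Each pass can leave a parent newly empty (B1 after B12 and B13 are removed),
-- which the next pass then removes.
WHILE @xml.exist('//*[not(node()) and not(@*)]') = 1
BEGIN
    SET @xml.modify('delete //*[not(node()) and not(@*)]');
END;

SELECT @xml AS CleanedXml;
```

For the sample input this should leave Root, B, B2 with its text, and B3 with its attribute, which corresponds to the expected output in the question. The loop ends as soon as exist() finds no more matching elements, so the number of passes adapts to how deeply the empty nodes are nested.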
