
How to Split an XML file into multiple XML Files based on nodes

I have an XML file as follows

<?xml version="1.0"?>
<EMR>
  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt1</id>
  </CustomTextBox>

  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt2</id>
  </CustomTextBox>

  <AllControlsCount>
    <Width>0</Width>
    <id>ControlsID</id>
  </AllControlsCount>
</EMR>

I want to split the XML file into three files, one per child node of the root:

File 1:

<?xml version="1.0"?>
<CustomTextBox>
  <Text>WNL</Text>
  <Type>TextBox</Type>
  <Width>500</Width>
  <id>txt1</id>
</CustomTextBox>

File 2:

<?xml version="1.0"?>
<CustomTextBox>
  <Text>WNL</Text>
  <Type>TextBox</Type>
  <Width>500</Width>
  <id>txt2</id>
</CustomTextBox>

File 3:

<?xml version="1.0"?>
<AllControlsCount>
  <Width>0</Width>
  <id>ControlsID</id>
</AllControlsCount>

Also, the nodes are dynamic and may change. How can I split this XML file into multiple files according to its nodes? If anybody knows, please share.

Try LINQ to XML:

var xDoc = XDocument.Parse(Resource1.XMLFile1); // loading source xml
var xmls = xDoc.Root.Elements().ToArray(); // split into elements

for(int i = 0;i< xmls.Length;i++)
{
    // write each element into different file
    using (var file = File.CreateText(string.Format("xml{0}.xml", i + 1)))
    {
        file.Write(xmls[i].ToString());
    }
}

It takes all elements defined directly inside the root element and writes each one's content into a separate file.
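One thing to be aware of with this approach: XElement.ToString() does not emit an XML declaration, so the files will lack the <?xml ...?> line shown in the question. Below is a minimal sketch that wraps each element in its own XDocument before saving, which does write a declaration. The class name, the method name, the outputDir parameter and the "xmlN.xml" naming are assumptions for illustration, not part of the answer above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

public static class XmlSplitter
{
    // Hypothetical helper: writes each child of the root to its own file
    // and returns the paths of the files it created.
    public static List<string> SplitWithDeclaration(string xml, string outputDir)
    {
        var files = new List<string>();
        XDocument source = XDocument.Parse(xml);
        int index = 0;

        foreach (XElement element in source.Root.Elements())
        {
            // new XElement(element) makes a deep copy, so the source tree is untouched
            var doc = new XDocument(
                new XDeclaration("1.0", "utf-8", null),
                new XElement(element));

            string path = Path.Combine(outputDir, "xml" + (++index) + ".xml");
            doc.Save(path); // XDocument.Save writes the declaration
            files.Add(path);
        }

        return files;
    }
}
```

Calling XmlSplitter.SplitWithDeclaration(Resource1.XMLFile1, ".") should produce the same file names as the loop above, with a declaration at the top of each file.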

With LINQ to XML it's even simpler: you can use the XElement.Save method to save any element to a separate XML file:

XDocument xdoc = XDocument.Load(path_to_xml);
int index = 0;
foreach (var element in xdoc.Root.Elements())
    element.Save(++index + ".xml");

Or as a single statement:

XDocument.Load(path_to_xml).Root.Elements()
         .Select((e, i) => new { Element = e, File = ++i + ".xml" })
         .ToList().ForEach(x => x.Element.Save(x.File));

You can use the XmlTextReader and XmlWriter classes to accomplish this, but you need to know where each new XML file should start. Looking at your example, you want to split out each node contained in the root node.

That means that once you start reading the XML file, you need to make sure you are inside the root node, and then keep track of how deep into the XML you are, so that you can close the file when you reach the next node in the root node.

See this for example: I read XML from file.xml and open an XML writer. When I reach the first node contained in the root node, I start writing the elements.

I keep the depth in the variable "treeDepth", which represents the current depth in the XML tree structure.

Based on the node currently being read, I take an action. When I reach an end element at tree depth 1, I am back in the root node, so I close the current XML file and open a new one.

// Requires: using System.Xml;
XmlTextReader reader = new XmlTextReader("file.xml");

XmlWriter writer = null;

int fileNum = 0;
int treeDepth = 0;

while (reader.Read())
{
    switch (reader.NodeType)
    {
        case XmlNodeType.Element:

            //
            // Skip the root node; each child of the root starts a new file
            //

            if (treeDepth == 1)
            {
                writer = XmlWriter.Create("file" + (++fileNum) + ".xml");
                writer.WriteStartDocument();
            }

            if (treeDepth > 0)
                writer.WriteStartElement(reader.Name);

            treeDepth++;

            break;
        case XmlNodeType.Text:

            //
            // Write text here (ignored when no output file is open)
            //

            if (writer != null)
                writer.WriteString(reader.Value);

            break;
        case XmlNodeType.EndElement:

            //
            // Close the element; back at depth 1 means a child of the root just ended
            //

            treeDepth--;

            if (treeDepth > 0)
                writer.WriteEndElement();

            if (treeDepth == 1)
            {
                writer.WriteEndDocument();
                writer.Close();
                writer = null;
            }

            break;
    }
}

// Attributes and empty elements such as <a/> are not handled here
reader.Close();

Note that this code does NOT entirely solve your problem, but merely explains the logic needed to solve it completely.
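One way to avoid handling every node type by hand is to let XmlWriter.WriteNode copy each child subtree in a single call; it carries attributes, text, CDATA and nested elements across and leaves the reader positioned after the element's end tag. The following is a minimal sketch of that idea, not a drop-in replacement for the loop above. The class name, the method name, the outputDir parameter and the "partN.xml" file names are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingXmlSplitter
{
    // Hypothetical helper: streams through inputPath and writes every child
    // element of the root to its own file in outputDir.
    public static List<string> Split(string inputPath, string outputDir)
    {
        var files = new List<string>();

        using (XmlReader reader = XmlReader.Create(inputPath))
        {
            reader.MoveToContent(); // position on the root element (Depth 0)
            reader.Read();          // step inside the root

            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Depth == 1)
                {
                    string path = Path.Combine(outputDir, "part" + (files.Count + 1) + ".xml");
                    using (XmlWriter writer = XmlWriter.Create(path))
                    {
                        // Copies the element with its attributes and descendants,
                        // and advances the reader past the element's end tag.
                        writer.WriteNode(reader, true);
                    }
                    files.Add(path);
                }
                else
                {
                    reader.Read(); // skip whitespace, comments and the root's end tag
                }
            }
        }

        return files;
    }
}
```

Because WriteNode already advances the reader, the loop only calls Read() when the current node is not a direct child element of the root. Only one subtree is held by the writer at a time, so this should also suit large input files.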

For more help on XML readers and writers, see the following links:

http://support.microsoft.com/kb/307548

http://www.dotnetperls.com/xmlwriter

I took Legoless' answer and expanded it into a version that worked for me, so I am sharing it. I needed to split into multiple entries per file, rather than the single entry per file shown in the original question, which meant I needed it to preserve the higher-level elements so that the resulting XML files are valid.

So you supply the level you want to split on and the number of entries per file that you want.

using System.Collections.Generic;
using System.Xml;

public class XMLFileManager
{        

    public List<string> SplitXMLFile(string fileName, int startingLevel, int numEntriesPerFile)
    {
        List<string> resultingFilesList = new List<string>();

        XmlReaderSettings readerSettings = new XmlReaderSettings();
        readerSettings.DtdProcessing = DtdProcessing.Parse;
        XmlReader reader = XmlReader.Create(fileName, readerSettings);

        XmlWriter writer = null;
        int fileNum = 1;
        int entryNum = 0;
        bool writerIsOpen = false;
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.NewLineOnAttributes = true;

        Dictionary<int, XmlNodeItem> higherLevelNodes = new Dictionary<int, XmlNodeItem>();
        int hlnCount = 0;

        string fileIncrementedName = GetIncrementedFileName(fileName, fileNum);
        resultingFilesList.Add(fileIncrementedName);
        writer = XmlWriter.Create(fileIncrementedName, settings);
        writerIsOpen = true;
        writer.WriteStartDocument();

        int treeDepth = 0;

        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:                        

                    treeDepth++;

                    if (treeDepth == startingLevel)
                    {
                        entryNum++;
                        if (entryNum == 1)
                        {                                
                            if (fileNum > 1)
                            {
                                fileIncrementedName = GetIncrementedFileName(fileName, fileNum);
                                resultingFilesList.Add(fileIncrementedName);
                                writer = XmlWriter.Create(fileIncrementedName, settings);
                                writerIsOpen = true;
                                writer.WriteStartDocument();
                                for (int d = 1; d <= higherLevelNodes.Count; d++)
                                {
                                    XmlNodeItem xni = higherLevelNodes[d];
                                    switch (xni.XmlNodeType)
                                    {
                                        case XmlNodeType.Element:
                                            writer.WriteStartElement(xni.NodeValue);
                                            break;
                                        case XmlNodeType.Text:
                                            writer.WriteString(xni.NodeValue);
                                            break;
                                        case XmlNodeType.CDATA:
                                            writer.WriteCData(xni.NodeValue);
                                            break;
                                        case XmlNodeType.Comment:
                                            writer.WriteComment(xni.NodeValue);
                                            break;
                                        case XmlNodeType.EndElement:
                                            writer.WriteEndElement();
                                            break;
                                    }
                                }
                            }
                        }
                    }

                    if (writerIsOpen)
                    {
                        writer.WriteStartElement(reader.Name);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.Element;
                        xni.NodeValue = reader.Name;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.Text:

                    if (writerIsOpen)
                    {
                        writer.WriteString(reader.Value);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.Text;
                        xni.NodeValue = reader.Value;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.CDATA:

                    if (writerIsOpen)
                    {
                        writer.WriteCData(reader.Value);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.CDATA;
                        xni.NodeValue = reader.Value;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.Comment:

                    if (writerIsOpen)
                    {
                        writer.WriteComment(reader.Value);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.Comment;
                        xni.NodeValue = reader.Value;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.EndElement:

                    if ((entryNum == numEntriesPerFile && treeDepth == startingLevel) || treeDepth == 1)
                    {
                        if (writerIsOpen)
                        {
                            fileNum++;
                            writer.WriteEndDocument();
                            writer.Close();
                            writerIsOpen = false;
                            entryNum = 0;
                        }                            
                    }
                    else
                    {
                        if (writerIsOpen)
                        {
                            writer.WriteEndElement();
                        }

                        if (treeDepth < startingLevel)
                        {
                            hlnCount++;
                            XmlNodeItem xni = new XmlNodeItem();
                            xni.XmlNodeType = XmlNodeType.EndElement;
                            xni.NodeValue = string.Empty;
                            higherLevelNodes.Add(hlnCount, xni);
                        }
                    }

                    treeDepth--;

                    break;
            }
        }

        reader.Close(); // release the input file handle

        return resultingFilesList;
    }

    private string GetIncrementedFileName(string fileName, int fileNum)
    {
        return fileName.Replace(".xml", "") + "_" + fileNum + "_" + ".xml";
    }
}

public class XmlNodeItem
{        
    public XmlNodeType XmlNodeType { get; set; }
    public string NodeValue { get; set; }
}

Sample Usage:

int startingLevel = 2; //EMR is level 1, while the entries of CustomTextBox and AllControlsCount 
                       //are at Level 2. The question wants to split on those Level 2 items 
                       //and so this parameter is set to 2.
int numEntriesPerFile = 1;  //Question wants 1 entry per file which will result in 3 files,  
                            //each with one entry.

XMLFileManager xmlFileManager = new XMLFileManager();
List<string> resultingFilesList = xmlFileManager.SplitXMLFile("before_split.xml", startingLevel, numEntriesPerFile);

Results when used against XML file in the question:

File 1:

<?xml version="1.0" encoding="utf-8"?>
<EMR>
  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt1</id>
  </CustomTextBox>
</EMR>

File 2:

<?xml version="1.0" encoding="utf-8"?>
<EMR>
  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt2</id>
  </CustomTextBox>
</EMR>

File 3:

<?xml version="1.0" encoding="utf-8"?>
<EMR>
  <AllControlsCount>
    <Width>0</Width>
    <id>ControlsID</id>
  </AllControlsCount>
</EMR>

Another example with greater depth of levels and showing multiple entries per file:

int startingLevel = 4; //splitting on the 4th level down, which is <ITEM>
int numEntriesPerFile = 2; //2 entries per file. If instead you used 3, then the result 
                           //would be 3 entries in the first file and 1 entry in the second file.

XMLFileManager xmlFileManager = new XMLFileManager();
List<string> resultingFilesList = xmlFileManager.SplitXMLFile("another_example.xml", startingLevel, numEntriesPerFile);

Original File:

<?xml version="1.0" encoding="utf-8"?>
<TOP_LEVEL>
  <RESPONSE>
    <DATETIME>2019-04-03T21:39:40Z</DATETIME>  
    <ITEM_LIST>
      <ITEM>
        <ID>1</ID>
        <ABC>Some Text 1</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>42</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>      
      <ITEM>
        <ID>2</ID>
        <ABC>Some Text 2</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>53</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>3</ID>
        <ABC>Some Text 3</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1128</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>4</ID>
        <ABC>Some Text 4</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1955</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
    </ITEM_LIST>
  </RESPONSE>
</TOP_LEVEL>

Resulting Files:

First File:

<?xml version="1.0" encoding="utf-8"?>
<TOP_LEVEL>
  <RESPONSE>
    <DATETIME>2019-04-03T21:39:40Z</DATETIME>
    <ITEM_LIST>
      <ITEM>
        <ID>1</ID>
        <ABC>Some Text 1</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>42</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>2</ID>
        <ABC>Some Text 2</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>53</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
    </ITEM_LIST>
  </RESPONSE>
</TOP_LEVEL>

Second File:

<?xml version="1.0" encoding="utf-8"?>
<TOP_LEVEL>
  <RESPONSE>
    <DATETIME>2019-04-03T21:39:40Z</DATETIME>
    <ITEM_LIST>
      <ITEM>
        <ID>3</ID>
        <ABC>Some Text 3</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1128</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>4</ID>
        <ABC>Some Text 4</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1955</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
    </ITEM_LIST>
  </RESPONSE>
</TOP_LEVEL>
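For the simpler case where the entries to group are direct children of the root (level 2 in the terms used above) and the file fits comfortably in memory, the same "N entries per file, root preserved" result can be sketched with LINQ to XML. This is a minimal sketch under those assumptions, not an equivalent of the class above: it does not handle deeper split levels, and the class name, method name, outputDir parameter and "chunk_N.xml" naming are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class ChunkedXmlSplitter
{
    // Hypothetical helper: splits the root's children into groups of
    // entriesPerFile and saves each group under a copy of the root element.
    public static List<string> Split(string inputPath, string outputDir, int entriesPerFile)
    {
        if (entriesPerFile < 1)
            throw new ArgumentOutOfRangeException(nameof(entriesPerFile));

        XDocument source = XDocument.Load(inputPath);
        List<XElement> entries = source.Root.Elements().ToList();
        var files = new List<string>();

        for (int start = 0; start < entries.Count; start += entriesPerFile)
        {
            // Recreate the root (name and attributes only), then add this chunk's entries.
            // Nodes that already have a parent are cloned when added to a new parent.
            var root = new XElement(source.Root.Name, source.Root.Attributes());
            root.Add(entries.Skip(start).Take(entriesPerFile));

            string path = Path.Combine(outputDir, "chunk_" + (files.Count + 1) + ".xml");
            new XDocument(root).Save(path);
            files.Add(path);
        }

        return files;
    }
}
```

With the question's file and entriesPerFile set to 2, this should give two files: one holding both CustomTextBox entries under EMR and one holding AllControlsCount under EMR.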
