
Reading an XML file (file size > 500 MB)

I'm trying to parse a large XML file (about 600 MB) using the method below.

It takes a long time, and eventually the entire process is aborted with an exception.

Message: "Thread is aborted"

Method:

private string ReadXml(XmlTextReader reader, string fileName)
{
    string finalXML = "";
    string s1 = "";
    try
    {
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element: // The node is an element.
                    s1 += "<" + reader.Name + ">";
                    break;
                case XmlNodeType.Text: //Display the text in each element.
                    s1 += reader.Value;
                    break;
                case XmlNodeType.EndElement: //Display the end of the element.
                    s1 += "</" + reader.Name + ">";
                    break;
            }
            finalXML = s1;
        }
    }
    catch (Exception ex)
    {
        Logger.Logger.LogMessage(ex, "File Processing error: " + fileName);
    }
    reader.Close();
    reader.Dispose();

    return finalXML;
}

And then reading and deserializing:

string finalXML = string.Empty;
XmlTextReader reader = new XmlTextReader(unzipfile);
finalXML = await ReadXml(reader, fileName);

var xmlremovenamespae = Helper.RemoveAllNamespaces(finalXML);
XmlParseObjectNew.BizData myxml = new XmlParseObjectNew.BizData();

using (StringReader sr = new StringReader(xmlremovenamespae))
{
    XmlSerializer serializer = new XmlSerializer(typeof(XmlParseObjectNew.BizData));
    myxml = (XmlParseObjectNew.BizData)serializer.Deserialize(sr);
}

Is there a better way to read and parse a large XML file? Any suggestions are welcome.

The problem, as Jon Skeet and DiskJunky mentioned, is that your dataset is simply too large to load into memory, and your code is not optimized for handling it. That is why various classes are throwing an 'out of memory' exception at you.

First of all, string concatenation. Using simple concatenation (a + b) on many strings is usually a bad idea: strings are immutable, so every concatenation allocates a new string and copies the existing contents into it. I would recommend reading up on how to concatenate strings efficiently (for example, Jon Skeet's Concatenating Strings Efficiently).

However, that only optimizes your code. The main issue is the sheer size of the XML file you are trying to load into memory. To handle large datasets it is usually better to 'stream' the data, processing it in chunks instead of loading the entire file.
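As a rough illustration of the concatenation point (a minimal sketch, not the asker's code; the class and method names are made up for this example), the two methods below build the same text. The first copies the accumulated string on every `+=`, while the second appends into a `StringBuilder` buffer.

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    // Builds "<item>0</item><item>1</item>..." with repeated string concatenation.
    // Every += allocates a new string and copies the previous contents into it.
    public static string WithConcatenation(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += "<item>" + i + "</item>";
        }
        return result;
    }

    // Builds the same text with a StringBuilder, which appends to a growable buffer.
    public static string WithStringBuilder(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append("<item>").Append(i).Append("</item>");
        }
        return sb.ToString();
    }
}
```

Both methods return the same string; they differ only in how much copying happens along the way. Even with a `StringBuilder`, a 600 MB document would still end up entirely in memory.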


As you have not shown an example of your XML, I took the liberty of making a simple example to illustrate what I mean.

Consider you have the following XML:

<root>
   <specialelement>
      <value1>somevalue</value1>
      <value2>somevalue</value2>
   </specialelement>
   <specialelement>
      <value1>someothervalue</value1>
      <value2>someothervalue</value2>
   </specialelement>
   ... 
</root>

From this XML, you want to parse each specialelement into an object with the following class definition:

[XmlRoot("specialelement")]
public class ExampleClass
{
    [XmlElement(ElementName = "value1")]
    public string Value1 { get; set; }    
    [XmlElement(ElementName = "value2")]
    public string Value2 { get; set; }
}

I'll assume we can process each specialelement individually, and define a handler for this as follows:

public void HandleElement(ExampleClass item)
{
    // Process stuff
}

Now we can use the XmlTextReader to read the XML one node at a time. When we reach a specialelement, we start keeping track of the data contained within it. When we reach the end of that specialelement, we deserialize the collected data into an object and send it to our handler for processing. For example:

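// Create the serializer once and reuse it for every element
var serializer = new XmlSerializer(typeof(ExampleClass));

using (var reader = new XmlTextReader( /* your inputstream */ ))
{
    // Buffer for the element contents
    StringBuilder sb = new StringBuilder(1000);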

    // Read till next node
    while (reader.Read())
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element: 
                // Clear the stringbuilder when we start with our element 
                if (string.Equals(reader.Name, "specialelement"))
                {
                    sb.Clear();
                }

                // Append current element without namespace
                sb.Append("<").Append(reader.Name).Append(">");
                break;

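            case XmlNodeType.Text:
                // Append the text content; reader.Value is already unescaped,
                // so escape it again to keep the buffered XML well-formed
                sb.Append(System.Security.SecurityElement.Escape(reader.Value));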
                break;

            case XmlNodeType.EndElement: 

                // Append the closure element
                sb.Append("</").Append(reader.Name).Append(">");

                // Check if we have finished reading our element
                if (string.Equals(reader.Name, "specialelement"))
                {
                    // The stringbuilder now contains the entire 'SpecialElement' part
                    using (TextReader textReader = new StringReader(sb.ToString()))
                    {
                        // Deserialize
                        var deserializedElement = (ExampleClass)serializer.Deserialize(textReader);
                        // Send to handler
                        HandleElement(deserializedElement);
                    }
                }

                break;
        }
    }
}

Because we process the data as it comes in from the stream, we never have to load the entire file into memory. This keeps the program's memory usage low and prevents out-of-memory exceptions.

Check out this fiddle to see it in action.

Note that this is a quick example; there are still plenty of places where you can improve and optimize this code further.
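One possible refinement, sketched here as an alternative rather than as the approach above: instead of rebuilding each element as text, `XmlReader.ReadSubtree()` can hand the current element straight to the `XmlSerializer`. The `StreamingDemo` class and `ReadItems` method are hypothetical names for this example, and the sketch assumes the elements carry no XML namespaces.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

[XmlRoot("specialelement")]
public class ExampleClass
{
    [XmlElement(ElementName = "value1")]
    public string Value1 { get; set; }
    [XmlElement(ElementName = "value2")]
    public string Value2 { get; set; }
}

public static class StreamingDemo
{
    // Lazily yields one ExampleClass per <specialelement>, reading forward-only,
    // so only the current element is materialized at any time.
    public static IEnumerable<ExampleClass> ReadItems(TextReader input)
    {
        var serializer = new XmlSerializer(typeof(ExampleClass));
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "specialelement")
                {
                    // The subtree reader exposes only the current element and its children.
                    // Disposing it leaves the outer reader on the matching end element.
                    using (XmlReader subtree = reader.ReadSubtree())
                    {
                        yield return (ExampleClass)serializer.Deserialize(subtree);
                    }
                }
            }
        }
    }
}
```

A caller could then write `foreach (var item in StreamingDemo.ReadItems(new StreamReader(path))) HandleElement(item);`, where `path` stands for the location of your file.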

I tried this and it works fine. Try this code; it parses an XML file larger than 500 MB within a few seconds.

string fileName = "your file path";
YourXmlClassType parseData;

using (TextReader textReader = new StreamReader(fileName))
{
    using (XmlTextReader reader = new XmlTextReader(textReader))
    {
        // Ignore XML namespaces while reading
        reader.Namespaces = false;
        // YourXmlClassType is a placeholder for the class that models your XML
        XmlSerializer serializer = new XmlSerializer(typeof(YourXmlClassType));
        parseData = (YourXmlClassType)serializer.Deserialize(reader);
    }
}

