
Approach to process huge xml files in C#

Can someone please guide me with this problem?

At my institution, we process huge XML files (up to 1 GB) and insert their details into a database table. In the current design, we parse the XML file with XmlReader and build an XML string containing the required data, which is then passed to a stored procedure (as an xml data type parameter) that inserts the details into the database.

The problem is that we are not sure whether there is a better approach than this. Please suggest any features available in .NET 3.5 and/or SQL Server 2005 that would handle this better than our current approach.

Any help in this regard would be highly appreciated.

Thanks.

Do you care at all what is in the XML file? If not, you can just use a StreamReader to read the XML as text and pass it along to the database.

If you need to validate that the XML is correct, it is a good idea to use XmlReader.
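As a rough illustration of that check, here is a minimal sketch that walks a document with XmlReader and reports whether it parsed cleanly. The class and method names (XmlCheck, IsWellFormed) are made up for this example. It only tests well-formedness; validating against a schema would additionally need XmlReaderSettings with ValidationType.Schema and a schema set, which is not shown.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlCheck
{
    // Hypothetical helper: returns true if the whole document can be read
    // without a parse error. XmlReader is forward-only, so only the current
    // node is held in memory, regardless of the size of the input.
    public static bool IsWellFormed(TextReader input)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(input))
            {
                while (reader.Read())
                {
                    // Nothing to do: advancing the reader is what performs the check.
                }
            }
            return true;
        }
        catch (XmlException)
        {
            // Thrown on mismatched tags, illegal characters, a missing root, etc.
            return false;
        }
    }
}
```

For a file on disk, something like XmlCheck.IsWellFormed(new StreamReader(path)) should work, where path is whatever file you are importing.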

However, just dumping 1 GB of XML into your database seems a bit odd. What is the purpose of this XML data? Does it contain a lot of nested elements? Maybe you could deserialize it and store each object in the appropriate table instead, which would, in my opinion, lead to a design that is easier to understand.
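To make the deserializing idea a bit more concrete, here is a sketch that streams one element at a time out of a large document, so each record can be mapped to an object and stored without loading the whole file. It assumes the records are repeated elements with a known name (passed in as rowName); RowStreamer and StreamRows are names invented for this example. It uses XNode.ReadFrom from System.Xml.Linq, which is available from .NET 3.5.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class RowStreamer
{
    // Hypothetical helper: lazily yields every element called rowName as a
    // small XElement. Only one such element is materialised at a time.
    public static IEnumerable<XElement> StreamRows(TextReader input, string rowName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent(); // skip the XML declaration, land on the root element
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == rowName)
                {
                    // ReadFrom consumes the whole element and leaves the reader on the
                    // node after it, so we must not call Read() again in this branch
                    // (doing so would skip a record that follows immediately).
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Each XElement returned can then be turned into whatever entity type fits your tables, for example inside a foreach loop over RowStreamer.StreamRows(new StreamReader(path), "row").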

There are a couple of things you can think of to make the design of your software easier/better:

  • Does more than one XML file occur in the database at once?
  • How is the data shared between applications?
  • Have you considered using MemoryMappedFile? (Note that it requires .NET 4.0 or later.)
  • Is it possible to deserialize the XML into entities instead and store them appropriately?

I suspect that if there are any performance issues, they will be in the stored procedure and the database side of things rather than in reading the file.

Why are you storing the XML file in a database table? A different solution would probably be more appropriate, but without knowing more about exactly what you are trying to do, it is hard to advise.

If each first-level element in the XML is a record, i.e.

<rootNode>
    <row>...</row>
    <row>...</row>
    <row>...</row>
</rootNode>

Then you could create an IDataReader implementation that reads the XML (via XmlReader) and presents each row as a record, to be imported using SqlBulkCopy. Pretty much like my old answer here.

Advantages:

  • SqlBulkCopy is generally the fastest way to get data into SQL Server from .NET
  • stripping it into records makes appropriate use of a database, allowing indexing and proper typing
  • it doesn't rely on a huge BLOB going over the wire in an atomic way (necessary for the xml data type)
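The IDataReader idea above could be sketched roughly like this. It is not the code from the linked answer, just a minimal illustration under some assumptions: each row's fields are child elements whose names you pass in as columns, every value is handed over as a string (or DBNull if the element is missing), and the class name XmlRowDataReader is made up. The typed getters are left unimplemented, and the expression-bodied members need a C# 7 or later compiler.

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical sketch: exposes each <row>-style element as one record.
public sealed class XmlRowDataReader : IDataReader
{
    private readonly XmlReader _xml;
    private readonly string _rowName;
    private readonly string[] _columns;
    private readonly object[] _values;
    private bool _closed;

    public XmlRowDataReader(TextReader input, string rowName, params string[] columns)
    {
        _xml = XmlReader.Create(input);
        _rowName = rowName;
        _columns = columns;
        _values = new object[columns.Length];
    }

    // Advance to the next row element, load only that element, cache its fields.
    public bool Read()
    {
        while (!(_xml.NodeType == XmlNodeType.Element && _xml.LocalName == _rowName))
        {
            if (!_xml.Read()) return false; // end of document
        }
        var row = (XElement)XNode.ReadFrom(_xml); // leaves _xml on the node after the row
        for (int i = 0; i < _columns.Length; i++)
        {
            XElement field = row.Element(_columns[i]);
            _values[i] = field == null ? (object)DBNull.Value : field.Value;
        }
        return true;
    }

    public int FieldCount => _columns.Length;
    public object GetValue(int i) => _values[i];
    public string GetName(int i) => _columns[i];
    public int GetOrdinal(string name) => Array.IndexOf(_columns, name);
    public bool IsDBNull(int i) => _values[i] is DBNull;
    public string GetString(int i) => (string)_values[i];
    public Type GetFieldType(int i) => typeof(string);
    public string GetDataTypeName(int i) => "String";
    public object this[int i] => _values[i];
    public object this[string name] => _values[GetOrdinal(name)];
    public int GetValues(object[] values)
    {
        int n = Math.Min(values.Length, _values.Length);
        Array.Copy(_values, values, n);
        return n;
    }

    public int Depth => 0;
    public bool IsClosed => _closed;
    public int RecordsAffected => -1;
    public bool NextResult() => false;
    public DataTable GetSchemaTable() => null; // no schema metadata in this sketch
    public void Close() { _closed = true; _xml.Close(); }
    public void Dispose() => Close();

    // Typed getters are not needed for this sketch.
    public bool GetBoolean(int i) => throw new NotSupportedException();
    public byte GetByte(int i) => throw new NotSupportedException();
    public long GetBytes(int i, long o, byte[] b, int bo, int l) => throw new NotSupportedException();
    public char GetChar(int i) => throw new NotSupportedException();
    public long GetChars(int i, long o, char[] b, int bo, int l) => throw new NotSupportedException();
    public IDataReader GetData(int i) => throw new NotSupportedException();
    public DateTime GetDateTime(int i) => throw new NotSupportedException();
    public decimal GetDecimal(int i) => throw new NotSupportedException();
    public double GetDouble(int i) => throw new NotSupportedException();
    public float GetFloat(int i) => throw new NotSupportedException();
    public Guid GetGuid(int i) => throw new NotSupportedException();
    public short GetInt16(int i) => throw new NotSupportedException();
    public int GetInt32(int i) => throw new NotSupportedException();
    public long GetInt64(int i) => throw new NotSupportedException();
}
```

An instance would then be passed to SqlBulkCopy.WriteToServer(IDataReader) after setting DestinationTableName (and ColumnMappings if the column names differ from the element names). Whether returning null from GetSchemaTable is acceptable to SqlBulkCopy in your framework version is something I would test rather than assume, and real code would probably convert the strings to proper column types.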
