

Approach to process huge XML files in C#

Can someone please guide me with this problem?

In my institution, we process very large XML files (up to 1 GB) and insert the details into a database table. In the current design, we parse the XML file with XmlReader and build an XML string containing the required data, which is then passed to a stored procedure (as an xml data type parameter) that inserts the details into the database.

The problem is that we are not sure whether there is a better approach than this. Please suggest whether there are any new features in .NET 3.5 and/or SQL Server 2005 that would handle this better than our approach.

Any help in this regard would be highly appreciated.

Thanks.

Do you care at all what is in the XML file? If not, you can just use a StreamReader to read the text of the file and pass it along to the database.

If you need to validate that the XML is correct, it is a good idea to use XmlReader.
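A minimal sketch of such a validation pass, written in current C#. It assumes the file is read through a TextReader (for a real file, `File.OpenText(path)`); `XmlCheck` and `FirstError` are hypothetical names, and the optional XmlSchemaSet only applies if you actually have an XSD for the feed. Because XmlReader is forward-only, memory use stays flat regardless of file size.

```csharp
#nullable enable
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class XmlCheck
{
    // Streams through the whole document once. XmlReader holds only the current
    // node in memory, so this also works for files of several gigabytes.
    // Returns null if the document is well-formed (and valid against `schemas`,
    // when supplied); otherwise returns the first error message.
    public static string? FirstError(TextReader input, XmlSchemaSet? schemas = null)
    {
        var settings = new XmlReaderSettings();
        if (schemas != null)
        {
            settings.ValidationType = ValidationType.Schema;
            settings.Schemas = schemas;
            // No ValidationEventHandler is attached, so the first validation
            // error is thrown as an XmlSchemaValidationException.
        }
        try
        {
            using (XmlReader reader = XmlReader.Create(input, settings))
            {
                while (reader.Read()) { } // just walk every node
            }
            return null;
        }
        catch (XmlException ex) { return ex.Message; }                 // not well-formed
        catch (XmlSchemaValidationException ex) { return ex.Message; } // schema violation
    }
}
```

This is a separate full pass over the file; the same XmlReaderSettings could instead be given to the reader that performs the actual import, so validation and parsing happen in one pass.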

However, just dumping 1 GB of XML into your database seems a bit odd. What is the purpose of this XML data? Does it contain a lot of nested elements? Maybe you could deserialize it and store each object in the appropriate table instead, which in my opinion would lead to a design that is easier to understand.
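If the repeating elements map cleanly to a class, deserialization does not have to load the whole file. A sketch, assuming the records are `<row>` elements: XmlReader walks the file and XmlSerializer deserializes one `<row>` at a time. The `Row` class, its `id`/`name` elements and `RowStreamer` are hypothetical placeholders for the real schema.

```csharp
#nullable enable
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical entity: replace with a class that mirrors your real <row> element.
[XmlRoot("row")]
public class Row
{
    [XmlElement("id")] public int Id { get; set; }
    [XmlElement("name")] public string? Name { get; set; }
}

public static class RowStreamer
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Row));

    // Lazily yields one Row per <row> element; the rest of the file is never
    // held in memory, so each object can be saved and discarded as it arrives.
    public static IEnumerable<Row> ReadRows(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "row")
                {
                    // Deserialize consumes the whole <row> element and leaves the
                    // reader on the node after it, so Read() must not be called here.
                    yield return (Row)Serializer.Deserialize(reader)!;
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Because `ReadRows` is lazy, a caller can `foreach` over it and insert the objects into their tables in batches without ever building the full list.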

There are a couple of things you can consider to make the design of your software simpler and better:

  • Does more than one XML file occur in the database at once?
  • How is the data shared between applications?
  • Have you considered using MemoryMappedFile? (Note that System.IO.MemoryMappedFiles was introduced in .NET Framework 4, so it is not available on .NET 3.5.)
  • Is it possible to deserialize the XML into entities instead and store them appropriately?

I suspect that if there are any performance issues, they will be in the stored procedure and on the database side rather than in reading the file.

Why are you storing the XML file in a database table? I suspect a different solution would be more appropriate, but without knowing more about exactly what you are trying to do, it is hard to advise.

If each first-level element in the XML is a record, i.e.

<rootNode>
    <row>...</row>
    <row>...</row>
    <row>...</row>
</rootNode>

Then you could create an IDataReader implementation that reads the XML (via XmlReader) and presents each element as a record, to be imported using SqlBulkCopy. This is pretty much like my old answer here.
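A minimal sketch of that idea in current C#, not the code from the linked answer. It assumes each `<row>` holds one child element per column (e.g. `<row><id>1</id><name>abc</name></row>`); the class name, row element name and column names are all hypothetical. Each value is handed over as a string (or DBNull when the child element is missing), and SqlBulkCopy attempts the conversion to the destination column types.

```csharp
#nullable enable
using System;
using System.Data;
using System.Xml;
using System.Xml.Linq;

// Sketch: presents each <row> element of a streamed XML document as one record,
// so the reader can be passed to SqlBulkCopy.WriteToServer(IDataReader).
// Assumption: each <row> holds one child element per column, e.g.
// <row><id>1</id><name>abc</name></row>; all names here are hypothetical.
public sealed class XmlRowDataReader : IDataReader
{
    private readonly XmlReader _xml;
    private readonly string _rowName;
    private readonly string[] _fields;
    private readonly object[] _values;

    public XmlRowDataReader(XmlReader xml, string rowName, params string[] fields)
    {
        _xml = xml;
        _rowName = rowName;
        _fields = fields;
        _values = new object[fields.Length];
    }

    // Moves to the next <row>; only that single element is materialized.
    public bool Read()
    {
        while (!_xml.EOF)
        {
            if (_xml.NodeType == XmlNodeType.Element && _xml.LocalName == _rowName)
            {
                // ReadFrom consumes the element and leaves the reader after </row>,
                // so _xml.Read() must not be called again before re-checking.
                var row = (XElement)XNode.ReadFrom(_xml);
                for (int i = 0; i < _fields.Length; i++)
                {
                    var text = row.Element(_fields[i])?.Value;
                    _values[i] = text != null ? (object)text : DBNull.Value; // missing child -> NULL
                }
                return true;
            }
            _xml.Read();
        }
        return false;
    }

    // Members used when streaming into SqlBulkCopy. Values are exposed as strings;
    // SqlBulkCopy attempts the conversion to the destination column types.
    public int FieldCount => _fields.Length;
    public object GetValue(int i) => _values[i];
    public string GetName(int i) => _fields[i];
    public Type GetFieldType(int i) => typeof(string);
    public string GetDataTypeName(int i) => "String";
    public bool IsDBNull(int i) => _values[i] is DBNull;
    public string GetString(int i) => (string)_values[i];
    public object this[int i] => GetValue(i);
    public object this[string name] => GetValue(GetOrdinal(name));

    public int GetOrdinal(string name)
    {
        int index = Array.IndexOf(_fields, name);
        if (index < 0) throw new IndexOutOfRangeException(name);
        return index;
    }

    public int GetValues(object[] values)
    {
        int n = Math.Min(values.Length, _values.Length);
        Array.Copy(_values, values, n);
        return n;
    }

    // Single result set, no schema table; typed getters are left unimplemented.
    public int Depth => 0;
    public bool IsClosed => _xml.ReadState == ReadState.Closed;
    public int RecordsAffected => -1;
    public bool NextResult() => false;
    public void Close() => _xml.Close();
    public void Dispose() => ((IDisposable)_xml).Dispose();
    public DataTable? GetSchemaTable() => null;
    public bool GetBoolean(int i) => throw new NotSupportedException();
    public byte GetByte(int i) => throw new NotSupportedException();
    public long GetBytes(int i, long fieldOffset, byte[]? buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public char GetChar(int i) => throw new NotSupportedException();
    public long GetChars(int i, long fieldOffset, char[]? buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public IDataReader GetData(int i) => throw new NotSupportedException();
    public DateTime GetDateTime(int i) => throw new NotSupportedException();
    public decimal GetDecimal(int i) => throw new NotSupportedException();
    public double GetDouble(int i) => throw new NotSupportedException();
    public float GetFloat(int i) => throw new NotSupportedException();
    public Guid GetGuid(int i) => throw new NotSupportedException();
    public short GetInt16(int i) => throw new NotSupportedException();
    public int GetInt32(int i) => throw new NotSupportedException();
    public long GetInt64(int i) => throw new NotSupportedException();
}
```

Against a real database you would open the file with `XmlReader.Create(path)`, wrap it as `new XmlRowDataReader(xml, "row", "id", "name")`, and pass it to a `SqlBulkCopy` with `DestinationTableName` set (plus `ColumnMappings` if the column order differs) via `WriteToServer(reader)`. The exact set of IDataReader members SqlBulkCopy calls may vary between versions, so test against yours. The expression-bodied members, `?.` and `#nullable` need a newer compiler than the C# 3 that ships with .NET 3.5; there they would have to be written as ordinary method bodies.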

Advantages:

  • SqlBulkCopy is the fastest way to get data into a SQL Server database
  • splitting it into records makes appropriate use of a database, allowing indexing and proper typing
  • it doesn't rely on a huge BLOB going over the wire in one atomic piece (which is necessary for the xml data type)

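Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.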

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM