
How to estimate the memory needed by XPathDocument for a specific XML file

Is there any way to estimate the memory required to create an XPathDocument instance, based on the size of the XML file?

using System.Xml.XPath;
XPathDocument xdoc = new XPathDocument(xmlfile);

Is there any way to programmatically stop the creation of the XPathDocument if available memory drops to a very low level?

Since it loads the entire XML document into memory, it would be nice to know ahead of time whether the file is too big. What I have found is that when I create a new XPathDocument from a big XML file, an OutOfMemoryException is never thrown; instead the process slows to a crawl, only 5 MB of memory remains available, and Task Manager reports that it is not responding. This happened with a 266 MB XML file on a machine with 584 MB of RAM. I was able to load a 150 MB file with no problems in 18.

After loading the XML, I want to run XPath queries using an XPathNavigator and an XPathNodeIterator. I am using .NET 2.0 on Windows XP SP3.
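The load-then-query workflow described in the question can be sketched as follows. This is written as a modern C# (9+) top-level program rather than .NET 2.0 code, and it reads from a StringReader so it runs without a file; CountMatches is a hypothetical helper name. With a real file you would construct the document from the path instead, as in new XPathDocument(xmlfile).

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Loads a whole document into an XPathDocument, then walks the nodes selected
// by an XPath expression with an XPathNavigator and an XPathNodeIterator.
// Returns how many nodes matched.
static int CountMatches(TextReader input, string xpath)
{
    // The entire document is parsed into memory here.
    var doc = new XPathDocument(input);
    XPathNavigator nav = doc.CreateNavigator();

    // Select evaluates the expression and returns an iterator over the matches.
    XPathNodeIterator it = nav.Select(xpath);
    int count = 0;
    while (it.MoveNext())
    {
        // it.Current is a navigator positioned on the matched node.
        Console.WriteLine(it.Current.Value);
        count++;
    }
    return count;
}

// Example with an in-memory document instead of a file on disk.
string xml = "<items><item>a</item><item>b</item><other/></items>";
Console.WriteLine(CountMatches(new StringReader(xml), "/items/item")); // prints a, b, then 2
```

Note that the memory cost is paid entirely in the XPathDocument constructor; the iterator itself only moves over nodes that are already in memory.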

In short, no, you cannot, unless your files are always similar enough that you can gather statistical data from them before starting the estimations.

Since tag, attribute, prefix and namespace strings are interned (each distinct name is stored only once), how efficient the in-memory storage can be depends heavily on the structure of the XML file, and the ratio to the file size on disk also depends on the encoding used.
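To see what interning means in practice, the sketch below uses System.Xml.NameTable, a concrete implementation of XmlNameTable (the type an XPathNavigator exposes through its NameTable property). Adding two separately built strings with equal content returns one shared instance, so the characters of a tag name that repeats throughout a document only need to be stored once. IsAtomized is a hypothetical helper for illustration, and this does not show how XPathDocument uses its table internally.

```csharp
using System;
using System.Xml;

// Demonstrates atomization: a NameTable returns one shared string instance
// for every string with the same characters (name must be non-empty).
static bool IsAtomized(string name)
{
    var table = new NameTable();

    // Two distinct string objects with identical content.
    string first = new string(name.ToCharArray());
    string second = new string(name.ToCharArray());

    // Add returns the stored (atomized) instance: the first one added.
    string a = table.Add(first);
    string b = table.Add(second);

    return !ReferenceEquals(first, second) && ReferenceEquals(a, b);
}

Console.WriteLine(IsAtomized("customerAddress")); // prints True
```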

In general, .NET stores every string in memory as UTF-16. Therefore, even if there were no significant structural overhead (imagine an XML file with only a single root tag containing lots of plain text), the memory used would still be double the file size for a UTF-8 source file with mostly ASCII content (or an ASCII or any other 8-bit encoding). So string encoding is the first part of the equation.
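A quick way to see the encoding factor is to compare byte counts for the same text. This sketch (ByteCounts is a hypothetical helper) uses Encoding.UTF8 for a typical on-disk encoding and Encoding.Unicode, which is UTF-16, for the in-memory representation. It measures raw characters only and ignores all structural overhead.

```csharp
using System;
using System.Text;

// Bytes needed for the same characters in UTF-8 (a typical on-disk
// encoding) and in UTF-16 (Encoding.Unicode, how .NET holds strings).
// Only raw text is counted; node structure is ignored.
static (int utf8, int utf16) ByteCounts(string text) =>
    (Encoding.UTF8.GetByteCount(text), Encoding.Unicode.GetByteCount(text));

var (onDisk, inMemory) = ByteCounts("plain ASCII text");
Console.WriteLine($"UTF-8: {onDisk} bytes, UTF-16: {inMemory} bytes");
// prints "UTF-8: 16 bytes, UTF-16: 32 bytes"
```

For text outside the ASCII range the factor shrinks or even reverses: three CJK characters such as "日本語" take 9 bytes in UTF-8 but only 6 in UTF-16.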

The other factor is the data structure that is built in memory to allow efficient traversal of the document. Typically, nodes are constructed and linked together with references, so each node uses up a certain amount of memory. Since most of the non-value data are references, the memory used here also depends heavily on the architecture: a reference takes 8 bytes in a 64-bit process versus 4 bytes in a 32-bit one. So if you have a very complex document with little data (e.g. a large number of elements using a few different tags, with little text and few attribute values), your memory usage will be much higher than the original document size, and it will also depend a lot on the architecture your application runs on.
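To get a feel for how much structure a particular file has, you can count its nodes without building a tree, because XmlReader streams the document. The sketch below is one way to do that; CountNodes is a hypothetical helper, and its counts are only inputs to an estimate, not a measurement of XPathDocument's actual per-node cost, which is an internal implementation detail. IntPtr.Size reports the pointer (and reference) size of the current process.

```csharp
using System;
using System.IO;
using System.Xml;

// Streams the document with XmlReader (no tree is built, so memory stays low)
// and counts element, attribute and text nodes plus the characters in text
// and attribute values. Whitespace-only text is not counted, and xmlns
// declarations are counted as attributes.
static (long nodes, long valueChars) CountNodes(TextReader input)
{
    long nodes = 0, valueChars = 0;
    using (XmlReader reader = XmlReader.Create(input))
    {
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    nodes++;
                    // Visit each attribute of the current element.
                    while (reader.MoveToNextAttribute())
                    {
                        nodes++;
                        valueChars += reader.Value.Length;
                    }
                    break;
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                    nodes++;
                    valueChars += reader.Value.Length;
                    break;
            }
        }
    }
    return (nodes, valueChars);
}

// 4 in a 32-bit process, 8 in a 64-bit process.
Console.WriteLine($"Reference size: {IntPtr.Size} bytes");

var (n, chars) = CountNodes(new StringReader("<r a=\"xy\"><c>text</c><c/></r>"));
Console.WriteLine($"{n} nodes, {chars} value characters"); // prints "5 nodes, 6 value characters"
```

For a real file you would pass new StreamReader(xmlfile) instead of the StringReader. A high node count relative to the value characters points to the "complex document with little data" case described above.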

If your file repeats a few very long tag and attribute names, and perhaps makes heavy use of a default namespace, the memory used may also be much lower than the file size on disk.

So, for an arbitrary XML file with an unknown encoding and a reasonable amount of data and complexity, it will be very difficult to get a reliable estimate. However, if your XML files are always similar in the points mentioned above, you could collect some statistics to derive a factor that gets the ratio about right for your specific platform.
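If your files are similar enough, the calibration could look something like the sketch below: load a few representative sample files, measure the managed heap growth with GC.GetTotalMemory, keep the largest memory-to-file-size ratio, and multiply new file sizes by it. The helper names are hypothetical, the measurement is approximate (it only sees the managed heap, and forcing collections is slow), and the resulting factor is only meaningful for the platform and kind of file it was measured on.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.XPath;

// Roughly how many managed bytes an XPathDocument occupies for one sample
// file. GC.GetTotalMemory(true) forces a full collection before reading,
// so run this on representative samples offline, not in production.
static long MeasureLoadBytes(string path)
{
    long before = GC.GetTotalMemory(true);
    var doc = new XPathDocument(path);
    long after = GC.GetTotalMemory(true);
    GC.KeepAlive(doc); // keep the document reachable until after the second reading
    return after - before;
}

// Derives a memory/file-size factor from (file size, measured bytes) pairs.
// Taking the largest observed ratio gives a conservative factor.
// Assumes every sample file is non-empty.
static double CalibrateFactor(IEnumerable<(long fileBytes, long memoryBytes)> samples)
{
    double factor = 0;
    foreach (var (fileBytes, memoryBytes) in samples)
        factor = Math.Max(factor, (double)memoryBytes / fileBytes);
    return factor;
}

// Predicts the memory needed for a new file of the same kind.
static long EstimateBytes(long fileBytes, double factor) =>
    (long)Math.Ceiling(fileBytes * factor);
```

In use, you would run MeasureLoadBytes on each of your own sample files, pass the (file size, measured bytes) pairs to CalibrateFactor, and then compare EstimateBytes(new FileInfo(xmlfile).Length, factor) with the amount of memory you are willing to spend before loading.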

However, note that looking at "free memory" in Task Manager, or talking about a "very low memory level", are very vague measures. Virtual memory, caches, background applications, services and so on all influence how much raw memory is effectively available. The .NET Framework therefore cannot reliably guess how much memory a single process may use while remaining performant, or even up to what point it can still throw an OutOfMemoryException safely. So if you get one of those exceptions, your application is usually well beyond any possible recovery point, and you should not try to catch and handle them.
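For backing out before memory runs low, one option is System.Runtime.MemoryFailPoint (available since .NET 2.0). Before an operation starts, it checks whether roughly the requested number of megabytes is likely to be available and throws InsufficientMemoryException if not, which is a safe point to give up, unlike an OutOfMemoryException in the middle of a load. It does not reserve the memory or guarantee that the load will succeed, and the megabyte figure has to come from your own estimate, for example a calibrated factor times the file size. TryLoad is a hypothetical helper, written in modern C#.

```csharp
using System;
using System.Runtime;
using System.Xml.XPath;

// Checks up front whether about estimatedMegabytes is likely to be available
// before loading the document. Returns null if the check fails.
static XPathDocument? TryLoad(string path, int estimatedMegabytes)
{
    try
    {
        // Throws InsufficientMemoryException before any loading work starts.
        using (new MemoryFailPoint(estimatedMegabytes))
        {
            return new XPathDocument(path);
        }
    }
    catch (InsufficientMemoryException)
    {
        // Not enough memory expected: report and skip this file.
        // An OutOfMemoryException thrown during the load is deliberately not caught.
        return null;
    }
}
```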

You can simply check the file size and back out if it exceeds a certain upper bound.

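using System.IO;
long maximumSize = 150L * 1024 * 1024; // example upper bound (150 MB); choose your own
var xmlFileInfo = new FileInfo(xmlfile);
var isTooBig = xmlFileInfo.Length > maximumSize; // Length is the file size in bytes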

This will not be foolproof, because you cannot know in advance what the correct maximum size is.

Yes, you can do it with the FileInfo class.

System.IO.FileInfo fileInfo = new System.IO.FileInfo("<your file path as string>");
long size = fileInfo.Length; // file size in bytes

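Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or the original source. For any questions, contact yoyou2525@163.com.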

 
© 2020-2024 STACKOOM.COM