
How to estimate whether a given task would have enough memory to run in Java

I am developing an application that allows users to set the maximum data set size they want me to run their algorithm against.

It has become apparent that arrays of around 20,000,000 elements cause an 'out of memory' error. Because I am invoking the algorithm via reflection, there is not really a great deal I can do about this.

I was just wondering, is there any way I can check or calculate what the maximum array size could be, based on the user's heap space settings, and therefore validate the user's entry before running the application?

If not, are there any better solutions?

Use Case:

  • The user provides a data size they want to run their algorithm against, and we generate a scale of sizes to test it against, up to the limit they provided.

  • We record the time each run takes and measure the values (in order to work out the big-O notation).

  • We need to somehow limit the user's input so that it does not trigger this error. Ideally we want to measure n^2 algorithms on array sizes as big as we can (which could mean runtimes of days), so we really don't want a run to last 2 days and then fail, as that would have been a waste of time.

You can use the result of Runtime.freeMemory() to estimate the amount of available memory. However, a lot of memory might actually be occupied by unreachable objects that the GC will reclaim soon, so you might be able to use more memory than this figure suggests. You can try invoking the GC beforehand, but this is not guaranteed to do anything.
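As a rough illustration of the first point, the sketch below reads the figures the JVM exposes through Runtime. It goes slightly beyond the suggestion above: freeMemory() only describes the heap the JVM has committed so far, so the sketch also uses totalMemory() and maxMemory() to account for heap that can still grow. The class and method names (HeapHeadroom, estimateAvailableBytes) are made up for this example, and the result is only an estimate.

```java
final class HeapHeadroom {
    private HeapHeadroom() {}

    /**
     * Rough estimate of how many more bytes the JVM might be able to allocate.
     * "used" still includes garbage that has not been collected yet, so the
     * real headroom can be larger than the value returned here.
     */
    static long estimateAvailableBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // bytes in use within the committed heap
        return rt.maxMemory() - used;                   // maxMemory() reflects the heap limit (e.g. -Xmx)
    }

    public static void main(String[] args) {
        System.gc(); // only a hint; the JVM is free to ignore it
        Runtime rt = Runtime.getRuntime();
        System.out.printf("free in committed heap: %d MiB%n", rt.freeMemory() >> 20);
        System.out.printf("estimated headroom:     %d MiB%n", estimateAvailableBytes() >> 20);
    }
}
```

Comparing the user's requested size against this number can reject obviously hopeless inputs, but for the reasons given in the rest of this answer it cannot prove that a run will succeed.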

The second difficulty is to estimate the amount of memory needed for a size given by the user. While it is easy to calculate the size of an ArrayList with that many entries, this might not be all. For example, which objects are stored in this list? I would expect that there is at least one object per entry, so you need to add this memory too. Calculating the size of an arbitrary Java object is much more difficult (and in practice only possible if you know the data structures and algorithms behind the objects). And then there might be a lot of temporary objects created while the algorithm runs (for example boxed primitives, iterators, StringBuilders etc.).
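To make the arithmetic concrete, here is a back-of-the-envelope sketch for the 20,000,000-entry case from the question. Every size constant in it is an assumption about a typical 64-bit HotSpot JVM with compressed references (16-byte array header, 4-byte references, 16 bytes per boxed Integer); none of these values is guaranteed by the Java specification, and the names (FootprintEstimate and its methods) are invented for the example.

```java
final class FootprintEstimate {
    private FootprintEstimate() {}

    // Assumption: array header size on a typical 64-bit HotSpot JVM; not specified by the language.
    static final long ASSUMED_ARRAY_HEADER_BYTES = 16;

    /** Estimated bytes for a primitive array such as int[] or long[]. */
    static long primitiveArrayBytes(long length, int bytesPerElement) {
        return ASSUMED_ARRAY_HEADER_BYTES + Math.multiplyExact(length, (long) bytesPerElement);
    }

    /**
     * Estimated bytes for a list holding one separate object per entry:
     * one reference slot plus one object for each entry. Ignores spare
     * capacity in the backing array and any temporary objects.
     */
    static long objectListBytes(long entries, int assumedReferenceBytes, int assumedObjectBytes) {
        return Math.multiplyExact(entries, (long) assumedReferenceBytes + assumedObjectBytes);
    }

    public static void main(String[] args) {
        long n = 20_000_000L;
        // int[]: 16 + 20,000,000 * 4
        System.out.println(primitiveArrayBytes(n, Integer.BYTES)); // prints 80000016
        // List of boxed Integers under the assumed sizes: 20,000,000 * (4 + 16)
        System.out.println(objectListBytes(n, 4, 16));             // prints 400000000
    }
}
```

Under these assumed sizes, the same 20,000,000 values need roughly five times more heap as a list of boxed objects than as an int[], before counting any temporary objects the algorithm creates, which is why a simple per-entry formula tends to underestimate.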

Third, even if the available memory is theoretically sufficient for running a given task, it might be insufficient in practice. Java programs can get very slow when the heap is repeatedly filled with objects, some of them are freed, new ones are created and so on, because of the large amount of garbage collection this causes.

So in practice, what you want to achieve is very difficult and probably next to impossible. I suggest simply running the algorithm and catching the OutOfMemoryError.

Usually, catching errors is something you should not do, but this seems like an occasion where it's OK (I do this in some similar cases). You should make sure that as soon as the OutOfMemoryError is thrown, some memory becomes reclaimable for the GC. This is usually not a problem: the algorithm aborts, the call stack is unwound, and some (hopefully a lot of) objects are no longer reachable. In your case, you should probably ensure that the large list is among the objects that immediately become unreachable in the case of an OOM. Then you have a good chance of being able to continue your application after the error.
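A minimal sketch of that approach follows. It assumes the algorithm can be wrapped as an IntConsumer that takes the data size; OomGuard and runsWithinHeap are hypothetical names, and the allocation in main merely stands in for a real algorithm. The large array is referenced only from inside the task, so it becomes unreachable once the error has unwound the stack.

```java
import java.util.function.IntConsumer;

final class OomGuard {
    private OomGuard() {}

    /** Runs the task for the given size; returns false if it ran out of heap. */
    static boolean runsWithinHeap(IntConsumer task, int size) {
        try {
            task.accept(size);
            return true;
        } catch (OutOfMemoryError e) {
            // The task's stack frames are gone, so everything that was only
            // referenced from them can now be reclaimed by the GC.
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the user's algorithm: the big array lives only in this frame.
        IntConsumer allocate = n -> {
            long[] data = new long[n];
            data[n - 1] = 1;
        };
        System.out.println("1,000 elements ok?       " + runsWithinHeap(allocate, 1_000));
        System.out.println("Integer.MAX_VALUE ok?    " + runsWithinHeap(allocate, Integer.MAX_VALUE));
    }
}
```

Since the question invokes the algorithm via reflection, note that Method.invoke wraps anything thrown by the target method in an InvocationTargetException, so in that setup the OutOfMemoryError would most likely have to be detected by checking getCause() rather than caught directly.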

However, note that this is not a guarantee. For example, if you have multiple threads working and consuming memory in parallel, the other threads might also receive an OutOfMemoryError and not be able to cope with it. The algorithm also needs to tolerate being interrupted at any arbitrary point, so it should make sure that the necessary cleanup actions are executed nevertheless (and of course you are in trouble if those need a lot of memory!).


 