What are some best practices to build memory-efficient Java applications?

Java programs can be very memory hungry. For example, a Double object has 24 bytes: 8 bytes of data and 16 bytes of JVM-imposed overhead. In general, the objects that represent the primitive types are very expensive.

The same is true of any collection in the Java Standard Library. There are even some counterintuitive facts, such as a HashSet being more memory hungry than a HashMap, since a HashSet contains a HashMap inside (http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html).

Could you offer some advice on modeling data and delegating between objects in high-performance settings so that these "weaknesses" of Java are mitigated?

Depends on the application, but generally speaking:

  • Lay out data structures in (parallel) arrays of primitives

  • Try to make big "flat" objects, inlining otherwise sensible sub-structures

  • Use specialized collections of primitives

  • Reuse objects, use object pools, ThreadLocals

  • Go off-heap

I cannot say these practices are "best", because unfortunately they make you suffer: you lose the point of using Java, and they reduce flexibility, supportability, reliability, testability and other "good" properties of the codebase.

But they certainly let you lower memory footprint and GC pressure.
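
As a rough illustration of the first bullet, here is a minimal sketch that keeps 2D points in two parallel double arrays instead of allocating one object per point. The class name PointStore and its methods are hypothetical, made up for this example.

```java
import java.util.Arrays;

/** Stores points as two parallel primitive arrays instead of one object per point. */
final class PointStore {
    private double[] xs;
    private double[] ys;
    private int size;

    PointStore(int initialCapacity) {
        int capacity = Math.max(1, initialCapacity);
        xs = new double[capacity];
        ys = new double[capacity];
    }

    /** Appends a point and returns its index. */
    int add(double x, double y) {
        if (size == xs.length) {
            // Grow both arrays together so they stay the same length.
            int newCapacity = xs.length * 2;
            xs = Arrays.copyOf(xs, newCapacity);
            ys = Arrays.copyOf(ys, newCapacity);
        }
        xs[size] = x;
        ys[size] = y;
        return size++;
    }

    double x(int i) { checkIndex(i); return xs[i]; }
    double y(int i) { checkIndex(i); return ys[i]; }
    int size() { return size; }

    /** Sums the x coordinates by scanning a single contiguous array. */
    double sumX() {
        double sum = 0;
        for (int i = 0; i < size; i++) {
            sum += xs[i];
        }
        return sum;
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i);
        }
    }
}
```

Each stored point costs two array slots (16 bytes) with no per-point object header or reference, at the price of a less object-oriented API.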

One of the memory problems that is easy to overlook in Java is memory leakage. Nicholas Greene already pointed you to memory profiling.

Many people assume that Java's garbage collection prevents memory leaks, but that is not actually true: all it takes is one forgotten reference somewhere to keep an object around in perpetuity. Paradoxically, trying to optimize your program may introduce more opportunities for memory leaks because you end up with more complex data structures.

Here is one example of a memory leak, if you are implementing, for instance, a stack:

Integer[] stack = new Integer[10];
int stackPtr = 0;

// a few push operations on our stack
// (new Integer(...) is deprecated since Java 9; used here so each push creates a distinct object)
stack[stackPtr++] = new Integer(5);
stack[stackPtr++] = new Integer(3);

// and pop from the stack again
--stackPtr;
--stackPtr;

// at this point, the stack is logically empty, but
// the Integer objects are still referenced by the array,
// and are basically leaked.

The correct solution would have been to clear the slot on each pop:

stack[--stackPtr] = null;
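
Wrapped into a class, the same idea might look like the sketch below. SimpleStack and its method names are hypothetical; slotIsCleared exists only so the example can show that no stale reference is left behind.

```java
import java.util.EmptyStackException;

/** A fixed-capacity stack that clears each slot on pop so the GC can reclaim the element. */
final class SimpleStack<T> {
    private final Object[] elements;
    private int size;

    SimpleStack(int capacity) {
        elements = new Object[capacity];
    }

    void push(T value) {
        if (size == elements.length) {
            throw new IllegalStateException("stack is full");
        }
        elements[size++] = value;
    }

    @SuppressWarnings("unchecked")
    T pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        T value = (T) elements[--size];
        elements[size] = null; // drop the array's reference to avoid the leak
        return value;
    }

    boolean isEmpty() {
        return size == 0;
    }

    /** Exposed only for demonstration: true if the backing slot holds no reference. */
    boolean slotIsCleared(int index) {
        return elements[index] == null;
    }
}
```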

Some techniques I use to reduce memory:

  • Make your own IntArrayList (etc.) class that avoids boxing
  • Make your own IntHashMap (etc.) class where keys are primitives
  • Use nio's ByteBuffer to store large arrays of data efficiently (and in native memory, outside the heap). It's like a byte array but has methods to store/retrieve all primitive types at any arbitrary offset in the buffer (trade memory for speed)
  • Don't use pooling, because pools keep unused instances explicitly alive.
  • Use threads sparingly; they're very memory hungry (in native memory, outside the heap)
  • When making substrings of big strings and discarding the original, the substrings still refer to the original (this applies to JVMs older than Java 7u6; later versions copy the characters). So use new String to dispose of the old big string.
  • A linear array is smaller than a multidimensional array, and if the size of all but the last dimension is a power of two, calculating indices is fastest: array[x|y<<4] for a 16xN array.
  • Initialize collections and StringBuilder with an initial capacity chosen so that it prevents internal reallocation in typical circumstances.
    • Use StringBuilder instead of string concatenation, because the compiled class files use new StringBuilder() without initial capacity to concatenate strings (this is what javac emitted before Java 9).
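
To make the ByteBuffer bullet a little more concrete, here is a minimal sketch that packs fixed-size records (an int id followed by a double value) into a direct buffer using absolute offsets. The class OffHeapRecords and the record layout are assumptions chosen for this example, not something from the answer.

```java
import java.nio.ByteBuffer;

/** Stores fixed-size (int id, double value) records in a direct, off-heap buffer. */
final class OffHeapRecords {
    // 4 bytes for the id plus 8 bytes for the value.
    private static final int RECORD_BYTES = Integer.BYTES + Double.BYTES;

    private final ByteBuffer buffer;

    OffHeapRecords(int recordCount) {
        // allocateDirect places the storage in native memory, outside the Java heap.
        buffer = ByteBuffer.allocateDirect(recordCount * RECORD_BYTES);
    }

    void put(int index, int id, double value) {
        int offset = index * RECORD_BYTES;
        buffer.putInt(offset, id);                      // absolute write, position untouched
        buffer.putDouble(offset + Integer.BYTES, value);
    }

    int id(int index) {
        return buffer.getInt(index * RECORD_BYTES);
    }

    double value(int index) {
        return buffer.getDouble(index * RECORD_BYTES + Integer.BYTES);
    }
}
```

Each record takes exactly 12 bytes with no object header, but every access goes through the buffer's bounds checks and byte-order handling.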

If you have tight performance constraints and need to use collections of simple types, you might take a look at some implementations of primitive collections for Java.

Some are:

Also, as a reference, take a look at this question: Why can Java Collections not directly store Primitives types?
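
To give a sense of what such primitive collections do internally, here is a minimal sketch of a growable list backed by an int[]. It is not the API of any particular library; the class IntArrayList and its methods are hypothetical.

```java
import java.util.Arrays;

/** A growable list of ints that never boxes its elements. */
final class IntArrayList {
    private int[] data;
    private int size;

    IntArrayList(int initialCapacity) {
        data = new int[Math.max(1, initialCapacity)];
    }

    void add(int value) {
        if (size == data.length) {
            // Double the backing array when it is full.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    /** Sums all elements without creating any Integer objects. */
    long sum() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += data[i];
        }
        return total;
    }
}
```

Each element costs 4 bytes in the backing array, whereas an ArrayList of Integer stores a reference per element plus a separate Integer object for values outside the small-value cache.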

Luís Bianchin already gave you a few libraries which implement optimized collections in Java. Nevertheless, it seems that you are especially concerned about the memory allocation of Java collections. In that case, there are a few alternatives which are quite straightforward.

  1. Cache

You could use a cache to limit the memory the collection (the cache) can allocate. By doing that, you only keep the most frequently used entries in main memory and you don't need to load the whole data set from disk/network/whatever. I highly recommend Guava Cache as it's very well documented and pretty mature.
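
To show the idea of a bounded cache without pulling in a dependency, here is a minimal sketch built on the JDK's LinkedHashMap rather than Guava. Note the swap: this is not Guava Cache, it bounds the number of entries rather than bytes, and it is not thread-safe. The class name LruCache is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A size-bounded map that evicts the least recently used entry when full. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true makes iteration order follow recency of access.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; true evicts the eldest entry.
        return size() > maxEntries;
    }
}
```

With a capacity of 2, inserting "a" and "b", reading "a", and then inserting "c" evicts "b", because "b" is the entry that was touched least recently.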

  2. Persistent Collections

Sometimes a cache is not a solution for your problem. For example, in an ETL solution, you might know you will only load each entry once. For this scenario I recommend going for persistent collections. These are disk-stored collections that are way faster than traditional databases but have nice Java APIs. MapDB and PCollections are, for me, the best libraries.

  3. Profile memory usage

On top of that, if you really want to know the actual state of your program's memory allocation, I highly recommend using a profiler. This way you will not only know how much memory your collections occupy, but also how the GC behaves over time.

In fact, you should only try an alternative to Java's collections and data structures if there is an actual memory problem, and that is something a profiler can tell you.

The JDK has a profiler called VisualVM which does a great job. Nevertheless, I recommend using a commercial profiler if you can afford it. Commercial profilers usually have a lower impact on the application's performance than VisualVM.
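
Before reaching for a profiler, a coarse in-process reading can sometimes tell you whether memory is worth investigating at all. The sketch below uses the JDK's standard management API; it is no substitute for a profiler, and the class name HeapSnapshot is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Reads coarse heap statistics from the running JVM. */
final class HeapSnapshot {

    /** Bytes of heap currently in use, including garbage not yet collected. */
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Bytes of heap the JVM has committed (reserved from the OS) so far. */
    static long committedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getCommitted();
    }

    public static void main(String[] args) {
        System.out.println("used heap bytes:      " + usedHeapBytes());
        System.out.println("committed heap bytes: " + committedHeapBytes());
    }
}
```

The numbers vary between runs and include uncollected garbage, so they only indicate a trend; a profiler is still needed to see which objects hold the memory.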

  4. Memory-optimal data plays nicely with the network.

Finally, this is not strictly related to your question, but it's closely connected. In case you want to serialize your Java objects into a compact binary representation, I recommend Google Protocol Buffers in Java. Protocol buffers are ideal for transferring data structures through the network using the least bandwidth possible, with really fast encoding/decoding.

Well, there are a lot of things you can do.

Here are a few problems and solutions:

  1. When you change the value of a string in Java, the string is not actually overwritten. Instead, a new string is created to replace the old one. However, the old string still exists until the garbage collector reclaims it. This can be a problem when using RAM efficiently is a concern. Here are some solutions to this problem:

    • When using a string to specify something like the "state" of an object, or anything else that can only have a specific set of possible values, don't use a string. Instead use an enum. If you don't know what an enum is or how to use one yet, here's a link to a tutorial on what enums are and how to use them!
    • If you are using a string as a variable whose value will change at some point in the program, don't define a string how you usually would. Instead, use the StringBuilder class from the java.lang package. StringBuilder is a class which is used to create strings and change their values. This class handles strings differently than usual. When it is used to change the value of a string, StringBuilder doesn't create a duplicate string with a different value to replace the old string; it changes the contents of its own internal buffer. Therefore, since you aren't creating duplicate strings, this saves RAM. Here is a link to the StringBuilder class in the Java API.
  2. Writer and reader objects such as FileWriters and FileReaders also take up RAM. If you have a lot of them, this can also cause problems. Here are some solutions:

    • All reader and writer objects have a method called close(). As you can probably guess, it closes the writer or reader object, releasing the underlying resources it holds (such as file handles and buffers). Whenever you have a reader or writer object and you reach the point in your code where you know you will never use it again, call this method. It will release those resources and free some RAM.
  3. Every object in Java takes up memory. When you have an object that you won't use anymore, keeping it around wastes memory.

    • The Object class has a method called finalize(), but it does not have the same effect as the close() method of reader and writer objects: it is invoked by the garbage collector rather than by your code, calling it yourself frees nothing, and it has been deprecated since Java 9. When you aren't going to use an object anymore, drop every reference to it (let the variable go out of scope, or set long-lived fields to null) so that the garbage collector can reclaim it.
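
The points about enums, StringBuilder and close() can be sketched together as follows. The class ResourceExamples, the State enum and both helper methods are hypothetical names invented for illustration; try-with-resources is used here as one common way to make sure close() is always called.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ResourceExamples {

    /** A fixed set of states modeled as an enum instead of free-form strings. */
    enum State { IDLE, RUNNING, STOPPED }

    /** Builds "0,1,...,n-1" in one growing buffer instead of many temporary strings. */
    static String joinNumbers(int n) {
        StringBuilder sb = new StringBuilder(Math.max(16, n * 2));
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.toString();
    }

    /** Counts lines; try-with-resources calls close() even if reading throws. */
    static int countLines(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same try-with-resources form works for FileReader, FileWriter and any other class that implements AutoCloseable.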

Beware of premature optimisation. See When is optimisation premature?

Without knowing the exact requirements of your application or runtime environment, I can only say that in my experience Java was able to handle anything I threw at it. Doing some profiling on your demo/proof-of-concept app might be time well spent if performance or garbage collection (you tagged memory leaks) is an issue.
