
File size vs. in-memory size in Java

If I take an XML file that is around 2 kB on disk, load its contents into memory as a Java String, and then measure the object size, I get around 33 kB.

Why the huge increase in size?
If I do the same thing in C++, the resulting string object in memory is much closer to 2 kB.

To measure the memory in Java I'm using Instrumentation. For C++, I take the length of the serialized object (e.g. the string).

I think there are multiple factors involved. First of all, as Bruce Martin said, objects in Java have an overhead of 16 bytes per object; C++ objects do not. Second, Strings in Java might use 2 bytes per character instead of 1. Third, it could be that Java reserves more memory for its Strings than the C++ std::string does.

Please note that these are just ideas about where the big difference might come from.

Assuming that your XML file contains mainly ASCII characters and uses an encoding that represents them as single bytes, you can expect the in-memory size to be at least double, since Java uses UTF-16 internally (I've heard of some JVMs that try to optimize this, though). Added to that is the overhead for 2 objects (the String instance and an internal char array) with some fields, IIRC about 40 bytes overall.

So your "object size" of 33 kB is definitely not correct, unless you're using a weird JVM. There must be some problem with the method you use to measure it.
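To make the arithmetic in this answer concrete, here is a minimal sketch that plugs in its figures: 2 bytes per character plus roughly 40 bytes of fixed overhead. Both constants are the answer's estimates, not measured values, and the class and method names are made up for illustration.

```java
final class StringSizeEstimate {
    // Assumption from the answer above: String object plus char array cost about 40 bytes.
    static final long FIXED_OVERHEAD_BYTES = 40;
    // Assumption from the answer above: UTF-16 storage, 2 bytes per character.
    static final long BYTES_PER_CHAR = 2;

    /** Rough in-memory size of a String holding charCount characters. */
    static long estimateBytes(long charCount) {
        return FIXED_OVERHEAD_BYTES + BYTES_PER_CHAR * charCount;
    }

    public static void main(String[] args) {
        long chars = 2048; // a 2 kB file of single-byte characters
        System.out.println(estimateBytes(chars) + " bytes"); // prints "4136 bytes"
    }
}
```

Under these assumptions a 2 kB file comes out at about 4 kB in memory, which is why a reading of 33 kB points at the measurement rather than at the String itself.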

In Java, a String object carries some extra data that increases its size.
This is object data, array data and some other variables, such as the array reference, offset, length, etc.

Visit http://www.javamex.com/tutorials/memory/string_memory_usage.shtml for details.

String: a String's memory growth tracks its internal char array's growth. However, the String class adds another 24 bytes of overhead. For a nonempty String of 10 characters or fewer, the added overhead cost relative to useful payload (2 bytes for each char plus 4 bytes for the length) ranges from 100 to 400 percent.
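The 100 to 400 percent range follows directly from the numbers in the quote. A small sketch, using only those quoted figures (24 bytes of overhead, payload of 2 bytes per char plus 4 bytes for the length); the class and method names are hypothetical:

```java
final class ShortStringOverhead {
    // Figure quoted above: overhead added by the String class itself.
    static final int STRING_OVERHEAD_BYTES = 24;

    /** Useful payload as defined in the quote: 2 bytes per char plus 4 bytes for the length. */
    static int payloadBytes(int chars) {
        return 2 * chars + 4;
    }

    /** Overhead relative to payload, in percent (integer division, so rounded down). */
    static int overheadPercent(int chars) {
        return 100 * STRING_OVERHEAD_BYTES / payloadBytes(chars);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            System.out.println(n + " chars: " + overheadPercent(n) + "%");
        }
    }
}
```

A 1-character String has a 6-byte payload, so the overhead is 24 / 6 = 400 percent; at 10 characters the payload is 24 bytes and the overhead drops to 100 percent.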

More: What is the memory consumption of an object in Java?

Yes, you should run the GC and give it time to finish. Just call System.gc(); and print totalMemory() in the loop. You had also better create a million string copies in an array (measure the size of the empty array and then of the array filled with strings), to be sure that you measure the size of the strings and not of other service objects that may be present in your program. A String alone cannot take 32 kB. But a hierarchy of XML objects can.
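The measurement described here could be sketched roughly as follows. This is only an approximation under several assumptions: System.gc() is a request that the JVM may ignore, used heap is taken as totalMemory() minus freeMemory(), the 200 ms pause is an arbitrary choice, and the class and method names are invented for the example. The copies are built from a char[] so that each String owns its own data.

```java
import java.util.Arrays;

final class StringHeapProbe {
    /** Heap in use after asking (not forcing) the JVM to collect garbage. */
    static long usedHeapAfterGc() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        try {
            Thread.sleep(200); // arbitrary pause to give the collector time to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Rough average heap cost of one String of the given length, over many copies. */
    static long averageBytesPerString(int copies, int length) {
        char[] content = new char[length];
        Arrays.fill(content, 'x');
        String[] holder = new String[copies]; // empty array is part of the baseline
        long before = usedHeapAfterGc();
        for (int i = 0; i < copies; i++) {
            holder[i] = new String(content);  // copies the characters into a new String
        }
        long after = usedHeapAfterGc();
        if (holder[copies - 1] == null) {     // also keeps the array reachable until here
            throw new AssertionError("strings were not created");
        }
        return (after - before) / copies;
    }

    public static void main(String[] args) {
        System.out.println(averageBytesPerString(5_000, 2048) + " bytes per String (approx.)");
    }
}
```

The result varies by JVM. On a JVM that stores Latin-1 strings with one byte per character (compact strings, Java 9 and later) it should land near the character count, otherwise near twice that, but in either case far from 33 kB for a 2 kB string.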

That said, I cannot resist the irony that nobody cares about memory (and cache hits) in the world of Java. We all know that the JIT is improving and that it can outperform native C++ code in some cases. So there is no need to bother about memory optimization. Premature optimization is the root of all evil.

As stated in other answers, Java's String adds overhead. If you need to store a large number of strings in memory, I suggest storing them as byte[] instead. That way, the size in memory should be roughly the same as the size on disk.

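String -> byte[]:

import java.nio.charset.StandardCharsets;

String a = "hello";
byte[] aBytes = a.getBytes(StandardCharsets.UTF_8); // explicit charset instead of the platform default

byte[] -> String:

String b = new String(aBytes, StandardCharsets.UTF_8); // decode with the same charset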
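As a quick check of the size claim, the sketch below compares the encoded length of an ASCII string in UTF-8 and in UTF-16. It assumes the file on disk is UTF-8 (or another single-byte encoding for ASCII); the class and method names are made up for illustration, and the array's own object header is not counted.

```java
import java.nio.charset.StandardCharsets;

final class EncodedLength {
    /** Number of bytes the text occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes the text occupies as UTF-16 (little endian, no byte order mark). */
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    /** Encode to UTF-8 and decode again; the result should equal the input. */
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String a = "hello";
        System.out.println(utf8Length(a));   // prints 5
        System.out.println(utf16Length(a));  // prints 10
        System.out.println(roundTrip(a).equals(a)); // prints true
    }
}
```

For ASCII text the UTF-8 array holds one byte per character, matching the file size, while the UTF-16 form needs twice as much.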
