
Are files in memory the same size as they are in the file system?

I've been working with large log files (~100 MB) in Java and noticed that gzip can compress them to around 3 MB, making them roughly 35x smaller.

So I wonder: do modern OSes compress files before loading them into memory? It seems silly to use 100 MB of RAM to hold a file that really only has 3 MB of information.

Or is it the opposite? Does the process of reading a file (and dealing with encodings and whatnot) mean that a file which takes up 100 MB on disk is actually bigger than 100 MB in memory?

Bonus points: any recommendations for preprocessing I could do to my files before loading them, in order to reduce my JVM's memory usage? (The files have the same format as Apache server logs.)

Do modern OSes compress files before loading them into memory? It seems silly to use 100 MB of RAM to hold a file that really only has 3 MB of information.

This depends on the application involved. Some applications compress the data they hold in memory; others do not.
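To make "some applications compress data held in memory" concrete, here is a minimal sketch of a Java program that keeps a gzip-compressed copy of some text in a byte array and expands it only when needed. The class name `InMemoryGzip` and the sample log lines are made up for illustration; only the standard `java.util.zip` classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: holds data compressed in memory and expands it on demand.
final class InMemoryGzip {

    /** Compresses raw bytes with gzip and returns the compressed bytes. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed at this point, so the trailer has been written.
        return buffer.toByteArray();
    }

    /** Expands gzip-compressed bytes back to the original bytes. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up, highly repetitive sample text standing in for a log file.
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sample.append("127.0.0.1 - - \"GET /index.html HTTP/1.0\" 200 2326\n");
        }
        byte[] raw = sample.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println("raw bytes:        " + raw.length);
        System.out.println("compressed bytes: " + packed.length);
    }
}
```

The trade-off is CPU time: every read of the compressed data pays for decompression, which is why this is a choice the application makes rather than something that happens automatically when a file is read.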

Or is it the opposite? Does the process of reading a file (and dealing with encodings and whatnot) mean that a file which takes up 100 MB on disk is actually bigger than 100 MB in memory?

Again, this depends entirely on the application.
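As one illustration of how the in-memory size can differ from the on-disk size in Java specifically: text decoded into a `char[]` uses two bytes per `char` (UTF-16), while the same ASCII text stored as UTF-8 uses one byte per character. The sketch below compares the two counts; `TextFootprint` and the sample line are hypothetical, and the figures ignore object headers. Since Java 9, `String` itself may store Latin-1-only text at one byte per character, so the doubling applies mainly to `char[]` buffers and to strings containing other characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper comparing on-disk and in-memory sizes of the same text.
final class TextFootprint {

    /** Bytes the text occupies when encoded as UTF-8, e.g. in a file on disk. */
    static long utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Payload bytes of a char[] holding the same text (2 bytes per UTF-16 code unit). */
    static long charArrayBytes(String text) {
        return 2L * text.length();
    }

    public static void main(String[] args) {
        // Made-up sample line in the style of an access log.
        String line = "127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] \"GET /index.html HTTP/1.0\" 200 2326";
        System.out.println("UTF-8 bytes on disk:    " + utf8Bytes(line));
        System.out.println("char[] bytes in memory: " + charArrayBytes(line));
    }
}
```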

Bonus points: any recommendations for preprocessing I could do to my files before loading them, in order to reduce my JVM's memory usage? (The files have the same format as Apache server logs.)

Don't load any data into memory that you do not need for processing or display. Anything that is only needed to produce an average or a sum can be loaded temporarily, added to a running total, and then discarded.
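The running-total idea can be sketched as follows: read one line at a time, fold the value you care about into an accumulator, and let the line be garbage collected. This sketch assumes, purely for illustration, that the last space-separated field of each line is a byte count (as in Apache's common log format, where it may also be `-`); the class name `RunningTotal` and the sample data are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical example: aggregate a log without ever holding the whole file in memory.
final class RunningTotal {

    /**
     * Streams the input line by line and returns {lineCount, totalBytes}.
     * Assumption: the last space-separated field is a numeric byte count;
     * non-numeric values such as "-" contribute 0 to the total.
     */
    static long[] sumResponseBytes(BufferedReader reader) {
        long lines = 0;
        long total = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines++;
                String field = line.substring(line.lastIndexOf(' ') + 1);
                try {
                    total += Long.parseLong(field);
                } catch (NumberFormatException ignored) {
                    // "-" or a malformed field: skip it, keep the line count.
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {lines, total};
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for a real log file.
        String sample =
                "127.0.0.1 - - \"GET /a HTTP/1.0\" 200 2326\n"
              + "127.0.0.1 - - \"GET /b HTTP/1.0\" 304 -\n"
              + "127.0.0.1 - - \"GET /c HTTP/1.0\" 200 1024\n";
        long[] result = sumResponseBytes(new BufferedReader(new StringReader(sample)));
        System.out.println("lines: " + result[0] + ", total bytes: " + result[1]);
        System.out.println("average bytes per line: " + (double) result[1] / result[0]);
    }
}
```

For a real file, the `StringReader` can be swapped for `Files.newBufferedReader(path)`; memory use then stays roughly constant regardless of file size, because only one line is alive at a time.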

You get only what you ask for: if you compress the data, it will be compressed; otherwise it will not. Most of the time there is only a slight difference between the size in memory and the size on disk, and that difference exists only because the unit of storage on the disk (the sector) is larger than a byte. Even a 1-byte file usually occupies more than 1 byte on disk, because the OS reserves a whole sector for it. The sector size depends on the OS; you will mostly find sectors of 512, 2048, or 4096 bytes.
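The rounding described above is simple arithmetic: the space reserved on disk is the file size rounded up to a whole number of storage units. A minimal sketch, assuming the file system allocates in fixed-size units and does nothing special for tiny or empty files (`DiskAllocation` is a hypothetical name):

```java
// Hypothetical helper: rounds a file size up to whole storage units.
final class DiskAllocation {

    /** Returns the bytes reserved on disk for a file of fileSize bytes, given the unit size. */
    static long allocatedBytes(long fileSize, long unitSize) {
        if (fileSize < 0 || unitSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and unitSize > 0");
        }
        long units = (fileSize + unitSize - 1) / unitSize; // ceiling division
        return units * unitSize;
    }

    public static void main(String[] args) {
        long[] unitSizes = {512, 2048, 4096};
        for (long unit : unitSizes) {
            System.out.println("1-byte file with " + unit + "-byte units occupies "
                    + allocatedBytes(1, unit) + " bytes");
        }
    }
}
```

For a 100 MB file the overhead is at most one unit, which is why the difference between the two sizes is usually negligible for large files.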


 