
How to find out the size of a file and directory in Java without creating the object?

First, please don't dismiss this as a common question; it is not. I know how to find out the size of a file and directory using file.length() and Apache FileUtils.sizeOfDirectory().

My problem is that in my case the files and directories are very large (hundreds of MB). When I try to find out the size using the code above (i.e. creating a File object), my program becomes very resource hungry and performance suffers.

Is there any way to know the size of a file without creating an object?

For files I am using:

File file1 = new File(fileName);
long size = file1.length();

and for directories:

File dir1 = new File(dirPath);
long size = FileUtils.sizeOfDirectory(dir1);

I have one parameter which enables size computation. If the parameter is false then everything runs smoothly; if it is true then the program lags or hangs. I am calculating the size of 4 directories and 2 database files.

File objects are very lightweight. Either there is something wrong with your code, or the problem is not with the File objects but with the hard-disk access necessary for getting the file sizes. If you do that for a large number of files (say, tens of thousands), then the hard disk will do a lot of seeks, which is pretty much the slowest operation possible on a modern PC (by several orders of magnitude).

A File is just a wrapper for the file path. It doesn't matter how big the file is, only its file name.
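To see that a File really is just a path wrapper, here is a minimal sketch (the path is made up):

import java.io.File;

public class LazyFile {
    public static void main(String[] args) {
        // Constructing a File never touches the disk; it only stores the
        // path string, so even a non-existent path is accepted.
        File f = new File("/no/such/path");

        // Only calls like these actually hit the file system:
        System.out.println(f.exists()); // false
        System.out.println(f.length()); // 0 for a missing file
    }
}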

When you want to get the size of all the files in a directory, the OS needs to read the directory and then look up each file to get its size. Each access takes about 10 ms (a typical seek time for a hard drive), so if you have 100,000 files it will take about 17 minutes to get all their sizes.

The only way to speed this up is to get a faster drive. For example, solid-state drives have an average seek time of 0.1 ms, but it would still take 10 seconds or more to get the sizes of 100K files.

BTW: the size of each file doesn't matter, because the lookup doesn't actually read the file, only the file entry which holds its size.


EDIT: For example, if I try to get the sizes of a large directory, it is slow at first but much faster once the data is cached.

$ time du -s /usr
2911000 /usr

real    0m33.532s
user    0m0.880s
sys 0m5.190s

$ time du -s /usr
2911000 /usr

real    0m1.181s
user    0m0.300s
sys 0m0.840s

$ find /usr | wc -l
259934

The reason the lookup is relatively fast even the first time is that the files were all installed at once and most of the information is laid out contiguously on disk. Once the information is in memory, it takes next to no time to read the file information.

Timing FileUtils.sizeOfDirectory("/usr") takes just under 8.7 seconds. This is relatively slow compared with the time du takes, but it is still processing around 30K files per second.
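That figure can be reproduced with a harness along these lines (a sketch assuming commons-io on the classpath; "/usr" and the measurement details are assumptions, not the original benchmark code):

import java.io.File;
import org.apache.commons.io.FileUtils;

public class TimeSizeOf {
    public static void main(String[] args) {
        // Time a single sizeOfDirectory() call; "/usr" stands in for any
        // large directory tree.
        long start = System.nanoTime();
        long bytes = FileUtils.sizeOfDirectory(new File("/usr"));
        long ms = (System.nanoTime() - start) / 1000000L;
        System.out.println(bytes + " bytes in " + ms + " ms");
    }
}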

An alternative might be to run Runtime.exec("du -s " + directory); however, this will only make a few seconds' difference at most. Most of the time is likely to be spent waiting for the disk if the data is not in cache.
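A minimal sketch of that alternative, assuming a Unix-like system with du on the PATH (the parsing relies on du -s printing one "size<TAB>path" line, with the size typically in KB on Linux):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class DuSize {
    static long sizeViaDu(String directory) throws Exception {
        // du -s prints one line such as "2911000\t/usr".
        Process p = Runtime.getRuntime().exec(new String[] { "du", "-s", directory });
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine();
            p.waitFor();
            return line == null ? -1 : Long.parseLong(line.split("\\s+")[0]);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sizeViaDu(args[0]) + " KB");
    }
}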

We had a similar performance problem with File.listFiles() on directories with a large number of files.

Our setup was one folder with 10 subfolders, each containing 10,000 files. The folder was on a network share, not on the machine running the test.

We were using a FileFilter to accept only files with known extensions or directories, so we could recurse down into the directories.

Profiling revealed that about 70% of the time was spent calling File.isDirectory (which I assume Apache is calling). There were two calls to isDirectory for each file (one in the filter and one in the file-processing stage).

File.isDirectory was slow because it had to hit the network share for each file.

Reversing the order of the checks in the filter, testing for a valid name before a valid directory, saved a lot of time, but we still needed to call isDirectory for the recursive lookup.
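A sketch of what the reordered filter looked like in spirit; the ".log" extension here is a made-up stand-in for our real extension list:

import java.io.File;
import java.io.FileFilter;

public class NameFirstFilter implements FileFilter {
    public boolean accept(File f) {
        // The cheap in-memory string test runs first; thanks to the
        // short-circuiting ||, the expensive isDirectory() round trip to
        // the network share only happens when the name test fails.
        return f.getName().endsWith(".log") || f.isDirectory();
    }
}

Used as dir.listFiles(new NameFirstFilter()), the directory branch is still needed for recursion, which is why isDirectory could not be removed entirely.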

My solution was to implement a version of listFiles in native code that returned a data structure containing all the metadata about each file, instead of just the filename as File does.

This got rid of the performance problem, but added the maintenance problem of having native code maintained by Java developers (luckily we only supported one OS).

I think you need to read the metadata of a file. Read this tutorial for more information; it might be the solution you are looking for: http://download.oracle.com/javase/tutorial/essential/io/fileAttr.html
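Following that tutorial, here is a sketch of the NIO.2 approach (Java 7+): Files.walkFileTree hands each visitor call the file's BasicFileAttributes, so the size is read from directory metadata without opening the file:

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class DirSize {
    static long sizeOf(Path dir) throws IOException {
        final long[] total = { 0 };
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                total[0] += attrs.size(); // read from metadata, no file I/O
                return FileVisitResult.CONTINUE;
            }
            @Override
            public FileVisitResult visitFileFailed(Path file, IOException e) {
                return FileVisitResult.CONTINUE; // skip unreadable entries
            }
        });
        return total[0];
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sizeOf(Paths.get(args[0])) + " bytes");
    }
}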

Answering my own question...

This is not the best solution, but it works in my case...

I created a batch script to get the size of the directory and then read its output from my Java program. It gives me a much shorter execution time when the directory contains more than 1L (one lakh, i.e. 100,000) files, which is always the case for me: sizeOfDirectory takes around 30255 ms, while with the batch script I get 1700 ms. For a small number of files the batch script is more costly.
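The script itself is not shown above, so as an illustration only: suppose a hypothetical script named dirsize (a .bat or .sh wrapper around the platform's directory-size command) prints a single line containing the total size in bytes. The Java side could then look like this:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ScriptSize {
    // dirsize is a hypothetical script printing one line: the size in bytes.
    static long readSizeFromScript(String script, String dir) throws Exception {
        Process p = new ProcessBuilder(script, dir).start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine();
            p.waitFor();
            return line == null ? -1 : Long.parseLong(line.trim());
        }
    }
}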

I'll add to what Peter Lawrey answered: when a directory has a lot of files directly inside it (not in subdirectories), the time file.listFiles() takes is extremely long (I don't have exact numbers; I know it from experience). The number of files has to be large, several thousand if I remember correctly. If this is your case, what FileUtils actually does is try to load all of their names into memory at once, which can be very expensive.

If that is your situation, I would suggest restructuring the directory to have some sort of hierarchy that ensures a small number of files in each subdirectory.
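As a side note on the listFiles() memory point: on Java 7+, Files.newDirectoryStream iterates directory entries lazily instead of materializing the whole name array up front, which can help with very large flat directories. A minimal sketch:

import java.io.IOException;
import java.nio.file.*;

public class LazyList {
    static long sizeOfFlatDir(Path dir) throws IOException {
        long total = 0;
        // The stream yields one entry at a time instead of building the
        // full array that File.listFiles() would allocate.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    total += Files.size(entry);
                }
            }
        }
        return total;
    }
}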
