Is there a better alternative to File.listFiles() method?

I need to read the absolute path, file name & size of the files in a directory. This is how I currently do it:

File directory = <dir_path>;
File[] listFiles = directory.listFiles();
for (int i = 0; i < listFiles.length; i++) {
    File file = listFiles[i];
    String fileName = file.getName();
    String filePath = file.getAbsolutePath();
    long fileLen = file.length();
    long fileLastModified = file.lastModified();
    ...
}

My directory can contain thousands of files. Since I/O operations are very expensive, is this the most efficient way to do what I need?

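Since Java 7, java.nio.file.DirectoryStream<Path> offers an alternative that iterates over the entries lazily instead of building the whole array first, which can be a huge gain in performance on large directories.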

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
...
    private static void nioDir( String filePath, int maxFiles )
       throws IOException {
      int i = 1;
      Path dir = FileSystems.getDefault().getPath( filePath );
      // try-with-resources closes the stream even if the loop throws
      try (DirectoryStream<Path> stream = Files.newDirectoryStream( dir )) {
        for (Path path : stream) {
          System.out.println( "" + i + ": " + path.getFileName() );
          if (++i > maxFiles) break;
        }
      }
    }

AFAIK, this is close to as efficient as possible in Java. You might be able to squeeze maybe 2 to 5 percent, but that's typically not the kind of performance improvement that is worthwhile.

The problem is that a typical OS doesn't provide a way to retrieve the metadata for multiple files at a time, and java.io.File exposes each metadata value through a separate call.

I expect that the metadata operations ( length() , lastModified() , et cetera) will use the vast majority of the time. But it is worth profiling your application to verify that.

Having said this, your application's I/O is probably not as slow as you think. It is likely that the OS will read and cache the disk blocks containing the metadata, so the syscalls that read the file metadata will return cached information most of the time. (Of course, this is OS specific, and dependent on the type of file system you are using.)
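
One thing that may be worth measuring: with java.io.File each getter is a separate call, whereas Files.readAttributes is documented as reading a file's attributes as a bulk operation. Below is a minimal sketch of that approach, not a benchmarked recommendation. The class name BulkAttributes and the FileInfo holder are hypothetical names made up for this illustration, and whether it is actually faster depends on the OS and file system, so profile before relying on it.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class BulkAttributes {

    /** Hypothetical holder for the four values the question asks for. */
    static final class FileInfo {
        final String name;
        final String absolutePath;
        final long size;
        final long lastModifiedMillis;

        FileInfo(String name, String absolutePath, long size, long lastModifiedMillis) {
            this.name = name;
            this.absolutePath = absolutePath;
            this.size = size;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /** Collects name, path, size and timestamp for the regular files directly inside dir. */
    static List<FileInfo> describe(Path dir) throws IOException {
        List<FileInfo> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                // One call returns size, timestamps and file type together.
                BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
                if (!attrs.isRegularFile()) {
                    continue; // skip sub-directories and other non-regular entries
                }
                result.add(new FileInfo(
                        p.getFileName().toString(),
                        p.toAbsolutePath().toString(),
                        attrs.size(),
                        attrs.lastModifiedTime().toMillis()));
            }
        }
        return result;
    }
}
```

Calling BulkAttributes.describe(Paths.get(...)) on the directory in question would return one FileInfo per regular file; the list is built eagerly here only to keep the sketch short.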

In your case :

File[] listFiles = directory.listFiles();

will create about 1000 File objects, but creating them is not an expensive I/O operation: unlike new FileInputStream(), new File() performs no I/O when the object is created.
Note, however, that you can avoid creating all the File objects at once, and so reduce memory consumption, by streaming over the directory entries.
Files.newDirectoryStream(Path dir), which returns a DirectoryStream<Path>, and Files.list(Path dir), which returns a Stream<Path>, both provide ways to achieve that.
Here's a post pointing out some differences between them.

So you could get the same result with the java.nio API in this way :

Path directory = ...;
// the DirectoryStream holds an open directory handle, so close it when done
try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory)) {
    stream.forEach(p -> {
        try {
            String fileName = p.getFileName().toString();
            String filePath = p.toAbsolutePath().toString();
            long fileLen = Files.size(p);
            long fileLastModified = Files.getLastModifiedTime(p).toMillis();
        } catch (IOException e) {
            // FIXME to handle
        }
    });
}
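
For comparison, here is a rough sketch of the Files.list(Path dir) variant mentioned above. The class and method names (ListNames, regularFileNames) are hypothetical and exist only for this example; it assumes that only regular files are of interest and that sorted names are wanted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListNames {

    /** Returns the sorted names of the regular files directly inside dir. */
    static List<String> regularFileNames(Path dir) throws IOException {
        // The Stream returned by Files.list keeps the directory open,
        // so it is closed here with try-with-resources.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)          // drop sub-directories
                    .map(p -> p.getFileName().toString())  // keep only the file name
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Files.list is not recursive: only the direct children of the directory are returned.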

Edit for comment :

What if there are sub-directories & there is a need to retrieve the details of files inside the sub-directories too?

In this case Files.walk() is more suitable as it is recursive.
It is very close to :

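Path directory = ...;
// Files.walk also visits the directories themselves, and its Stream must be closed
try (Stream<Path> paths = Files.walk(directory)) {
    paths.filter(Files::isRegularFile)
         .forEach(p -> {
             try {
                 // same code ....
             } catch (IOException e) {
                 // FIXME to handle
             }
         });
}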
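
Another option for the recursive case is Files.walkFileTree, whose visitFile callback is handed each file's BasicFileAttributes along with its path, so size and timestamps need no separate lookup. The following is only a sketch; TreeSizes and sizesByAbsolutePath are hypothetical names, and collecting into a map is just one way to use the values.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Map;
import java.util.TreeMap;

public class TreeSizes {

    /** Maps the absolute path of every regular file under root to its size in bytes. */
    static Map<String, Long> sizesByAbsolutePath(Path root) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // attrs is supplied by the walk; no extra Files.size() call is needed.
                if (attrs.isRegularFile()) {
                    sizes.put(file.toAbsolutePath().toString(), attrs.size());
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return sizes;
    }
}
```

By default SimpleFileVisitor rethrows the IOException when a file cannot be visited; overriding visitFileFailed would be one way to skip unreadable entries instead.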

I'd use File.list() rather than listFiles(): it's a bit closer to the native API and creates fewer File objects up front. But that's a small gain.

It's more interesting to note that File.list() returns only the child names, so you save a few getters, and the path is the same for all children of a given parent, which saves a few more trivial getters.

You won't save on size and date; those have to be queried once per file, sorry.
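
A small sketch of that idea, assuming a flat directory: the names come straight from File.list(), the parent path is computed once, and only length() and lastModified() are queried per child. NameListing and describe are hypothetical names used just for this illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class NameListing {

    /** Returns one "path size lastModified" line per entry directly inside dir. */
    static List<String> describe(File dir) {
        List<String> lines = new ArrayList<>();
        String[] names = dir.list();           // child names only, no File objects yet
        if (names == null) {
            return lines;                      // not a directory, or an I/O error occurred
        }
        String parentPath = dir.getAbsolutePath();  // identical for every child
        for (String name : names) {
            File child = new File(dir, name);  // needed only for the two metadata calls
            lines.add(parentPath + File.separator + name
                    + " " + child.length()
                    + " " + child.lastModified());
        }
        return lines;
    }
}
```

Sub-directories are listed too in this sketch; filtering them out would cost one more call per entry.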
