Fine-tuning method listFiles

Can anyone help me tune this method? The "files" log line appears after only about 5 seconds, but it then takes more than 10 minutes before the method returns "fileInfo".

// fileSystem is HDFS
// dateNow = java.util.Date
// basePath = new Path("/")
// filePattern = "*.sf"

private Map<String, Long> listFiles(final Date dateNow, final Path basePath, 
    final String filePattern) throws IOException {

    RemoteIterator<LocatedFileStatus> files = fileSystem.listFiles(basePath, true);
    _LOG.info("files=" + files);

    // map containing <filename, filesize>
    Map<String, Long> fileInfo = new HashMap<String, Long>();
    String regex = RegexUtil.convertGlobToRegex(filePattern);
    Pattern pattern = Pattern.compile(regex);

    if (files != null) {
        while (files.hasNext()) {
            LocatedFileStatus file = files.next();
            Path filePath = file.getPath();
            // Get only the files with created date = current date
            if (DateUtils.truncate(new Date(file.getModificationTime()), 
                java.util.Calendar.DAY_OF_MONTH).equals(dateNow)) {
                if (pattern.matcher(filePath.getName()).matches()) {
                    fileInfo.put(file.getPath().getName(), file.getLen());
                }
            }
        }
    }

    _LOG.info("fileInfo =" + fileInfo);
    return fileInfo;
}

You said:

When I log the "files" - it only takes around 5 seconds

 RemoteIterator<LocatedFileStatus> files = fileSystem.listFiles(basePath, true);

Yes, because this call only hands back a RemoteIterator. At that point little or none of the directory tree has been listed, and no file contents are read; the file statuses are fetched from the NameNode as the iterator is consumed.

Now look at this part of the code:

 while (files.hasNext()) {
            LocatedFileStatus file = files.next();
            Path filePath = file.getPath();
            // Get only the files with created date = current date
            if (DateUtils.truncate(new Date(file.getModificationTime()), 
                java.util.Calendar.DAY_OF_MONTH).equals(dateNow)) {
                if (pattern.matcher(filePath.getName()).matches()) {
                    fileInfo.put(file.getPath().getName(), file.getLen());
                }
            }
        }

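This loop is where the listing actually happens. Calls to hasNext() and next() fetch further batches of entries from the NameNode, and because the listing is recursive and starts at "/", the loop visits every file in the file system, not only the "*.sf" files from today. So it will definitely take longer than the first call.

Note that the loop does not read the contents of any file: getModificationTime() and getLen() come from metadata. The running time therefore depends mainly on how many files and directories sit under basePath. File size matters only indirectly, since each LocatedFileStatus also carries block locations and a larger file has more blocks.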
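The per-file check in the loop can also be made cheaper, although this is likely a small gain next to the cost of the recursive listing itself. The sketch below is an illustration and not part of the original code: the class name DayWindowMatcher is hypothetical, it takes a ready-made regex instead of calling RegexUtil.convertGlobToRegex, and it assumes "current date" means a calendar day in a given time zone. It computes the day boundaries once and compares plain long values, so no Date or Calendar object is created per file.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.regex.Pattern;

/** Hypothetical helper: decides whether one listing entry should be kept. */
final class DayWindowMatcher {
    private final long startMillis;    // inclusive start of the day, epoch millis
    private final long endMillis;      // exclusive end of the day, epoch millis
    private final Pattern namePattern; // compiled once, reused for every file

    DayWindowMatcher(LocalDate day, ZoneId zone, String nameRegex) {
        this.startMillis = day.atStartOfDay(zone).toInstant().toEpochMilli();
        this.endMillis = day.plusDays(1).atStartOfDay(zone).toInstant().toEpochMilli();
        this.namePattern = Pattern.compile(nameRegex);
    }

    /** True if the file was modified on the configured day and its name matches. */
    boolean matches(String fileName, long modificationTimeMillis) {
        return modificationTimeMillis >= startMillis
                && modificationTimeMillis < endMillis
                && namePattern.matcher(fileName).matches();
    }
}
```

Inside the loop this would be called as matcher.matches(filePath.getName(), file.getModificationTime()), with the matcher built once before the loop.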

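Use listStatus with a PathFilter. It returns the accumulated results for a directory as one array instead of through an iterator. Be aware that listStatus is not recursive and that the filter is evaluated on the client, so the gain comes mainly from pointing it at the directories that actually hold the files rather than walking everything from "/".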
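A minimal sketch of that suggestion, assuming the "*.sf" files live directly in one known directory. The class and method names (DirectoryLister, listMatching) and the millisecond day boundaries are hypothetical; FileSystem.listStatus(Path, PathFilter), FileStatus.getModificationTime() and FileStatus.getLen() are the Hadoop calls used. A PathFilter only sees the Path, so the date check is applied to the returned FileStatus objects.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

/** Hypothetical helper: lists one directory without recursing into it. */
final class DirectoryLister {

    /** Returns file name to file size for matching files modified in [startMillis, endMillis). */
    static Map<String, Long> listMatching(FileSystem fs, Path dir, Pattern namePattern,
                                          long startMillis, long endMillis) throws IOException {
        // The filter receives only the path, so it can test the name but not the timestamp.
        PathFilter byName = path -> namePattern.matcher(path.getName()).matches();

        Map<String, Long> fileInfo = new HashMap<>();
        for (FileStatus status : fs.listStatus(dir, byName)) {
            long modified = status.getModificationTime();
            if (status.isFile() && modified >= startMillis && modified < endMillis) {
                fileInfo.put(status.getPath().getName(), status.getLen());
            }
        }
        return fileInfo;
    }
}
```

If the files are spread over several known directories, the same method could be called once per directory and the maps merged, which still avoids a full walk from the root.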
