Fine-tuning method listFiles

Can anyone help with tuning this method? When I log "files", it takes only around 5 seconds, but it takes more than 10 minutes before "fileInfo" is returned.

// fileSystem is HDFS
// dateNow = java.util.Date
// basePath = new Path("/")
// filePattern = "*.sf"

private Map<String, Long> listFiles(final Date dateNow, final Path basePath, 
    final String filePattern) throws IOException {

    RemoteIterator<LocatedFileStatus> files = fileSystem.listFiles(basePath, true);
    _LOG.info("files=" + files);

    // map containing <filename, filesize>
    Map<String, Long> fileInfo = new HashMap<String, Long>();
    String regex = RegexUtil.convertGlobToRegex(filePattern);
    Pattern pattern = Pattern.compile(regex);

    if (files != null) {
        while (files.hasNext()) {
            LocatedFileStatus file = files.next();
            Path filePath = file.getPath();
            // Keep only files whose modification date equals dateNow
            // (dateNow must itself be truncated to midnight for equals() to match)
            if (DateUtils.truncate(new Date(file.getModificationTime()), 
                java.util.Calendar.DAY_OF_MONTH).equals(dateNow)) {
                if (pattern.matcher(filePath.getName()).matches()) {
                    fileInfo.put(file.getPath().getName(), file.getLen());
                }
            }
        }
    }

    _LOG.info("fileInfo =" + fileInfo);
    return fileInfo;
}

You said:

When I log the "files" - it only takes around 5 seconds

 RemoteIterator<LocatedFileStatus> files = fileSystem.listFiles(basePath, true);

Yes. That is because this part of the code only asks for the status of the files present at that path (for example, the number of files and their sizes); it does not look into the files to see what or how much data they contain.
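One way to picture why obtaining the iterator can be quick while walking it is slow: a recursive listing can be evaluated lazily, so that directories are only fetched as the loop reaches them. The sketch below is a plain-Java stand-in, not Hadoop code; `LazyTreeListing`, its in-memory `tree` map, and the `directoryFetches` counter are hypothetical names used only for illustration, and how much work a given `FileSystem` implementation defers is an assumption here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a lazily evaluated recursive listing; not Hadoop code. */
class LazyTreeListing implements Iterator<String> {
    private final Map<String, List<String>> tree; // directory path -> child paths
    private final Deque<String> pending = new ArrayDeque<>();
    int directoryFetches = 0; // counts simulated remote directory lookups

    LazyTreeListing(Map<String, List<String>> tree, String root) {
        this.tree = tree;
        pending.push(root); // construction fetches nothing
    }

    @Override
    public boolean hasNext() {
        // Expand directories on top of the stack until a file is on top.
        while (!pending.isEmpty() && tree.containsKey(pending.peek())) {
            String dir = pending.pop();
            directoryFetches++; // one simulated lookup per directory
            for (String child : tree.get(dir)) {
                pending.push(child);
            }
        }
        return !pending.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pending.pop();
    }
}
```

In this model, creating the iterator costs nothing, and every directory in the subtree is visited only while `hasNext()` and `next()` are being called. With a root of "/" and a recursive listing, that means the loop walks the whole tree.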

Now, if you look at this part of the code

 while (files.hasNext()) {
            LocatedFileStatus file = files.next();
            Path filePath = file.getPath();
            // Get only the files with created date = current date
            if (DateUtils.truncate(new Date(file.getModificationTime()), 
                java.util.Calendar.DAY_OF_MONTH).equals(dateNow)) {
                if (pattern.matcher(filePath.getName()).matches()) {
                    fileInfo.put(file.getPath().getName(), file.getLen());
                }
            }
        }

then you can see that it iterates over every file in the listing, so it will definitely take more time than the previous call. `files` may contain a large number of files of different sizes.

So, iterating over each file will definitely take more time. It also depends on how much this directory contains: the more files there are under it, the more time this loop will take.
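The per-file check inside the loop can also be made cheaper, although this is probably not the dominant cost compared with walking the listing itself. A minimal sketch, assuming the target day and time zone are known up front: the day is converted once into a range of epoch milliseconds, so each file needs only two `long` comparisons and a regex match instead of a new `Date` plus a `DateUtils.truncate` call. `DayWindowFilter` and `accepts` are hypothetical names, not part of the original code.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.regex.Pattern;

/** Hypothetical helper: matches files by name pattern and modification day. */
class DayWindowFilter {
    private final long startMillis; // start of the day, inclusive
    private final long endMillis;   // start of the next day, exclusive
    private final Pattern namePattern;

    DayWindowFilter(LocalDate day, ZoneId zone, Pattern namePattern) {
        // Compute the day's boundaries once instead of truncating per file.
        this.startMillis = day.atStartOfDay(zone).toInstant().toEpochMilli();
        this.endMillis = day.plusDays(1).atStartOfDay(zone).toInstant().toEpochMilli();
        this.namePattern = namePattern;
    }

    boolean accepts(String fileName, long modificationTimeMillis) {
        // Cheap numeric checks first, regex match last.
        return modificationTimeMillis >= startMillis
                && modificationTimeMillis < endMillis
                && namePattern.matcher(fileName).matches();
    }
}
```

Inside the loop from the question, the two nested `if` statements would then collapse into a single call such as `filter.accepts(file.getPath().getName(), file.getModificationTime())`.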

Use listStatus with a PathFilter. This does much of the work on the server side and returns the accumulated results.
