

Files.walkFileTree in lexicographical order

I have a unit test that mocks reading an S3 bucket using the local filesystem. To do that, I am using Files.walkFileTree to add certain records to a list.

Here's the folder that is being walked, and I am later extracting the data out of the .gz files.

$ ls -l /var/folders/8g/f_n563nx5yv9mdpnznnxv8gj1xs_mm/T/s3FilesReaderTest1892987110875929052/prefix/2016-01-01/ | cut -d' ' -f8-

  41 Dec 19 18:38 topic-00000-000000000000.gz
 144 Dec 19 18:38 topic-00000-000000000000.index.json
  48 Dec 19 18:38 topic-00001-000000000000.gz
 144 Dec 19 18:38 topic-00001-000000000000.index.json

Here's the mock method:

final AmazonS3 client = mock(AmazonS3Client.class);
when(client.listObjects(any(ListObjectsRequest.class))).thenAnswer(new Answer<ObjectListing>() {

    private String key(File file) {
        return file.getAbsolutePath().substring(dir.toAbsolutePath().toString().length() + 1);
    }

    @Override
    public ObjectListing answer(InvocationOnMock invocationOnMock) throws Throwable {
        final ListObjectsRequest req = (ListObjectsRequest) invocationOnMock.getArguments()[0];
        final String bucket = req.getBucketName();
        final String marker = req.getMarker();
        final String prefix = req.getPrefix();
        logger.debug("prefix = {}; marker = {}", prefix, marker);

        final List<File> files = new ArrayList<>();
        Path toWalk = dir;
        if (prefix != null) {
            toWalk = Paths.get(dir.toAbsolutePath().toString(), prefix).toAbsolutePath();
        }
        logger.debug("walking\t{}", toWalk);
        Files.walkFileTree(toWalk, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path toCheck, BasicFileAttributes attrs) throws IOException {
                if (toCheck.startsWith(dir)) {
                    logger.debug("visiting\t{}", toCheck);
                    return FileVisitResult.CONTINUE;
                }
                logger.debug("skipping\t{}", toCheck);
                return FileVisitResult.SKIP_SUBTREE;
            }

            @Override
            public FileVisitResult visitFile(Path path, BasicFileAttributes attrs) throws IOException {
                File f = path.toFile();
                String key = key(f);
                if (marker == null || key.compareTo(marker) > 0) {
                    logger.debug("adding\t{}", f);
                    files.add(f);
                }
                return FileVisitResult.CONTINUE;
            }
        });

        ObjectListing listing = new ObjectListing();
        List<S3ObjectSummary> summaries = new ArrayList<>();
        Integer maxKeys = req.getMaxKeys();
        for (int i = 0; i < maxKeys && i < files.size(); i++) {
            String key = key(files.get(i));

            S3ObjectSummary summary = new S3ObjectSummary();
            summary.setKey(key);
            logger.debug("adding summary for {}", key);
            summaries.add(summary);

            listing.setNextMarker(key);
        }

        listing.setMaxKeys(maxKeys);
        listing.getObjectSummaries().addAll(summaries);
        listing.setTruncated(files.size() > maxKeys);

        return listing;
    }
});

And the log output:

2018-12-19 18:38:13.469 [main] DEBUG c.s.k.connect.s3.S3FilesReaderTest - prefix = prefix; marker = prefix/2016-01-01
2018-12-19 18:38:13.470 [main] DEBUG c.s.k.connect.s3.S3FilesReaderTest - walking   /var/folders/8g/f_n563nx5yv9mdpnznnxv8gj1xs_mm/T/s3FilesReaderTest1892987110875929052/prefix
2018-12-19 18:38:13.475 [main] DEBUG c.s.k.connect.s3.S3FilesReaderTest - visiting  /var/folders/8g/f_n563nx5yv9mdpnznnxv8gj1xs_mm/T/s3FilesReaderTest1892987110875929052/prefix
2018-12-19 18:38:13.476 [main] DEBUG c.s.k.connect.s3.S3FilesReaderTest - visiting  /var/folders/8g/f_n563nx5yv9mdpnznnxv8gj1xs_mm/T/s3FilesReaderTest1892987110875929052/prefix/2016-01-01
2018-12-19 18:38:13.477 [main] DEBUG c.s.k.connect.s3.S3FilesReaderTest - adding    /var/folders/8g/f_n563nx5yv9mdpnznnxv8gj1xs_mm/T/s3FilesReaderTest1892987110875929052/prefix/2016-01-01/topic-00000-000000000000.index.json
2018-12-19 18:38:13.477 [main] DEBUG c.s.k.connect.s3.S3FilesReaderTest - adding    /var/folders/8g/f_n563nx5yv9mdpnznnxv8gj1xs_mm/T/s3FilesReaderTest1892987110875929052/prefix/2016-01-01/topic-00001-000000000000.index.json
2018-12-19 18:38:13.477 [main] DEBUG c.s.k.connect.s3.S3FilesReaderTest - adding    /var/folders/8g/f_n563nx5yv9mdpnznnxv8gj1xs_mm/T/s3FilesReaderTest1892987110875929052/prefix/2016-01-01/topic-00001-000000000000.gz
2018-12-19 18:38:13.477 [main] DEBUG c.s.k.connect.s3.S3FilesReaderTest - adding    /var/folders/8g/f_n563nx5yv9mdpnznnxv8gj1xs_mm/T/s3FilesReaderTest1892987110875929052/prefix/2016-01-01/topic-00000-000000000000.gz
2018-12-19 18:38:13.479 [main] DEBUG c.s.k.connect.s3.S3FilesReaderTest - adding summary for prefix/2016-01-01/topic-00000-000000000000.index.json
2018-12-19 18:38:13.479 [main] DEBUG c.s.k.connect.s3.S3FilesReaderTest - adding summary for prefix/2016-01-01/topic-00001-000000000000.index.json
2018-12-19 18:38:13.479 [main] DEBUG c.s.k.connect.s3.S3FilesReaderTest - adding summary for prefix/2016-01-01/topic-00001-000000000000.gz
2018-12-19 18:38:13.479 [main] DEBUG c.s.k.connect.s3.S3FilesReaderTest - adding summary for prefix/2016-01-01/topic-00000-000000000000.gz
2018-12-19 18:38:13.481 [main] DEBUG c.s.k.c.s3.source.S3FilesReader - aws ls bucket/prefix after:prefix/2016-01-01 = [prefix/2016-01-01/topic-00000-000000000000.index.json, prefix/2016-01-01/topic-00001-000000000000.index.json, prefix/2016-01-01/topic-00001-000000000000.gz, prefix/2016-01-01/topic-00000-000000000000.gz]
2018-12-19 18:38:13.481 [main] DEBUG c.s.k.c.s3.source.S3FilesReader - Skipping non-data chunk prefix/2016-01-01/topic-00000-000000000000.index.json
2018-12-19 18:38:13.481 [main] DEBUG c.s.k.c.s3.source.S3FilesReader - Skipping non-data chunk prefix/2016-01-01/topic-00001-000000000000.index.json
2018-12-19 18:38:13.484 [main] DEBUG c.s.k.c.s3.source.S3FilesReader - Adding chunk-key prefix/2016-01-01/topic-00001-000000000000.gz
2018-12-19 18:38:13.484 [main] DEBUG c.s.k.c.s3.source.S3FilesReader - Adding chunk-key prefix/2016-01-01/topic-00000-000000000000.gz
2018-12-19 18:38:13.485 [main] DEBUG c.s.k.c.s3.source.S3FilesReader - Next Chunks: [prefix/2016-01-01/topic-00001-000000000000.gz, prefix/2016-01-01/topic-00000-000000000000.gz]
2018-12-19 18:38:13.485 [main] DEBUG c.s.k.c.s3.source.S3FilesReader - Now reading from prefix/2016-01-01/topic-00001-000000000000.gz
2018-12-19 18:38:13.513 [main] DEBUG c.s.k.c.s3.source.S3FilesReader - Now reading from prefix/2016-01-01/topic-00000-000000000000.gz

The files are all getting read correctly (1 value for key0 and 2 for key1), but my unit test expects them to be read in ascending order: all files starting with prefix/2016-01-01/topic-00000 should be read before those starting with prefix/2016-01-01/topic-00001 (see the adding summary lines in the log above). The test fails with:

java.lang.AssertionError: 
Expected :[key0-0=value0-0, key1-0=value1-0, key1-1=value1-1]
Actual   :[key1-0=value1-0, key1-1=value1-1, key0-0=value0-0]

Other than inserting into a sorted collection rather than a regular List, what other options are there to satisfy that condition, so that the files are read in the order given by a regular ls over a single folder?

For now, I am getting around this problem by using a TreeSet for each folder, clearing it before and after scanning each folder.

Path toWalk = dir;
if (prefix != null) {  // Prefix is some path after the parent dir. It's an S3 concept
    toWalk = Paths.get(dir.toAbsolutePath().toString(), prefix).toAbsolutePath();
}

// Absolute paths should be sorted lexicographically for all files
final Set<File> files = new TreeSet<>(Comparator.comparing(File::getAbsolutePath));
Files.walkFileTree(toWalk, new SimpleFileVisitor<Path>() {

    // Absolute paths should be sorted lexicographically for files in folders
    private Set<File> accumulator = new TreeSet<>(Comparator.comparing(File::getAbsolutePath));

    @Override
    public FileVisitResult preVisitDirectory(Path toCheck, BasicFileAttributes attrs) throws IOException {
        accumulator.clear();  // Start fresh
        if (toCheck.startsWith(dir)) {
            logger.debug("visiting\t{}", toCheck);
            return FileVisitResult.CONTINUE;
        }
        logger.debug("skipping\t{}", toCheck);
        return FileVisitResult.SKIP_SUBTREE;
    }

    @Override
    public FileVisitResult visitFile(Path path, BasicFileAttributes attrs) throws IOException {
        File f = path.toFile();
        String key = key(f);
        if (marker == null || key.compareTo(marker) > 0) {
            logger.debug("adding\t{}", f);
            accumulator.add(f);  // accumulate
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException e) throws IOException {
        files.addAll(accumulator);  // dump results (already sorted)
        accumulator.clear();  // start fresh
        return super.postVisitDirectory(dir, e);
    }
});
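Neither Files.walkFileTree nor the DirectoryStream it iterates over guarantees any particular order for the entries within a directory, so the sort has to happen explicitly. Because both sets compare by File::getAbsolutePath and all of the data files here share the same parent folder, the comparison effectively reduces to the file names, which matches what ls prints. One caveat: clearing the accumulator in preVisitDirectory would drop any files already collected for the parent directory if a subdirectory is visited in between; with files only in the leaf folder, as in this test, that never happens.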

One option is to use a Stream:

try (Stream<Path> tree = Files.walk(toWalk)) {
    tree.filter(p -> !Files.isDirectory(p) && p.startsWith(dir)).sorted()
        .forEachOrdered(path -> {
            File f = path.toFile();
            String key = key(f);
            // etc.
        });
}
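Files.walk streams everything under toWalk, and on the default filesystem Path's natural ordering is effectively a lexicographic comparison of the path strings, so sorted() reproduces the per-folder ls order the test expects. As a minimal sketch of the elided body, assuming it runs inside the same answer(...) method so that dir, marker, and the key(...) helper are in scope:

final List<File> files = new ArrayList<>();
try (Stream<Path> tree = Files.walk(toWalk)) {
    tree.filter(p -> !Files.isDirectory(p) && p.startsWith(dir))
        .sorted()                                          // lexicographic Path order on the default filesystem
        .map(Path::toFile)
        .filter(f -> marker == null || key(f).compareTo(marker) > 0)  // same marker check as in visitFile
        .forEachOrdered(files::add);                       // preserve the sorted encounter order
}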
