
List all Files from a Directory that match a File Mask (a.k.a Pattern or Glob)

I want to list all files in a directory, and in its subdirectories, that match a file mask.

For example, "M:\SOURCE\*.doc", where SOURCE may look like this:

|-- SOURCE
|   |-- Folder1
|   |   |-- File1.doc
|   |   |-- File1.txt
|   |-- File2.doc
|   |-- File3.xml

This should return File1.doc and File2.doc.

Initially, I used a DirectoryStream, because it already validates the mask/glob syntax and can filter with it. This is not a regex but an actual file mask, which a regular user finds easier to understand.

Files.newDirectoryStream(path, mask);

The problem is that a DirectoryStream only checks the immediate directory you provide, not its subdirectories.
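To make the limitation concrete, here is a minimal sketch that rebuilds the sample tree from above in a temporary directory and lists it with Files.newDirectoryStream. The class name, the helper names, and the temporary location are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class DirectoryStreamGlobDemo {

    // Lists the names of entries directly inside 'dir' that match 'mask' (no recursion).
    static List<String> listImmediate(Path dir, String mask) throws IOException {
        List<String> names = new ArrayList<>();
        // try-with-resources closes the underlying directory handle
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, mask)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names); // iteration order is unspecified, so sort for stable output
        return names;
    }

    // Builds the sample SOURCE tree in a temp directory (hypothetical location) and lists it.
    static List<String> demo() {
        try {
            Path source = Files.createTempDirectory("SOURCE");
            Path folder1 = Files.createDirectory(source.resolve("Folder1"));
            Files.createFile(folder1.resolve("File1.doc"));
            Files.createFile(folder1.resolve("File1.txt"));
            Files.createFile(source.resolve("File2.doc"));
            Files.createFile(source.resolve("File3.xml"));
            return listImmediate(source, "*.doc");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [File2.doc]; Folder1/File1.doc is not reported
    }
}
```

Only File2.doc comes back, because the stream never descends into Folder1.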

Then there is the "flattening" approach with Files.walk, which does look through all of the subdirectories. The problem is that it does not offer a way to filter by a file mask the way a DirectoryStream does.

Files.walk(path, Integer.MAX_VALUE);

So I'm stuck, unable to combine the best of both methods here...

You can also use a custom FileVisitor [1] in combination with a PathMatcher [2], which works well with globs.

The code might look like this:

public static void main(String[] args) throws IOException {
    System.out.println(getFiles(Paths.get("/tmp/SOURCE"), "*.doc"));
}

public static List<Path> getFiles(final Path directory, final String glob) throws IOException {
    final var docFileVisitor = new GlobFileVisitor(glob);
    Files.walkFileTree(directory, docFileVisitor);

    return docFileVisitor.getMatchedFiles();
}

public static class GlobFileVisitor extends SimpleFileVisitor<Path> {

    private final PathMatcher pathMatcher;
    private final List<Path> matchedFiles = new ArrayList<>();

    public GlobFileVisitor(final String glob) {
        this.pathMatcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    }

    @Override
    public FileVisitResult visitFile(Path path, BasicFileAttributes basicFileAttributes) throws IOException {
        if (pathMatcher.matches(path.getFileName())) {
            matchedFiles.add(path);
        }
        return FileVisitResult.CONTINUE;
    }

    public List<Path> getMatchedFiles() {
        return matchedFiles;
    }
}

[1] https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/FileVisitor.html

[2] https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/PathMatcher.html

I think I might have solved my own question with the insight received here and from other questions mentioning the PathMatcher object:

final PathMatcher maskMatcher = FileSystems.getDefault()
        .getPathMatcher("glob:" + mask);

final List<Path> matchedFiles = Files.walk(path)
        .collect(Collectors.toList());

final List<Path> filesToRemove = new ArrayList<>(matchedFiles.size());

matchedFiles.forEach(foundPath -> {
    if (!maskMatcher.matches(foundPath.getFileName()) || Files.isDirectory(foundPath)) {
        filesToRemove.add(foundPath);
    }
});

matchedFiles.removeAll(filesToRemove);

So basically, .getPathMatcher("glob:" + mask) is the same thing that the DirectoryStream was doing to filter the files.

All I have to do after that is filter the list of paths I get from Files.walk by removing the elements that do not match my PathMatcher or that are directories.
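The same filtering can also be done inside the stream itself, which avoids building a second list and lets try-with-resources close the walk. The following is only a sketch under assumptions: the class and method names are made up for illustration, and Files.isRegularFile is slightly stricter than "not a directory", since it also skips entries that are neither regular files nor directories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class WalkWithMaskDemo {

    // Walks 'root' recursively and keeps regular files whose file name matches 'mask'.
    static List<Path> findByMask(Path root, String mask) throws IOException {
        PathMatcher maskMatcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        // Files.walk holds open directory handles, so close the stream when done
        try (Stream<Path> walk = Files.walk(root)) {
            return walk
                    .filter(Files::isRegularFile)
                    .filter(p -> maskMatcher.matches(p.getFileName()))
                    .collect(Collectors.toList());
        }
    }

    // Builds the question's sample tree in a temp directory (hypothetical location)
    // and returns the matches relative to it, sorted, with '/' as separator.
    static List<String> demo() {
        try {
            Path source = Files.createTempDirectory("SOURCE");
            Path folder1 = Files.createDirectory(source.resolve("Folder1"));
            Files.createFile(folder1.resolve("File1.doc"));
            Files.createFile(folder1.resolve("File1.txt"));
            Files.createFile(source.resolve("File2.doc"));
            Files.createFile(source.resolve("File3.xml"));
            return findByMask(source, "*.doc").stream()
                    .map(p -> source.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [File2.doc, Folder1/File1.doc]
    }
}
```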

It is possible to use a plain Stream filter to retrieve the matching file names from Files.walk, using String::matches with an appropriate regular expression:

final String SOURCE_DIR = "test";

Files.walk(Paths.get(SOURCE_DIR))
     .filter(p -> p.getFileName().toString().matches(".*\\.docx?"))
     .forEach(System.out::println);

Output:

test\level01\level11\test.doc
test\level02\test-level2.doc
test\t1.doc
test\t3.docx

Input directory structure:

│   t1.doc
│   t2.txt
│   t3.docx
│   t4.bin
│
├───level01
│   │   test.do
│   │
│   └───level11
│           test.doc
│
└───level02
        test-level2.doc

Update

A recursive solution is possible using newDirectoryStream; however, its result needs to be converted into a Stream:

static Stream<Path> readFilesByMaskRecursively(Path start, String mask) {
        
    List<Stream<Path>> sub = new ArrayList<>();
        
    try {
        sub.add(StreamSupport.stream( // read files by mask in current dir
                Files.newDirectoryStream(start, mask).spliterator(), false));
            
        Files.newDirectoryStream(start, (path) -> path.toFile().isDirectory())
             .forEach(path -> sub.add(readFilesByMaskRecursively(path, mask)));
    } catch (IOException ioex) {
        ioex.printStackTrace();
    }
        
    return sub.stream().flatMap(s -> s); // convert to Stream<Path>
}

// test
readFilesByMaskRecursively(Paths.get(SOURCE_DIR), "*.doc*")
             .forEach(System.out::println);

Output:

test\t1.doc
test\t3.docx
test\level01\level11\test.doc
test\level02\test-level2.doc
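One thing the recursive version above does not do is close the DirectoryStream instances it opens. A possible variant, sketched here under assumptions, collects into a List and uses try-with-resources for every stream. The class and method names are hypothetical, and the NOFOLLOW_LINKS option is an added choice so that symbolic links to directories are not descended into.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class RecursiveDirectoryStreamDemo {

    // Collects regular files matching 'mask' in 'start' and, recursively, in its subdirectories.
    static List<Path> readFilesByMask(Path start, String mask) throws IOException {
        List<Path> result = new ArrayList<>();
        // files matching the mask in the current directory
        try (DirectoryStream<Path> files = Files.newDirectoryStream(start, mask)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    result.add(file);
                }
            }
        }
        // recurse into real subdirectories (symbolic links are not followed)
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(
                start, p -> Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS))) {
            for (Path dir : dirs) {
                result.addAll(readFilesByMask(dir, mask));
            }
        }
        return result;
    }

    // Rebuilds the input directory structure shown earlier in a temp directory
    // (hypothetical location) and returns sorted relative paths with '/' as separator.
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("test");
            Files.createFile(root.resolve("t1.doc"));
            Files.createFile(root.resolve("t2.txt"));
            Files.createFile(root.resolve("t3.docx"));
            Files.createFile(root.resolve("t4.bin"));
            Path level11 = Files.createDirectories(root.resolve("level01").resolve("level11"));
            Files.createFile(root.resolve("level01").resolve("test.do"));
            Files.createFile(level11.resolve("test.doc"));
            Path level02 = Files.createDirectory(root.resolve("level02"));
            Files.createFile(level02.resolve("test-level2.doc"));
            return readFilesByMask(root, "*.doc*").stream()
                    .map(p -> root.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Returning a List instead of a Stream is a deliberate trade-off in this sketch: the caller gets no lazy evaluation, but there is nothing left open to close.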

Update 2

A **/ prefix may be added to the PathMatcher glob so that it crosses directory boundaries; the Files.walk-based solution can then use a simplified filter without the need to remove specific entries:

String mask = "*.doc*";
PathMatcher maskMatcher = FileSystems.getDefault().getPathMatcher("glob:**/" + mask);
Files.walk(Paths.get(SOURCE_DIR))
     .filter(path -> maskMatcher.matches(path))
     .forEach(System.out::println);

Output (the same files as in the recursive solution):

test\level01\level11\test.doc
test\level02\test-level2.doc
test\t1.doc
test\t3.docx
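To see why the **/ prefix matters, the matcher can be tried against a few paths without touching the file system. This is a small sketch; the class name and the matches helper are assumptions for illustration, and the sample paths are taken from the directory structure above.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;

final class GlobPrefixDemo {

    // Returns whether 'path' matches the given glob on the default file system.
    static boolean matches(String glob, Path path) {
        return FileSystems.getDefault().getPathMatcher("glob:" + glob).matches(path);
    }

    public static void main(String[] args) {
        Path nested = Paths.get("test", "level02", "test-level2.doc");
        Path bare = Paths.get("t1.doc");

        // '*' does not cross directory boundaries, so the full path does not match
        System.out.println(matches("*.doc*", nested));               // false
        // matching only the file name works, as in the earlier solutions
        System.out.println(matches("*.doc*", nested.getFileName())); // true
        // '**' crosses directory boundaries
        System.out.println(matches("**/*.doc*", nested));            // true
        // the prefix requires at least one separator in the path
        System.out.println(matches("**/*.doc*", bare));              // false
    }
}
```

Two caveats follow from this. A path consisting of only a file name does not match the prefixed glob, which is fine for Files.walk because every file it reports below the start directory contains a separator. And since the PathMatcher only looks at the path itself, a directory whose name matches the mask would also pass the filter unless a check such as Files.isRegularFile is added.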

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original URL.