How to get a Stream of file attributes from the file system?

Question

I am writing a web server and want to be as efficient as possible, minimizing file system calls. The problem is that the methods that return Streams, such as java.nio.file.Files.list, return a Stream of Paths, and I would like a Stream of BasicFileAttributes, so that I can return the creation time and update time for each Path (for example when returning the results for an LDP Container).

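Of course, a simple solution would be to map each element of the Stream with a function that takes the path and returns the file attributes, (p: Path) => Files.getAttributeView..., but that sounds like it would require a call to the file system for each Path, which seems wasteful, because to get the file information the JDK cannot have been far from the attribute information.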
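To make the naive approach concrete, here is a minimal Java sketch. The class and method names are illustrative, and it uses Files.readAttributes rather than the attribute view mentioned above; the point is only that every listed Path triggers a separate attribute lookup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the JDK.
final class NaiveAttributeListing {

    // Reads the attributes of one path; this is the extra per-path lookup.
    static BasicFileAttributes read(Path p) {
        try {
            return Files.readAttributes(p, BasicFileAttributes.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lists a directory, then looks up the attributes of every entry separately.
    static List<BasicFileAttributes> listAttributes(Path dir) throws IOException {
        try (Stream<Path> paths = Files.list(dir)) {
            return paths.map(NaiveAttributeListing::read).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        for (BasicFileAttributes a : listAttributes(dir)) {
            System.out.println(a.creationTime() + " / " + a.lastModifiedTime());
        }
    }
}
```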

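I actually came across this mail from 2009 on the OpenJDK mailing list, which states that they had discussed adding an API that would return a pair of a path and its attributes...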

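I found a non-public class in the JDK, java.nio.file.FileTreeWalker, which has an API that allows one to fetch the attributes as part of a FileTreeWalker.Event. It actually makes use of sun.nio.fs.BasicFileAttributesHolder, which allows a Path to keep a cache of its attributes. But it is not public, and it is not clear where it works.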

There is of course also the whole FileVisitor API, which has methods that receive both a Path and its BasicFileAttributes, as shown here:

public FileVisitResult visitFile(Path file, BasicFileAttributes attr) {...}

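So I am looking for a way to turn that into a Stream that respects the back-pressure principle of the Reactive Manifesto promoted by Akka, without hogging too many resources. I checked the open source Alpakka File project, but it also streams the Files methods that return Paths...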
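For reference, the FileVisitor callback style mentioned above can be sketched as follows. The class name VisitorListing and its collect method are illustrative only. The sketch gathers every visited file and its attributes into a list, which shows that the walker hands over the attributes without a second lookup but also that it pushes results at its own pace.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the JDK.
final class VisitorListing {

    // Walks the tree under dir and records each file with the attributes
    // that the walker passes to the callback.
    static List<Map.Entry<Path, BasicFileAttributes>> collect(Path dir) throws IOException {
        List<Map.Entry<Path, BasicFileAttributes>> out = new ArrayList<>();
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                out.add(Map.entry(file, attrs));
                return FileVisitResult.CONTINUE;
            }
        });
        return out;
    }
}
```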

Answer 1

If there is some way of getting those attributes directly, I do not know of it.

On converting the FileVisitor API to a reactive stream:

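The mechanism of reactive streams backpressure is a pull-push model, where demand is first signalled by the downstream (the pull part) and the upstream is then allowed to send no more items than the signalled demand (the push part).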
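The pull-push model can be illustrated with the JDK's own java.util.concurrent.Flow interfaces. This is a deliberately simplified, synchronous sketch, not Akka Streams and not a fully specification-compliant publisher; the class names are made up for the example. The publisher emits only as many items as the subscriber has requested.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Flow;

// Hypothetical demo classes illustrating demand signalling.
final class DemandDemo {

    // Synchronous publisher: sends at most as many items as were requested.
    static final class IteratorPublisher<T> implements Flow.Publisher<T> {
        private final Iterator<T> items;

        IteratorPublisher(Iterator<T> items) { this.items = items; }

        @Override
        public void subscribe(Flow.Subscriber<? super T> subscriber) {
            subscriber.onSubscribe(new Flow.Subscription() {
                private boolean done;

                @Override
                public void request(long n) {          // the "pull" part
                    for (long i = 0; i < n && !done; i++) {
                        if (items.hasNext()) {
                            subscriber.onNext(items.next()); // the "push" part
                        } else {
                            done = true;
                            subscriber.onComplete();
                        }
                    }
                }

                @Override
                public void cancel() { done = true; }
            });
        }
    }

    // Subscriber that stores what it receives and never requests on its own.
    static final class CollectingSubscriber<T> implements Flow.Subscriber<T> {
        final List<T> received = new ArrayList<>();
        boolean completed;
        Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) { subscription = s; }
        @Override public void onNext(T item) { received.add(item); }
        @Override public void onError(Throwable t) { throw new RuntimeException(t); }
        @Override public void onComplete() { completed = true; }
    }
}
```

Nothing is delivered until the subscriber calls subscription.request(n), and each call releases at most n items.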

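The problem with the FileVisitor API is that there is no way to plug directly into such a control-flow mechanism. Once you set it off, it just calls your callback and does not care much about anything else.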

There is no clean way to solve this, but one way to do it is to use Source.queue ( https://doc.akka.io/docs/akka/current/stream/operators/Source/queue.html ) to isolate that API from the rest of your stream, like so:

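// assumes an implicit ActorSystem is in scope
val queue =
  Source.queue[BasicFileAttributes](bufferSize, OverflowStrategy.backpressure)
    // the rest of your Akka Streams pipeline
    // (it must end in .to(sink) so that the queue is kept as the materialized value)
    .run()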

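This materializes a queue that you can now pass to your FileVisitor. Anything you offer to that queue goes down the stream. If there is no demand when you offer and the queue is full, the Future returned by offer will not complete until there is space in the queue. So inside the API callback you could simply do: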

//inside the FileVisitor API callback
Await.result(queue.offer(attrs), Duration.Inf)

This blocks the callback thread while the stream is backpressured. Ugly, but isolated.
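The same idea can be sketched in plain Java without Akka, swapping Source.queue for a bounded java.util.concurrent.ArrayBlockingQueue. This is an analogy under stated assumptions, not the Akka API: the class name, the drain method and the use of an empty Optional as an end-of-walk marker are all choices made for the example. The visitor callback blocks in put() whenever the consumer falls behind.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical bridge between the callback-driven walker and a pulling consumer.
final class BlockingVisitorBridge {

    // Walks dir on a separate thread and drains the results on the calling thread.
    static List<BasicFileAttributes> drain(Path dir, int bufferSize) throws InterruptedException {
        BlockingQueue<Optional<BasicFileAttributes>> queue = new ArrayBlockingQueue<>(bufferSize);

        Thread walker = new Thread(() -> {
            try {
                Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                        try {
                            queue.put(Optional.of(attrs)); // blocks while the queue is full
                            return FileVisitResult.CONTINUE;
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return FileVisitResult.TERMINATE;
                        }
                    }
                });
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                try {
                    queue.put(Optional.empty()); // end-of-walk marker
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        walker.start();

        // The consumer sets the pace: the walker cannot run more than
        // bufferSize items ahead of this loop.
        List<BasicFileAttributes> result = new ArrayList<>();
        for (Optional<BasicFileAttributes> next = queue.take(); next.isPresent(); next = queue.take()) {
            result.add(next.get());
        }
        walker.join();
        return result;
    }
}
```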

Answer 2

You can access file attributes together with their path by using Files.find, which accepts a BiPredicate<Path, BasicFileAttributes>, and storing the value as it tests each path.

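A side-effecting action inside the BiPredicate lets you work with both objects without touching the file system again for each item in the path. With your own predicate condition yourPred, the side-effecting predicate below collects the attributes for you to retrieve during stream processing: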

public static void main(String[] args) throws IOException {
    Path dir = Path.of(args[0]);

    // Use `ConcurrentHashMap` if using `stream.parallel()`
    HashMap <Path,BasicFileAttributes> attrs = new HashMap<>();

    BiPredicate<Path, BasicFileAttributes> yourPred = (p,a) -> true;

    BiPredicate<Path, BasicFileAttributes> predicate = (p,a) -> {
        return yourPred.test(p, a)
                // && p.getNameCount() == dir.getNameCount()+1 // Simulates Files.list
                && attrs.put(p, a) == null;
    };
    try(var stream = Files.find(dir, Integer.MAX_VALUE, predicate)) {
        stream.forEach(p-> System.out.println(p.toString()+" => "+attrs.get(p)));
        // Or: if you put all your handling code in the predicate, use stream.count();
    }
}

To simulate the effect of Files.list, use a one-level find scan:

 BiPredicate<Path, BasicFileAttributes> yourPred = (p,a) -> p.getNameCount() == dir.getNameCount()+1;

For large folder scans you should clean up the attrs map as you go, by inserting attrs.remove(p); after consuming the path.

Edit

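The answer above can be refactored into a three-line call that returns a stream of Map.Entry<Path, BasicFileAttributes>, or it is easy to add a class/record to hold the Path/BasicFileAttributes pair and return a Stream<PathInfo> instead: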

/**
 * Call Files.find() returning a stream with both Path+BasicFileAttributes
 * as type Map.Entry<Path, BasicFileAttributes>
 * <p>Could declare a specific record to replace Map.Entry as:
 *    record PathInfo(Path path, BasicFileAttributes attr) { };
 */
public static Stream<Map.Entry<Path, BasicFileAttributes>>
find(Path dir, int maxDepth, BiPredicate<Path, BasicFileAttributes> matcher, FileVisitOption... options) throws IOException {

    HashMap <Path,BasicFileAttributes> attrs = new HashMap<>();
    BiPredicate<Path, BasicFileAttributes> predicate = (p,a) -> (matcher == null || matcher.test(p, a)) && attrs.put(p, a) == null;

    return Files.find(dir, maxDepth, predicate, options).map(p -> Map.entry(p, attrs.remove(p)));
}
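The record-based variant mentioned in the Javadoc above might look like the following sketch. It assumes Java 16 or later for record support, and the wrapper class name PathInfoFinder is made up for the example. As with the Map.Entry version, it relies on the stream being sequential and on the caller closing it.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.stream.Stream;

// Hypothetical wrapper class; only Files.find is a real JDK API here.
final class PathInfoFinder {

    // Holds a path together with the attributes seen by the predicate.
    record PathInfo(Path path, BasicFileAttributes attr) { }

    static Stream<PathInfo> find(Path dir, int maxDepth,
                                 BiPredicate<Path, BasicFileAttributes> matcher,
                                 FileVisitOption... options) throws IOException {
        Map<Path, BasicFileAttributes> attrs = new HashMap<>();
        // Side effect: remember the attributes of every accepted path.
        BiPredicate<Path, BasicFileAttributes> predicate =
                (p, a) -> (matcher == null || matcher.test(p, a)) && attrs.put(p, a) == null;
        // remove() keeps the map small while the stream is consumed.
        return Files.find(dir, maxDepth, predicate, options)
                .map(p -> new PathInfo(p, attrs.remove(p)));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        try (Stream<PathInfo> infos = find(dir, 1, null)) {
            infos.forEach(i -> System.out.println(
                    i.path() + " created=" + i.attr().creationTime()
                            + " modified=" + i.attr().lastModifiedTime()));
        }
    }
}
```

The main method prints the creation and last-modified times asked for in the question, using a single directory scan.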

Answer 3

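Starting from DuncG's answer, I got the following to work in Scala 3 as a very generic Akka Stream class. It is actually quite neat, because it takes the side effect of the Files.find function and immediately encapsulates it back into a clean functional reactive stream.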

class DirectoryList(
    dir: Path, 
    matcher: (Path, BasicFileAttributes) => Boolean = (p,a) => true, 
    maxDepth: Int = 1
) extends GraphStage[SourceShape[(Path,BasicFileAttributes)]]:
    import scala.jdk.FunctionConverters.*
    import scala.jdk.OptionConverters.*
    
    val out: Outlet[(Path,BasicFileAttributes)] = Outlet("PathAttributeSource")
    override val shape = SourceShape(out)


    override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
        new GraphStageLogic(shape) {
            private var next: (Path,BasicFileAttributes) = _

            def append(path: Path, att: BasicFileAttributes): Boolean = 
                val matched = matcher(path,att)
                if matched then next = (path,att)
                matched
            
            private val pathStream = Files.find(dir, maxDepth, append.asJava)
            private val sit = pathStream.iterator()
            
            setHandler(out, new OutHandler {
                override def onPull(): Unit = { 
                    if sit.hasNext then
                        sit.next()
                        push(out,next)
                    else
                        pathStream.close()  
                        complete(out)
                }

                override def onDownstreamFinish(cause: Throwable): Unit =
                    pathStream.close()  
                    super.onDownstreamFinish(cause)
            })
        }
end DirectoryList

It can then be used as follows:

val sourceGraph = DirectoryList(Path.of("."), maxDepth=10)
val result = Source.fromGraph(sourceGraph).map{ (p: Path,att: BasicFileAttributes) => 
        println(s"received <$p> : dir=${att.isDirectory}")}.run()

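The full source code is on github and initial tests are here. It could perhaps be improved by tweaking the answer so that a certain number of path-attribute pairs are passed along in batches.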


Answer 1
2 2021-03-18 22:59:47

Answer 2
2 Accepted 2021-03-19 08:33:45

Answer 3
1 2021-03-19 18:24:10
