How to get a stream of file attributes from the file system?

Question

I am writing a Web Server and am trying to make it as efficient as possible by minimizing file system calls. The problem is that the methods that return Streams, such as java.nio.file.Files.list, return a Stream of Paths, whereas I would like a Stream of BasicFileAttributes, so that I can return the creation time and update time for each Path (say, when returning results for an LDP Container).

Of course, a simple solution would be to map each element of the Stream with a function that takes the path and returns a file attribute, (p: Path) => Files.getAttributeView... but that sounds like it would make a call to the FS for each Path. That seems wasteful: to list the directory entries in the first place, the JDK must already have been close to the attribute information.
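For reference, the naive approach looks roughly like this in Java: a minimal sketch that lists a directory with Files.list and then calls Files.readAttributes for each entry. The class and method names (NaiveListing, listWithAttributes) are hypothetical. Each readAttributes call is typically a separate stat-like lookup, which is exactly the per-path cost the question wants to avoid.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NaiveListing {

    /** Lists the entries of dir, then reads the attributes of each entry with a separate call. */
    static List<Map.Entry<Path, BasicFileAttributes>> listWithAttributes(Path dir) throws IOException {
        try (Stream<Path> paths = Files.list(dir)) {
            return paths
                    .map(p -> Map.entry(p, readAttributes(p))) // one extra attribute lookup per entry
                    .collect(Collectors.toList());
        }
    }

    // Files.readAttributes throws a checked IOException, which a lambda passed to map cannot throw.
    private static BasicFileAttributes readAttributes(Path p) {
        try {
            return Files.readAttributes(p, BasicFileAttributes.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        for (var e : listWithAttributes(dir)) {
            System.out.println(e.getKey() + " created=" + e.getValue().creationTime()
                    + " modified=" + e.getValue().lastModifiedTime());
        }
    }
}
```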

I actually came across this mail from the 2009 OpenJDK mailing list stating that they had discussed adding an API that would return a pair of a Path and its Attributes...

I found a non-public class in the JDK, java.nio.file.FileTreeWalker, whose FileTreeWalker.Event API gives access to the attributes. It makes use of sun.nio.fs.BasicFileAttributesHolder, which allows a Path to keep a cache of its Attributes. But it's not public, and it is not clear where it works.

There is of course also the whole FileVisitor API, which has methods that receive both a Path and its BasicFileAttributes, as shown here:

public FileVisitResult visitFile(Path file, BasicFileAttributes attr) {...}
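A minimal Java sketch of that callback style, using Files.walkFileTree with a SimpleFileVisitor; VisitorListing and collect are hypothetical names. The walker hands visitFile the attributes it has already read for each non-directory entry, but it pushes entries as fast as it can, so this version simply buffers everything in a list, with no backpressure at all.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class VisitorListing {

    /** Walks the tree under root and records every non-directory entry with its attributes. */
    static List<Map.Entry<Path, BasicFileAttributes>> collect(Path root) throws IOException {
        List<Map.Entry<Path, BasicFileAttributes>> result = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // the walker passes in the attributes it already read for this entry
                result.add(Map.entry(file, attrs));
                return FileVisitResult.CONTINUE;
            }
        });
        return result;
    }

    public static void main(String[] args) throws IOException {
        for (var e : collect(Path.of(args.length > 0 ? args[0] : "."))) {
            System.out.println(e.getKey() + " modified=" + e.getValue().lastModifiedTime());
        }
    }
}
```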

So I am looking for a way to turn that into a Stream that respects the back-pressure principle of the Reactive Manifesto, as promoted by Akka, without hogging too many resources. I checked the open-source Alpakka File project, but it too streams the Files methods that return Paths...

Answer 1

I don't know whether there is a way to get those attributes directly.

On converting the FileVisitor API to reactive streams:

Reactive streams backpressure is a pull-push model: demand is first signaled by the downstream (the pull part), and the upstream is then allowed to send no more items than the demand signaled (the push part).
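To make the pull-push idea concrete without Akka, here is a sketch using the JDK's java.util.concurrent.Flow interfaces (Java 9+). IteratorPublisher is a hypothetical helper, not a spec-complete Reactive Streams implementation: it supports a single subscriber, is not thread-safe, and only calls onNext while the subscriber has outstanding demand from request(n).

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Flow;

/** Hypothetical single-subscriber publisher: emits iterator elements only while there is demand. */
public class IteratorPublisher<T> implements Flow.Publisher<T> {
    private final Iterator<T> source;

    public IteratorPublisher(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public void subscribe(Flow.Subscriber<? super T> subscriber) {
        subscriber.onSubscribe(new Flow.Subscription() {
            private long demand = 0;          // requested but not yet delivered (the "pull" part)
            private boolean emitting = false; // guards against re-entrant request() calls from onNext
            private boolean done = false;     // completed, failed or cancelled

            @Override
            public void request(long n) {
                if (done) return;
                if (n <= 0) {
                    done = true;
                    subscriber.onError(new IllegalArgumentException("request(n) needs n > 0"));
                    return;
                }
                demand = (Long.MAX_VALUE - demand < n) ? Long.MAX_VALUE : demand + n; // saturating add
                if (emitting) return;         // the loop already running further up the stack uses it
                emitting = true;
                while (!done) {
                    if (!source.hasNext()) {
                        done = true;
                        subscriber.onComplete();
                    } else if (demand > 0) {
                        demand--;
                        subscriber.onNext(source.next()); // the "push" part, bounded by demand
                    } else {
                        break;                // no demand left: wait for the next request(n)
                    }
                }
                emitting = false;
            }

            @Override
            public void cancel() {
                done = true;
            }
        });
    }

    public static void main(String[] args) {
        new IteratorPublisher<>(List.of("a", "b", "c").iterator()).subscribe(new Flow.Subscriber<String>() {
            private Flow.Subscription subscription;
            @Override public void onSubscribe(Flow.Subscription s) { subscription = s; s.request(1); }
            @Override public void onNext(String item) {
                System.out.println("got " + item);
                subscription.request(1);      // pull the next element only after handling this one
            }
            @Override public void onError(Throwable t) { t.printStackTrace(); }
            @Override public void onComplete() { System.out.println("done"); }
        });
    }
}
```

A Files.find(...).iterator() could serve as the source here; pulling one element per onPull is essentially what the GraphStage in the third answer does.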

The problem with the FileVisitor API is that there's no way to hook it directly into such a control-flow mechanism. Once you start it, it just keeps calling your callback without caring about anything else.

There's no clean way of bridging this, but one way to do it is to use Source.queue ( https://doc.akka.io/docs/akka/current/stream/operators/Source/queue.html ) to isolate that API from the rest of your stream, like this:

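import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Sink, Source}
// assumes an implicit ActorSystem and an Int bufferSize in scope
val queue =
  Source.queue[BasicFileAttributes](bufferSize, OverflowStrategy.backpressure)
    //the rest of your Akka Streams pipeline, ending in a Sink, e.g.:
    .to(Sink.foreach(println))
    .run()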

This materializes a queue that you can pass to your FileVisitor. Anything you offer to that queue goes down the stream. If there's no demand when you offer and the queue is full, the Future returned by offer will not complete until there is space in the queue. So in the callback you could simply do:

import scala.concurrent.Await
import scala.concurrent.duration.Duration
//inside the FileVisitor API callback
Await.result(queue.offer(attrs), Duration.Inf)

That blocks the callback thread whenever the stream is backpressured. Ugly, but isolated.
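The same isolate-and-block idea can be sketched in plain Java, with a bounded java.util.concurrent.ArrayBlockingQueue standing in for Source.queue: the FileVisitor runs on its own thread, and put blocks it whenever the consumer falls behind. BlockingVisitorBridge, Item and END are hypothetical names, and error handling is reduced to printing the exception.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical bridge: a FileVisitor thread feeds a bounded queue that a consumer drains at its own pace. */
public class BlockingVisitorBridge {

    /** One Path/attributes pair; END (with null fields) marks the end of the walk. */
    static final class Item {
        final Path path;
        final BasicFileAttributes attrs;
        Item(Path path, BasicFileAttributes attrs) { this.path = path; this.attrs = attrs; }
    }

    static final Item END = new Item(null, null);

    static BlockingQueue<Item> start(Path root, int capacity) {
        BlockingQueue<Item> queue = new ArrayBlockingQueue<>(capacity);
        Thread walker = new Thread(() -> {
            try {
                Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                        try {
                            queue.put(new Item(file, attrs)); // blocks while the queue is full
                            return FileVisitResult.CONTINUE;
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return FileVisitResult.TERMINATE;
                        }
                    }
                });
            } catch (IOException e) {
                e.printStackTrace();                  // a real bridge would hand the error to the consumer
            } finally {
                try {
                    queue.put(END);                   // always tell the consumer that the walk is over
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "file-walker");
        walker.setDaemon(true);
        walker.start();
        return queue;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Item> queue = start(Path.of(args.length > 0 ? args[0] : "."), 16);
        for (Item item = queue.take(); item != END; item = queue.take()) {
            System.out.println(item.path + " size=" + item.attrs.size());
        }
    }
}
```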

Answer 2

You can access file attributes together with their path by using Files.find, which accepts a BiPredicate<Path, BasicFileAttributes>, and storing the value as it tests each path.

A side-effecting action inside the BiPredicate lets you work with both objects without touching the file system again for each path. Given your own predicate condition yourPred, the side-effect predicate below collects the attributes so that you can retrieve them inside the stream processing:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.function.BiPredicate;

public class FindWithAttributes {
    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args[0]);

        // Use `ConcurrentHashMap` if using `stream.parallel()`
        HashMap<Path, BasicFileAttributes> attrs = new HashMap<>();

        BiPredicate<Path, BasicFileAttributes> yourPred = (p, a) -> true;

        // Side effect: remember the attributes of every path that passes yourPred
        BiPredicate<Path, BasicFileAttributes> predicate = (p, a) -> {
            return yourPred.test(p, a)
                    // && p.getNameCount() == dir.getNameCount()+1 // Simulates Files.list
                    && attrs.put(p, a) == null;
        };
        try (var stream = Files.find(dir, Integer.MAX_VALUE, predicate)) {
            stream.forEach(p -> System.out.println(p.toString() + " => " + attrs.get(p)));
            // Or: if you put all your handling code in the predicate, use stream.count();
        }
    }
}

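To simulate the effect of Files.list, use a one-level find. Note that with Integer.MAX_VALUE as maxDepth, find still walks the whole tree and the predicate only filters the results, so pass 1 as maxDepth as well: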

 BiPredicate<Path, BasicFileAttributes> yourPred = (p,a) -> p.getNameCount() == dir.getNameCount()+1;
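A sketch of that one-level variant, assuming you want the result as a map: it passes maxDepth = 1 so that find does not walk into subdirectories at all, and it excludes the start directory (depth 0) by comparing with dir rather than by counting names. ListWithAttributes and list are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListWithAttributes {

    /** Like Files.list(dir), but each direct child is paired with the attributes find already read. */
    static Map<Path, BasicFileAttributes> list(Path dir) throws IOException {
        Map<Path, BasicFileAttributes> attrs = new HashMap<>();
        // maxDepth = 1: only dir and its direct children are visited; the predicate drops dir itself
        try (Stream<Path> s = Files.find(dir, 1, (p, a) -> !p.equals(dir) && attrs.put(p, a) == null)) {
            // the predicate runs before each path is emitted, so attrs::get always finds a value
            return s.collect(Collectors.toMap(p -> p, attrs::get));
        }
    }

    public static void main(String[] args) throws IOException {
        list(Path.of(args.length > 0 ? args[0] : ".")).forEach((p, a) ->
                System.out.println(p.getFileName() + " created=" + a.creationTime()
                        + " modified=" + a.lastModifiedTime()));
    }
}
```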

For a large folder scan you should clean up the attrs map as you go by calling attrs.remove(p); after consuming each path.

Edit

The answer above can be refactored into a three-line call returning a stream of Map.Entry<Path, BasicFileAttributes>; alternatively, it's easy to add a class/record to hold the Path/BasicFileAttributes pair and return a Stream<PathInfo> instead:

/**
 * Call Files.find() returning a stream with both Path+BasicFileAttributes
 * as type Map.Entry<Path, BasicFileAttributes>
 * <p>Could declare a specific record to replace Map.Entry as:
 *    record PathInfo(Path path, BasicFileAttributes attr) { };
 */
public static Stream<Map.Entry<Path, BasicFileAttributes>>
find(Path dir, int maxDepth, BiPredicate<Path, BasicFileAttributes> matcher, FileVisitOption... options) throws IOException {

    HashMap <Path,BasicFileAttributes> attrs = new HashMap<>();
    BiPredicate<Path, BasicFileAttributes> predicate = (p,a) -> (matcher == null || matcher.test(p, a)) && attrs.put(p, a) == null;

    return Files.find(dir, maxDepth, predicate, options).map(p -> Map.entry(p, attrs.remove(p)));
}
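The record variant mentioned in the Javadoc could look like this (records need Java 16+). PathInfoFinder is a hypothetical class name. The map is a ConcurrentHashMap, following the comment in the first snippet about parallel streams, and remove hands each pair over so the map does not keep growing during a large scan.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;
import java.util.stream.Stream;

public class PathInfoFinder {

    /** The Path/attributes pair suggested in the Javadoc. */
    public record PathInfo(Path path, BasicFileAttributes attr) { }

    public static Stream<PathInfo> find(Path dir, int maxDepth,
                                        BiPredicate<Path, BasicFileAttributes> matcher,
                                        FileVisitOption... options) throws IOException {
        Map<Path, BasicFileAttributes> attrs = new ConcurrentHashMap<>();
        // side effect: remember the attributes of every path that matches
        BiPredicate<Path, BasicFileAttributes> predicate =
                (p, a) -> (matcher == null || matcher.test(p, a)) && attrs.put(p, a) == null;
        // remove() hands each pair over and keeps the map small
        return Files.find(dir, maxDepth, predicate, options).map(p -> new PathInfo(p, attrs.remove(p)));
    }

    public static void main(String[] args) throws IOException {
        try (Stream<PathInfo> s = find(Path.of(args.length > 0 ? args[0] : "."), 1, null)) {
            s.forEach(info -> System.out.println(info.path() + " dir=" + info.attr().isDirectory()));
        }
    }
}
```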

Answer 3

Starting from DuncG's answer, I got the following to work in Scala 3 as a fairly generic Akka Stream class. This is actually quite neat: it relies on a side effect inside the Files.find predicate, which it immediately encapsulates back into a clean, functional reactive stream.

class DirectoryList(
    dir: Path, 
    matcher: (Path, BasicFileAttributes) => Boolean = (p,a) => true, 
    maxDepth: Int = 1
) extends GraphStage[SourceShape[(Path,BasicFileAttributes)]]:
    import scala.jdk.FunctionConverters.*
    import scala.jdk.OptionConverters.*
    
    val out: Outlet[(Path,BasicFileAttributes)] = Outlet("PathAttributeSource")
    override val shape = SourceShape(out)


    override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
        new GraphStageLogic(shape) {
            private var next: (Path,BasicFileAttributes) = _

            def append(path: Path, att: BasicFileAttributes): Boolean = 
                val matched = matcher(path,att)
                if matched then next = (path,att)
                matched
            
            private val pathStream = Files.find(dir, maxDepth, append.asJava)
            private val sit = pathStream.iterator()
            
            setHandler(out, new OutHandler {
                override def onPull(): Unit = { 
                    if sit.hasNext then
                        sit.next()
                        push(out,next)
                    else
                        pathStream.close()  
                        complete(out)
                }

                override def onDownstreamFinish(cause: Throwable): Unit =
                    pathStream.close()  
                    super.onDownstreamFinish(cause)
            })
        }
end DirectoryList

This can then be used as follows:然后可以按如下方式使用：

// assumes an implicit ActorSystem is in scope for run()
val sourceGraph = DirectoryList(Path.of("."), maxDepth = 10)
val result = Source.fromGraph(sourceGraph).map { (p: Path, att: BasicFileAttributes) =>
        println(s"received <$p> : dir=${att.isDirectory}")
    }.run()

The full source code is here on GitHub, with an initial test here. Perhaps one could improve it by tuning the responses to pulls so that a certain number of path/attribute pairs are passed along in bulk.
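Within Akka Streams, the grouped(n) operator can batch the elements emitted by a Source. For a plain java.util.stream.Stream, such as the ones returned by Files.find above, a batching helper could look like this sketch; Batching and batched are hypothetical names, and each batch is built lazily only when the consumer asks for it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class Batching {

    /** Lazily regroups a stream into lists of at most batchSize elements (the last one may be shorter). */
    static <T> Stream<List<T>> batched(Stream<T> source, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        Iterator<T> it = source.iterator();
        Iterator<List<T>> batches = new Iterator<>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public List<T> next() {
                if (!it.hasNext()) throw new NoSuchElementException();
                List<T> batch = new ArrayList<>(batchSize);
                while (batch.size() < batchSize && it.hasNext()) batch.add(it.next());
                return batch;
            }
        };
        return StreamSupport.stream(Spliterators.spliteratorUnknownSize(batches, Spliterator.ORDERED), false)
                .onClose(source::close); // closing the batches also closes e.g. a Files.find stream
    }

    public static void main(String[] args) {
        // prints [1, 2], [3, 4] and [5] on separate lines
        batched(Stream.of(1, 2, 3, 4, 5), 2).forEach(System.out::println);
    }
}
```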

Answers: 3

Answer 1: 2 votes, 2021-03-18 22:59:47

Answer 2: 2 votes, accepted, 2021-03-19 08:33:45

Answer 3: 1 vote, 2021-03-19 18:24:10
