The fastest way to iterate through the file system

Sometimes I need to traverse a folder recursively, reading the contents of all files within.

I use C++ and Linux.

The folder contents are arbitrary: anything from a billion tiny files to a dozen gargantuan ones.

Trying to achieve the highest read speed, I ran into a dilemma. On the one hand, it is almost always faster to do all the reading from a single thread, because parallel access to the file system makes the disk heads thrash between the files being read concurrently.

On the other hand, sequential access to the file system from one thread is not as fast as it could be, for two reasons.

First, the time between the completion of one read request and the issue of the next one is lost. I try to minimize it by doing literally nothing in the reading thread except reading, but the constant switching between user and kernel space still costs time, especially in the billion-tiny-files case mentioned above.

Second, single-threaded reading gives the kernel and/or the HDD controller no chance to reorder the requested sectors, which could otherwise improve performance.
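For reference, the single-threaded baseline described above might look roughly like this C++17 sketch, using `std::filesystem` for the walk and POSIX `open`/`read`/`close` for the I/O. The function name `read_tree_sequentially` and its 1 MiB default buffer are illustrative assumptions, not the poster's code. It makes the per-file overhead concrete: each file, however small, costs an `open`, one or more `read` calls (the last one returning 0 at end of file) and a `close`, each a separate trip into the kernel.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: walks `root` recursively and reads every regular file
// front to back on the calling thread, reusing one buffer. Returns bytes read.
std::uint64_t read_tree_sequentially(const fs::path& root, std::size_t buf_size = 1 << 20) {
    std::vector<char> buf(buf_size);
    std::uint64_t total = 0;
    std::error_code ec;  // report walk errors through ec instead of throwing
    for (fs::recursive_directory_iterator it(root, fs::directory_options::skip_permission_denied, ec), end;
         !ec && it != end; it.increment(ec)) {
        std::error_code status_ec;  // separate, so a bad entry does not stop the walk
        if (!it->is_regular_file(status_ec)) continue;
        int fd = open(it->path().c_str(), O_RDONLY | O_CLOEXEC);
        if (fd < 0) continue;  // unreadable file: skip it
        // Per file: open, read until EOF (the last read returns 0), close.
        // With tiny files these user/kernel round trips dominate.
        ssize_t n;
        while ((n = read(fd, buf.data(), buf.size())) > 0) total += static_cast<std::uint64_t>(n);
        close(fd);
    }
    return total;
}
```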

So, I would like to achieve two things:

1) In, e.g., LibUsb, I can have several pending read requests that are processed sequentially, with no pause between the completion of one request and the start of the next. Is it possible to get something like that for file system access?

2) Is it possible to submit several read requests to the kernel at once, but mark them so that the kernel knows they have no individual deadlines and that it is the total time to complete all of them that should be minimized?

Since you're using Linux, you might give the new io_uring interface a try. It is claimed to be more efficient than both the traditional synchronous approach (thread pool + blocking syscalls) and the asynchronous libaio approach.
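To make that concrete, here is a rough sketch of batching the reads with io_uring: one `IORING_OP_READV` request is queued per file and the whole batch is handed to the kernel with a single `io_uring_enter` call, so several reads can be in flight at once instead of being issued strictly one after another. It calls the raw syscalls only to stay dependency-free; real code would normally use liburing (`io_uring_queue_init`, `io_uring_prep_read`, `io_uring_submit_and_wait` and friends). Assumptions: Linux 5.1 or later, only the first `buf_size` bytes of each file are read, and `batch_read_io_uring` is a hypothetical name. Nothing in it tells the kernel that the requests lack individual deadlines (question 2); it only lets the kernel see the whole batch at once.

```cpp
#include <linux/io_uring.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>
#include <string>
#include <vector>

// glibc has no io_uring wrappers, so issue the raw syscalls (liburing wraps these).
static int uring_setup(unsigned entries, io_uring_params* p) {
    return (int)syscall(__NR_io_uring_setup, entries, p);
}
static int uring_enter(int ring, unsigned to_submit, unsigned min_complete, unsigned flags) {
    return (int)syscall(__NR_io_uring_enter, ring, to_submit, min_complete, flags, nullptr, 0);
}

// Hypothetical helper: reads up to buf_size bytes from the start of each file,
// handing every read to the kernel in one io_uring_enter() call. Returns total
// bytes read, or -errno if no ring could be set up (io_uring missing/disabled).
long long batch_read_io_uring(const std::vector<std::string>& paths, size_t buf_size) {
    if (paths.empty()) return 0;
    io_uring_params p{};
    int ring = uring_setup((unsigned)paths.size(), &p);  // kernel caps entries at 32768
    if (ring < 0) return -errno;

    // Map the SQ ring, CQ ring and SQE array that are shared with the kernel.
    size_t sq_len = p.sq_off.array + p.sq_entries * sizeof(unsigned);
    size_t cq_len = p.cq_off.cqes + p.cq_entries * sizeof(io_uring_cqe);
    size_t sqe_len = p.sq_entries * sizeof(io_uring_sqe);
    const int prot = PROT_READ | PROT_WRITE, mflags = MAP_SHARED | MAP_POPULATE;
    char* sq = (char*)mmap(nullptr, sq_len, prot, mflags, ring, IORING_OFF_SQ_RING);
    char* cq = (char*)mmap(nullptr, cq_len, prot, mflags, ring, IORING_OFF_CQ_RING);
    void* se = mmap(nullptr, sqe_len, prot, mflags, ring, IORING_OFF_SQES);
    long long result = 0;
    if (sq == MAP_FAILED || cq == MAP_FAILED || se == MAP_FAILED) result = -errno;

    std::vector<std::vector<char>> bufs(paths.size(), std::vector<char>(buf_size));
    std::vector<iovec> iovs(paths.size());
    std::vector<int> fds;
    if (result == 0) {
        unsigned* sq_tail = (unsigned*)(sq + p.sq_off.tail);
        unsigned sq_mask = *(unsigned*)(sq + p.sq_off.ring_mask);
        unsigned* sq_array = (unsigned*)(sq + p.sq_off.array);
        unsigned* cq_head = (unsigned*)(cq + p.cq_off.head);
        unsigned* cq_tail = (unsigned*)(cq + p.cq_off.tail);
        unsigned cq_mask = *(unsigned*)(cq + p.cq_off.ring_mask);
        io_uring_cqe* cqes = (io_uring_cqe*)(cq + p.cq_off.cqes);
        io_uring_sqe* sqes = (io_uring_sqe*)se;

        // Queue one READV per file; nothing reaches the kernel yet.
        unsigned start = *sq_tail, tail = start;
        for (size_t i = 0; i < paths.size(); ++i) {
            int fd = open(paths[i].c_str(), O_RDONLY | O_CLOEXEC);
            if (fd < 0) continue;  // unreadable file: skipped in this sketch
            fds.push_back(fd);
            iovs[i] = {bufs[i].data(), buf_size};
            unsigned idx = tail++ & sq_mask;
            io_uring_sqe* sqe = &sqes[idx];
            std::memset(sqe, 0, sizeof *sqe);
            sqe->opcode = IORING_OP_READV;
            sqe->fd = fd;
            sqe->addr = (unsigned long long)&iovs[i];
            sqe->len = 1;  // one iovec
            sqe->off = 0;  // read from file offset 0
            sqe->user_data = i;
            sq_array[idx] = idx;
        }
        __atomic_store_n(sq_tail, tail, __ATOMIC_RELEASE);  // publish the SQEs

        // A single syscall submits the whole batch and waits for all of it.
        unsigned pending = tail - start;
        int ret = pending ? uring_enter(ring, pending, pending, IORING_ENTER_GETEVENTS) : 0;
        if (ret < 0) result = -errno; else pending = (unsigned)ret;
        while (result >= 0 && pending > 0) {
            unsigned head = *cq_head;
            if (head == __atomic_load_n(cq_tail, __ATOMIC_ACQUIRE)) {
                // enter() may return early (e.g. on a signal): wait for more.
                if (uring_enter(ring, 0, 1, IORING_ENTER_GETEVENTS) < 0 && errno != EINTR)
                    result = -errno;
                continue;
            }
            const io_uring_cqe& cqe = cqes[head & cq_mask];
            if (cqe.res > 0) result += cqe.res;  // res = bytes read, or -errno
            __atomic_store_n(cq_head, head + 1, __ATOMIC_RELEASE);
            --pending;
        }
    }
    for (int fd : fds) close(fd);
    if (sq != MAP_FAILED) munmap(sq, sq_len);
    if (cq != MAP_FAILED) munmap(cq, cq_len);
    if (se != MAP_FAILED) munmap(se, sqe_len);
    close(ring);
    return result;
}
```

On systems where io_uring is unavailable (some container seccomp profiles block it, for example) the function returns a negative errno, and falling back to a plain read loop is the obvious choice. Whether the batch actually beats the single-threaded loop on a given disk is something to measure.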

For question 1, io_uring's IORING_SETUP_SQPOLL flag seems to do what you need, as long as you keep pumping in requests.
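A minimal sketch of asking for such a ring, again through the raw `io_uring_setup` syscall (liburing's `io_uring_queue_init_params` accepts the same `io_uring_params` struct). With `IORING_SETUP_SQPOLL`, a kernel thread polls the submission queue, so queuing a request is just writing an SQE and advancing the tail, with no syscall per request. The "as long as you keep pumping in requests" caveat maps to `sq_thread_idle`: after that many milliseconds without new SQEs the thread goes to sleep and has to be woken again. The name `setup_sqpoll_ring` is illustrative; whether the kernel accepts the flag depends on its version and privileges (older kernels restricted SQPOLL to privileged users and, before 5.11, to registered files).

```cpp
#include <linux/io_uring.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cerrno>

// Older kernel headers may not define this feature bit (added in Linux 5.11).
#ifndef IORING_FEAT_SQPOLL_NONFIXED
#define IORING_FEAT_SQPOLL_NONFIXED (1U << 7)
#endif

// Hypothetical helper: asks for a ring whose submission queue is drained by a
// kernel thread. idle_ms: how long that thread keeps polling without new SQEs
// before it sleeps and has to be woken with IORING_ENTER_SQ_WAKEUP.
// Returns the ring fd, or -errno if the kernel refuses.
int setup_sqpoll_ring(unsigned entries, unsigned idle_ms, bool* nonfixed_files_ok) {
    io_uring_params p{};
    p.flags = IORING_SETUP_SQPOLL;
    p.sq_thread_idle = idle_ms;
    int ring = (int)syscall(__NR_io_uring_setup, entries, &p);
    if (ring < 0) return -errno;
    // If set, plain fds work with SQPOLL; otherwise files must be registered first.
    if (nonfixed_files_ok) *nonfixed_files_ok = (p.features & IORING_FEAT_SQPOLL_NONFIXED) != 0;
    return ring;
}
```

Once the ring is mapped, the application checks the `IORING_SQ_NEED_WAKEUP` bit in the SQ ring's `flags` word and only then calls `io_uring_enter` with `IORING_ENTER_SQ_WAKEUP`; liburing's `io_uring_submit` performs that check itself.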
