
Why does the readdir() call in Linux grow non-linearly?

I have a directory with 1000 files, and readdir() takes less than 1 second; with 10000 files it takes around 24 seconds.

Why? It should be linear.

Can anyone explain the reason? And is there a better solution if all I need is to get the file and sub-directory names in a directory?

EDIT: I am on my local Linux PC.
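For reference, a minimal listing loop of the kind being timed here might look like this (plain opendir()/readdir() with no per-entry stat() or open(); illustrative sketch only):

```c
#include <dirent.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : ".";
    DIR *dir = opendir(path);
    if (!dir) {
        perror("opendir");
        return 1;
    }

    struct dirent *entry;
    /* readdir() hands back one entry per call; this loop does nothing
     * else, so it only exercises directory enumeration. */
    while ((entry = readdir(dir)) != NULL)
        puts(entry->d_name);

    closedir(dir);
    return 0;
}
```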

It might be file-system specific. Perhaps using a suitably configured Ext4 or BTRFS file system would help. Some file systems use hashing or B-tree techniques to make the complexity of file access in a directory of size N be O(log N); others are still linear, e.g. O(N), and the kernel might do weird things on top of that.

The shell that you use in your huge directories will generally sort entries when globbing (see also glob(7)). And you don't want its auto-completion to take many seconds on each keystroke!

I believe that you should never have huge directories (e.g. with more than a few hundred entries), so 10000 files in a single directory is unreasonable. If that is the case, you had better organize your files differently, e.g. subdir01/file001.txt ... subdir99/file999.txt

BTW, if your need is to have a lot of small things accessible by some textual key, using an indexed file (like gdbm), an SQLite "database", or a real database (PostgreSQL, MongoDB, ...) is much more suitable, and probably more efficient. Don't forget to dump the data (probably in some textual format) for backup.
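For illustration only (the file name and key below are made up), a minimal GDBM sketch that stores and fetches a small value by a textual key could look like this (link with -lgdbm):

```c
#include <gdbm.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Open or create the database file; "data.gdbm" is a placeholder name. */
    GDBM_FILE db = gdbm_open("data.gdbm", 0, GDBM_WRCREAT, 0644, NULL);
    if (!db) {
        fprintf(stderr, "gdbm_open failed\n");
        return 1;
    }

    char *k = "file001", *v = "some small payload";
    datum key = { k, (int) strlen(k) };
    datum val = { v, (int) strlen(v) };

    /* Store the key/value pair, replacing any previous value. */
    gdbm_store(db, key, val, GDBM_REPLACE);

    /* Fetch it back; dptr is allocated by gdbm and must be freed. */
    datum out = gdbm_fetch(db, key);
    if (out.dptr) {
        printf("%.*s\n", out.dsize, out.dptr);
        free(out.dptr);
    }

    gdbm_close(db);
    return 0;
}
```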

Notice that neither the documentation of readdir(3) on Linux nor that of POSIX readdir mentions any time complexity or any linear behavior. That lack of mention is significant. On the commonly used FAT filesystem (e.g. on many USB keys) the time complexity is probably quadratic.

There is no reason for it to be linear. At a lower level, a directory is like a file: a collection of clusters. If it fits in a single cluster, there is only one actual physical read operation and the rest happens in memory. But when your directory becomes excessively large, there will be many physical reads. At that point, as stated by Basile Starynkevitch, performance becomes highly dependent on the file system structure.

But IMHO, if you want to browse the directory, the cost depends essentially on the number of clusters used by the directory. It is much more implementation dependent when you look up a file directly (by name) in a huge directory. Filesystems with linear search will do worse than filesystems that use native hashing, such as BSD FFS.

  • All operations should be linear on a poor filesystem (e.g. FAT/FAT32 are O(N)).
  • Seeks, updates and deletes should be better than linear on a good filesystem like NTFS, which is O(log N). A full directory listing will still be linear, though.
  • In either case it should be much, much faster than what you have reported, in both the small and large cases.

I suspect something else is going on. Very likely your results are biased by factors other than the directory structure, such as:

  • The disk has a hardware problem that is triggered in the large example but not in the small one.
  • Other disk activity from other parts of the system interrupts the test in the large case.
  • Disk hardware pre-fetching. Disks contain RAM caches that try to predict which sectors will be requested next and have them ready.
  • Operating system cache. Operating systems will also cache data in a similar way.
  • You are possibly doing something with the data other than just readdir, and this other operation has higher time complexity and dominates (a minimal timing sketch that isolates readdir follows this list).
  • Your application's memory usage pattern fits into the L1 cache for small directories but not for large ones.
  • Your application's memory usage pattern forces swapping on large directories but not on small ones.
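One way to rule such factors out is to time nothing but the readdir() loop itself. A minimal sketch (illustrative only, error handling kept short):

```c
#include <dirent.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : ".";
    DIR *dir = opendir(path);
    if (!dir) {
        perror("opendir");
        return 1;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* Only the enumeration is timed: no stat(), no sorting, no printing. */
    long count = 0;
    while (readdir(dir) != NULL)
        count++;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    closedir(dir);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%ld entries in %.3f s\n", count, secs);
    return 0;
}
```

Comparing a cold-cache run (e.g. after writing 3 to /proc/sys/vm/drop_caches as root) with a warm-cache run also helps separate disk behaviour from kernel caching.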

readdir is at best linear. If we ignore everything that goes on in the filesystem, the amount of data (file names and the other fields in struct dirent) copied from the kernel into userland is directly proportional to the number of files. So we start with O(n).

Then the kernel needs to figure out which data to give you. In the best case the directory is stored linearly in something that looks like a file. This is what older file systems like FFS and EXT2 do. It gives good performance for readdir (because finding which disk block to give you is just an array lookup), but has the disadvantage that actually opening those files (open, stat or almost anything else that works with the file name) becomes an O(n) operation, because every open has to linearly scan the directory to find the file name. This is why there has been so much work on caching directory data for those file systems. Even on those filesystems you may see larger directories take longer to read per item, because the way file information is stored gets more expensive as files grow. Depending on your file (or directory) size the kernel might need to read between 1 and 5 other blocks from disk (or cache) to find out which block to give you.

If you have a different filesystem (most modern ones), it trades the convenience and speed of a linear directory for a more complex on-disk structure, which gives much better performance for open and stat (after all, why would you readdir if you don't intend to do anything with the files?). But as a consequence you end up (not necessarily, but most likely) with worse than linear time to actually perform a readdir, because the operation to find out which disk block to read for your information might be O(log n).
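To make that trade-off concrete, here is an illustrative sketch (not from the answer) of the common pattern that mixes enumeration with per-entry lookups; on a filesystem that searches directories linearly, each lookup by name can itself be O(n), making the whole loop roughly quadratic, while hashed or B-tree directories keep it close to O(n log n):

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : ".";
    DIR *dir = opendir(path);
    if (!dir) {
        perror("opendir");
        return 1;
    }

    struct dirent *entry;
    struct stat st;

    while ((entry = readdir(dir)) != NULL) {
        /* Each fstatat() is a lookup by name: cheap on hashed/B-tree
         * directories, potentially a linear scan on older filesystems. */
        if (fstatat(dirfd(dir), entry->d_name, &st, 0) == 0)
            printf("%-30s %lld bytes\n", entry->d_name, (long long) st.st_size);
    }

    closedir(dir);
    return 0;
}
```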
