
How problematic is it to read many small files from one directory?

I have to read many (up to 5 million) small (9 KB) files. At the moment they are all in one directory. I fear the lookups will take quadratic time, or even n^2 log n. Is that right? Is this significant (will the lookup take more time than the actual reading)? Does the asymptotic behavior of the running time change when the files are cached by the OS?

I use C++ streams for reading the files. At the moment I'm using Windows 7 with NTFS, but I will later run the program on a Linux cluster (not sure which file system).
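For context, a minimal sketch of reading one of these small files fully into memory with a C++ stream could look like the following; the function name read_small_file is my own illustration, not something from the original post:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Read one small (~9 KB) file completely into a string via a C++ stream.
std::string read_small_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();   // single read of the whole file; cheap for ~9 KB
    return buffer.str();
}
```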

It might not be that bad: if you enumerate the files and process each filename as you encounter it, your OS is quite likely to still have the directory entry in its disk cache. And for practical purposes, a disk cache lookup is O(1).
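A minimal sketch of this enumerate-and-read-as-you-go pattern, assuming C++17's std::filesystem is available (the original post does not name a specific enumeration API), could look like this; the directory name "data" is a placeholder:

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <sstream>

namespace fs = std::filesystem;

int main()
{
    const fs::path dir = "data";   // placeholder for the directory of small files
    std::size_t total_bytes = 0;

    // Walk the directory once and read each file as soon as it is encountered,
    // while its directory entry is still warm in the OS disk cache.
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (!entry.is_regular_file())
            continue;

        std::ifstream in(entry.path(), std::ios::binary);
        std::ostringstream buffer;
        buffer << in.rdbuf();                // ~9 KB per file, read in one go
        total_bytes += buffer.str().size();  // stand-in for the real processing
    }

    std::cout << "read " << total_bytes << " bytes\n";
}
```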

What will kill you is a mechanical HDD. You'll have 5 million disk seeks, each of which takes roughly 1/100th of a second. That is 5,000,000 × 0.01 s = 50,000 seconds (about 14 hours), more than half a day. This is a task that screams for an SSD.
