
Storing & accessing up to 10 million files in Linux

I'm writing an app that needs to store a large number of files, up to approximately 10 million.

They are currently named with a UUID and will be around 4 MB each, always exactly the same size. Reads from and writes to these files will always be sequential.

I am seeking answers to two main questions:

1) Which filesystem would be best for this: XFS or ext4?
2) Would it be necessary to store the files in subdirectories in order to reduce the number of files within a single directory?

For question 2, I note that people have tried to find the XFS limit on the number of files you can store in a single directory and have not hit one, even with millions of files. They noted no performance problems. What about ext4?

Googling for people doing similar things, I found some suggesting that you store the inode number, rather than the filename, as the link to the file for performance (this would live in a database index, which I'm also using). However, I don't see a usable API for opening a file by inode number. That seemed to be more of a suggestion for improving performance under ext3, which I don't intend to use anyway.

What are the ext4 and XFS limits? What performance benefits does one have over the other, and can you see a reason to use ext4 over XFS in my case?

You should definitely store the files in subdirectories.

EXT4 and XFS both use efficient lookup methods for file names, but if you ever need to run tools such as ls or find over the directories, you will be very glad to have the files in manageable chunks of 1,000 to 10,000 files.

The inode number suggestion is about improving the sequential access performance of the EXT filesystems. The metadata is stored in inodes, and if you access these inodes out of order then the metadata accesses are randomized. By reading your files in inode order you make the metadata access sequential too.
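You don't need to open files by inode number to benefit from this; it is enough to sort the names by inode before touching the files. A minimal sketch in Python, assuming a single directory of regular files (the function name `paths_in_inode_order` is made up for illustration):

```python
import os

def paths_in_inode_order(directory):
    """Return paths of the regular files in `directory`, sorted by inode number."""
    with os.scandir(directory) as entries:
        # DirEntry.inode() usually comes straight from the directory listing,
        # so no extra stat() call per file is needed on Linux.
        files = [
            (entry.inode(), entry.path)
            for entry in entries
            if entry.is_file(follow_symlinks=False)
        ]
    files.sort()  # ascending inode number
    return [path for _inode, path in files]
```

You would then open and read the files in the order returned. Whether this helps noticeably depends on the filesystem and the storage underneath, so measure before relying on it.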

Modern filesystems will let you store 10 million files all in the same directory if you like. But tools (ls and its friends) will not work well.

I'd recommend a single level of directories, a fixed number, perhaps 1,000 directories, and putting the files in there (10,000 files per directory is tolerable to the shell and to "ls").
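With 10 million files spread over 1,000 directories you get about 10,000 files per directory. One way to pick the directory is to derive a bucket number from the UUID itself, so no lookup table is needed. A rough sketch in Python; the `shard_path` name, the zero-padded directory names and the choice of SHA-256 as the mixing function are my assumptions, not something the filesystem requires:

```python
import hashlib
import uuid
from pathlib import Path

NUM_DIRS = 1000  # fixed number of directories, as suggested above

def shard_path(root, file_id):
    """Map a UUID to root/<bucket>/<uuid>, where bucket is 000..999."""
    u = uuid.UUID(str(file_id))
    # Hash the UUID bytes so buckets stay evenly filled even if the UUIDs
    # are not random (for example time-based version 1 UUIDs).
    digest = hashlib.sha256(u.bytes).digest()
    bucket = int.from_bytes(digest[:8], "big") % NUM_DIRS
    return Path(root) / f"{bucket:03d}" / str(u)
```

The mapping is deterministic, so the database only needs to store the UUID; the path can be recomputed whenever the file is opened.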

I've seen systems which create many levels of directories; this is truly unnecessary, increases inode consumption, and makes traversal slower.

10M files should not really be a problem either, unless you need to do bulk operations on them.

I expect you will need to prune old files, but something like "tmpwatch" will probably work just fine with 10M files.
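If you would rather prune from inside the app, the same idea is a short directory walk. This is only a sketch of age-based pruning, not a reimplementation of tmpwatch; it assumes modification time is the right age criterion, and the `prune_older_than` name and its parameters are hypothetical:

```python
import os
import time

def prune_older_than(root, max_age_seconds, now=None):
    """Delete non-directory entries under `root` last modified before the cutoff.

    Returns the number of entries removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat() so a symlink is judged by its own timestamp.
                if os.lstat(path).st_mtime < cutoff:
                    os.unlink(path)
                    removed += 1
            except FileNotFoundError:
                # The file vanished between listing and deletion; skip it.
                pass
    return removed
```

For example, `prune_older_than("/srv/files", 30 * 24 * 3600)` would remove everything older than roughly 30 days under a made-up `/srv/files` root. This is exactly the kind of bulk operation that benefits from keeping each directory to a manageable size.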


 