Indexing files, and the quickest way to find a file in folders?

I have 660,000 XML files (with unique file names) in 22 folders. Each folder has 30,000 files. I need to find them by their names efficiently in a C# application. I know there is a SearchIndexer service in Windows (Vista and later?), and I was wondering whether I can use that or whether I have to index the files myself.
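If you do end up indexing the files yourself, the simplest form of index is a dictionary from file name to full path, built in one pass at startup. The sketch below assumes all 22 folders sit under one root folder and that names are unique, as the question states; the class and method names (`FileNameIndex`, `Build`, `Find`) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// A minimal "index it yourself" sketch: file name -> full path, held in memory.
public static class FileNameIndex
{
    // Enumerates every *.xml file under root (all subfolders) once and records
    // its full path keyed by file name. Assumes names are unique across folders.
    public static Dictionary<string, string> Build(string root)
    {
        // Windows file names are case-insensitive, so the keys are compared the same way.
        var index = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string path in Directory.EnumerateFiles(root, "*.xml", SearchOption.AllDirectories))
        {
            // Add throws ArgumentException on a duplicate name, which flags a broken assumption.
            index.Add(Path.GetFileName(path), path);
        }
        return index;
    }

    // Returns the full path for fileName, or null if it is not indexed.
    public static string? Find(Dictionary<string, string> index, string fileName) =>
        index.TryGetValue(fileName, out string? path) ? path : null;
}
```

After `var index = FileNameIndex.Build(root);`, each `FileNameIndex.Find(index, name)` is a hash-table probe with no disk access. The trade-off is one full enumeration of all 660,000 entries at startup and keeping every path in memory.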

Alternatively, I guess I could create a database with the file name as the primary key and the path in another column. However, should I create one table with 660,000 rows, or 22 tables with 30,000 rows each? And why?

Thanks in advance.

My experience with this may be dated (it was on NTFS), but you should check how quickly you can open a file in a directory of 30,000 files. I think you may find that it's better to distribute the files over more directories.
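One way to run that check is a small timing loop that opens a sample of known names in one folder. This is a rough sketch rather than a rigorous benchmark; `OpenTimer` and `MeanOpenMicroseconds` are hypothetical names, and the caller supplies the directory and the sample of names.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Rough micro-benchmark: average time to open (and close) the named files in one folder.
public static class OpenTimer
{
    // Returns the mean time per open, in microseconds.
    public static double MeanOpenMicroseconds(string directory, string[] fileNames)
    {
        if (fileNames.Length == 0)
            throw new ArgumentException("At least one file name is required.", nameof(fileNames));

        var stopwatch = Stopwatch.StartNew();
        foreach (string name in fileNames)
        {
            // Opening the file (not just File.Exists) covers the directory lookup
            // plus handle creation; the handle is closed at the end of each iteration.
            using FileStream stream = File.OpenRead(Path.Combine(directory, name));
        }
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds * 1000.0 / fileNames.Length;
    }
}
```

Run it against one of the existing 30,000-file folders and against a smaller folder, and repeat each run: later passes are served largely from the OS file-system cache, so the first (cold) pass and the later (warm) passes measure different things.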

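If you have control over the directory layout, consider hashing the file names to a number between 0 and 659,999 (that is, the hash modulo 660,000). You can then use the file system as an index: the first two digits of the number pick a top-level folder (00 to 65) and the next two pick a subfolder (00 to 99), so each of the 6,600 leaf folders holds about 100 files: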

00/
  00/
    <about 100 files that hash here>
  01/
  ..
  99/
01/
..
65/

You still need to write a simple "indexer" that reads each file, computes its hash and stores it in the correct location. You can then look up a file like this:

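using System.IO;

string Lookup(string filename)
{
   // GetHashCode() can be negative and is not stable between runs (it is randomized per
   // process on .NET Core), so use a stable hash; StableHash is a hypothetical helper.
   int hash = (int)(StableHash(filename) % 660000);   // StableHash returns uint, so 0..659999
   string directory = HashToDirectory(hash);
   string path = Path.Combine(directory, filename);
   // ...
   return path;
}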
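A fuller sketch of the same idea, including the one-off indexer that moves files into place, could look like the following. It swaps in FNV-1a as the stable hash (the answer does not name one) because `string.GetHashCode()` is not guaranteed to return the same value across runs; the class name `HashedStore`, the `Reorganize` method and the source/target roots are hypothetical, and the folder scheme follows the two-level tree described above.

```csharp
using System.IO;
using System.Text;

// Sketch of the hashed layout: hash values 0..659999 map to <root>/NN/NN/ (about 100 files per leaf).
public static class HashedStore
{
    private const int Buckets = 660000;

    // FNV-1a, 32-bit: deterministic across processes and machines, unlike string.GetHashCode().
    private static uint StableHash(string fileName)
    {
        uint hash = 2166136261u;
        // Upper-casing first sends names that differ only in case (the same file on NTFS) to one folder.
        foreach (byte b in Encoding.UTF8.GetBytes(fileName.ToUpperInvariant()))
        {
            hash ^= b;
            hash *= 16777619u;
        }
        return hash;
    }

    // E.g. hash value 123456 -> <root>/12/34: top folder 00..65, subfolder 00..99.
    public static string HashToDirectory(string root, string fileName)
    {
        int bucket = (int)(StableHash(fileName) % Buckets);
        return Path.Combine(root, (bucket / 10000).ToString("D2"), (bucket / 100 % 100).ToString("D2"));
    }

    // Computes where a file lives; no disk access is needed.
    public static string Lookup(string root, string fileName) =>
        Path.Combine(HashToDirectory(root, fileName), fileName);

    // One-off "indexer": moves every *.xml file under sourceRoot into its hashed folder.
    public static int Reorganize(string sourceRoot, string targetRoot)
    {
        int moved = 0;
        // GetFiles returns the full list up front, so moving files cannot disturb the enumeration.
        foreach (string path in Directory.GetFiles(sourceRoot, "*.xml", SearchOption.AllDirectories))
        {
            string name = Path.GetFileName(path);
            Directory.CreateDirectory(HashToDirectory(targetRoot, name));
            File.Move(path, Lookup(targetRoot, name));
            moved++;
        }
        return moved;
    }
}
```

After `HashedStore.Reorganize(oldRoot, newRoot)` has run once, `HashedStore.Lookup(newRoot, name)` computes a path without touching the disk, and a single `File.Exists` or `File.OpenRead` confirms the file is there.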

One thing that's nice about this approach is that you can profile different "densities" (numbers of files per directory): you only need to change the HashToDirectory function. You also don't need a database.
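To profile densities, the two folder levels can be made parameters instead of fixed constants. In this hypothetical `HashToPath` variant, `bucketsPerLeaf` sets how many consecutive hash values share a leaf folder and `leavesPerTop` sets the number of subfolders per top-level folder; with hash values 0 to 659,999, `(100, 100)` reproduces the `00/00` to `65/99` layout above.

```csharp
using System;
using System.IO;

// Hypothetical parameterized HashToPath: maps a hash bucket to "top/sub" folders.
public static class DensityLayout
{
    public static string HashToPath(int bucket, int bucketsPerLeaf, int leavesPerTop)
    {
        if (bucket < 0) throw new ArgumentOutOfRangeException(nameof(bucket));
        if (bucketsPerLeaf <= 0 || leavesPerTop <= 0)
            throw new ArgumentOutOfRangeException(nameof(bucketsPerLeaf), "Both densities must be positive.");

        int leaf = bucket / bucketsPerLeaf;   // index of the leaf folder across the whole tree
        int top = leaf / leavesPerTop;        // top-level folder
        int sub = leaf % leavesPerTop;        // subfolder inside that top-level folder

        // Zero-pad the subfolder so names sort numerically (e.g. 00..99 for 100 leaves).
        int subWidth = (leavesPerTop - 1).ToString().Length;
        return Path.Combine(top.ToString("D2"), sub.ToString("D" + subWidth));
    }
}
```

For example, `HashToPath(123456, 1000, 10)` returns `12\3` on Windows: about 1,000 files per leaf folder instead of about 100, spread over the same 66 top-level folders.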

We used a similar approach for a web crawler that stored a lot of files. That was on NTFS, so YMMV.

Querying the Index Programmatically:

  • Using SQL and AQS Approaches to Query the Index
  • Querying the Index with ISearchQueryHelper
  • Querying the Index with the search-ms Protocol
  • Querying the Index with Windows Search SQL Syntax
  • Using Advanced Query Syntax Programmatically
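As an illustration of the Windows Search SQL approach, the query for a single file by name can be built as a string and run through the Windows Search OLE DB provider. This sketch assumes the 22 folders are included in the Windows Search index and that the usual SQL rule of doubling single quotes applies to string literals; `WindowsSearchQuery` and `FindByName` are hypothetical names, while `SystemIndex`, `System.FileName`, `System.ItemPathDisplay` and `SCOPE` come from the Windows Search SQL syntax.

```csharp
// Sketch: builds a Windows Search SQL query that finds an indexed file by exact name.
public static class WindowsSearchQuery
{
    // Connection string for the Windows Search OLE DB provider.
    public const string ConnectionString =
        "Provider=Search.CollatorDSO;Extended Properties='Application=Windows';";

    // scopeFolder is a local folder such as C:\data; SCOPE restricts results to it and its subfolders.
    public static string FindByName(string fileName, string scopeFolder)
    {
        // Assumption: single quotes inside string literals are escaped by doubling them.
        string name = fileName.Replace("'", "''");
        // SCOPE takes a file: URL with forward slashes, e.g. file:C:/data.
        string scope = scopeFolder.Replace("'", "''").Replace('\\', '/');
        return "SELECT System.ItemPathDisplay FROM SystemIndex " +
               $"WHERE System.FileName = '{name}' AND SCOPE = 'file:{scope}'";
    }
}
```

On Windows, the resulting string can be executed with `System.Data.OleDb` (an `OleDbConnection` opened with `WindowsSearchQuery.ConnectionString`, then `OleDbCommand.ExecuteReader()`), reading the first column of each row as the full path. Whether this is faster than an index of your own depends on the indexer being up to date for all 660,000 files.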
