
What is the fastest way to get lists of files and search through file lists repeatedly?

Situation:

  1. There can be 5,000 to 35,000 files spread over 3-5 root directories (each with many subdirectories) on a network drive.
  2. There are three file types (tif, sus, img) that a user may or may not search for. Some file types can have 2-3 different file extensions.
  3. From a list of files (in various database tables), I need to find out whether each file exists and, if it does, save the path and the file name separately to a table.
  4. Searches on file names must be case insensitive, but it would be nice to keep the original case when saving the path to the table.

Environment:

C# and .NET 4.0 on a Windows PC.

Is this the fastest way?

Is the fastest way to use a dictionary, with FileName as the key (lowercase) and Path as the value (original case)? That way I can get the index/Path in the same pass when I search for the file name. The FileName and Path are split up front when populating the list.

string value;  // receives the path stored for the key, if any
if (d.TryGetValue("key", out value))
{
    // Log "key" and value to the table; TryGetValue does only one lookup
}

Note: I am a bit concerned that I will probably have duplicate key values per FileType. If and when I run across this scenario, what type of list and access method should I use?

Maybe in these rare cases I should populate another list of the duplicate keys, because I will need to do at least one of log/copy/delete on the files in any path.
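
As a variation on the lowercase-key idea, a dictionary can be given a case-insensitive comparer so that the keys keep their original casing. The following is only a minimal sketch and is not taken from the question: the names FileIndexSketch and Build are hypothetical, and it assumes that a duplicate file name is simply skipped after its first occurrence.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileIndexSketch
{
    // Builds a file name -> directory map from a sequence of full paths.
    // Keys compare case-insensitively but keep the casing they were added with.
    public static Dictionary<string, string> Build(IEnumerable<string> fullPaths)
    {
        var index = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string fullPath in fullPaths)
        {
            string name = Path.GetFileName(fullPath);
            // Assumption: the first path seen for a name wins; later duplicates are skipped.
            if (!index.ContainsKey(name))
            {
                index.Add(name, Path.GetDirectoryName(fullPath));
            }
        }
        return index;
    }
}
```

A lookup such as index.TryGetValue("IMAGE01.tif", out dir) would then match an entry added as "Image01.TIF" without any manual lowercasing.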

I would use a Dictionary<string,string> with the FullName (path+file+ext) changed to lower case as the key and the unchanged FullName as the value. Then split off the required parts using the static methods GetDirectoryName and GetFileName of the System.IO.Path class before inserting them into the table.

EDIT: The GetFiles method of the DirectoryInfo class returns an array of FileInfo. FileInfo has a FullName property returning path+file+ext. You could just as well store this FileInfo as the value in your dictionary if memory consumption is not an issue. FileInfo has a DirectoryName and a Name property returning the two parts you need.
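
A rough sketch of that idea, under a few assumptions: the names FileScanSketch and Scan are made up for illustration, the extensions are passed in lower case with a leading dot (for example ".tif"), and a later file silently replaces an earlier one if both lower-case to the same full path.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileScanSketch
{
    // Walks every root recursively and returns a map from the
    // lower-cased full path to the FileInfo of each matching file.
    public static Dictionary<string, FileInfo> Scan(
        IEnumerable<string> roots, ICollection<string> extensions)
    {
        var files = new Dictionary<string, FileInfo>();
        foreach (string root in roots)
        {
            var dir = new DirectoryInfo(root);
            foreach (FileInfo file in dir.GetFiles("*", SearchOption.AllDirectories))
            {
                // FileInfo.Extension includes the leading dot, e.g. ".TIF".
                if (extensions.Contains(file.Extension.ToLowerInvariant()))
                {
                    files[file.FullName.ToLowerInvariant()] = file;
                }
            }
        }
        return files;
    }
}
```

The values still carry DirectoryName and Name in their original case, so nothing has to be split again before writing to the table.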

EDIT: Here is my implementation of a multimap which does the Dictionary<TKey,List<TValue>> stuff:

/// <summary>
/// Represents a collection of keys and values. Multiple values can have the same key.
/// </summary>
/// <typeparam name="TKey">Type of the keys.</typeparam>
/// <typeparam name="TValue">Type of the values.</typeparam>
public class MultiMap<TKey, TValue> : Dictionary<TKey, List<TValue>>
{

    public MultiMap()
        : base()
    {
    }

    public MultiMap(int capacity)
        : base(capacity)
    {
    }

    /// <summary>
    /// Adds an element with the specified key and value into the MultiMap. 
    /// </summary>
    /// <param name="key">The key of the element to add.</param>
    /// <param name="value">The value of the element to add.</param>
    public void Add(TKey key, TValue value)
    {
        List<TValue> valueList;

        if (TryGetValue(key, out valueList)) {
            valueList.Add(value);
        } else {
            valueList = new List<TValue>();
            valueList.Add(value);
            Add(key, valueList);
        }
    }

    /// <summary>
    /// Removes the first occurrence of an element with a specified key and value.
    /// </summary>
    /// <param name="key">The key of the element to remove.</param>
    /// <param name="value">The value of the element to remove.</param>
    /// <returns>true if an element was removed; false if the key or the value was not found.</returns>
    public bool Remove(TKey key, TValue value)
    {
        List<TValue> valueList;

        if (TryGetValue(key, out valueList)) {
            if (valueList.Remove(value)) {
                if (valueList.Count == 0) {
                    Remove(key);
                }
                return true;
            }
        }
        return false;
    }

    /// <summary>
    /// Removes all occurrences of elements with a specified key and value.
    /// </summary>
    /// <param name="key">The key of the elements to remove.</param>
    /// <param name="value">The value of the elements to remove.</param>
    /// <returns>Number of elements removed.</returns>
    public int RemoveAll(TKey key, TValue value)
    {
        List<TValue> valueList;
        int n = 0;

        if (TryGetValue(key, out valueList)) {
            while (valueList.Remove(value)) {
                n++;
            }
            if (valueList.Count == 0) {
                Remove(key);
            }
        }
        return n;
    }

    /// <summary>
    /// Gets the total number of values contained in the MultiMap.
    /// </summary>
    public int CountAll
    {
        get
        {
            int n = 0;

            foreach (List<TValue> valueList in Values) {
                n += valueList.Count;
            }
            return n;
        }
    }

    /// <summary>
    /// Determines whether the MultiMap contains an element with a specific key/value pair.
    /// </summary>
    /// <param name="key">Key of the element to search for.</param>
    /// <param name="value">Value of the element to search for.</param>
    /// <returns>true if the element was found; otherwise false.</returns>
    public bool Contains(TKey key, TValue value)
    {
        List<TValue> valueList;

        if (TryGetValue(key, out valueList)) {
            return valueList.Contains(value);
        }
        return false;
    }

    /// <summary>
    /// Determines whether the MultiMap contains an element with a specific value.
    /// </summary>
    /// <param name="value">Value of the element to search for.</param>
    /// <returns>true if the element was found; otherwise false.</returns>
    public bool Contains(TValue value)
    {
        foreach (List<TValue> valueList in Values) {
            if (valueList.Contains(value)) {
                return true;
            }
        }
        return false;
    }

}

I would probably use a dictionary with the lowercased file name as the key. The value would be a class with the needed extra information. I would also search it as in your example. If this were slow, I would probably also try searching with LINQ just to see if it was faster. There is, however, one problem here: this requires that all files across all folders are uniquely named. That might be the case for you, but it could also be a problem if you haven't already considered it ;)

Remember that you can also use a FileSystemWatcher object to keep the in-memory dictionary/list synchronized with the disk contents if they are subject to change. If the contents are static, I would probably store it all in a database table and search that instead; startup of your program would then be instantaneous.
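
To illustrate the synchronization idea, here is a hedged sketch. The class WatchedFileIndex and its members are hypothetical names; it assumes the index is keyed by the lower-cased full path, and it does not distinguish files from directories in the change events. FileSystemWatcher raises its events on background threads, hence the lock. A watcher can also miss changes (for example when its internal buffer overflows), so an occasional full rescan may still be needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class WatchedFileIndex
{
    private readonly object sync = new object();
    // Lower-cased full path -> full path in its original case.
    private readonly Dictionary<string, string> byLowerPath =
        new Dictionary<string, string>();

    // Subscribes to change events under root; the caller owns and disposes the watcher.
    public FileSystemWatcher Watch(string root)
    {
        var watcher = new FileSystemWatcher(root);
        watcher.IncludeSubdirectories = true;
        watcher.Created += (s, e) => Add(e.FullPath);
        watcher.Deleted += (s, e) => Remove(e.FullPath);
        watcher.Renamed += (s, e) => { Remove(e.OldFullPath); Add(e.FullPath); };
        watcher.EnableRaisingEvents = true;
        return watcher;
    }

    public void Add(string fullPath)
    {
        lock (sync) { byLowerPath[fullPath.ToLowerInvariant()] = fullPath; }
    }

    public void Remove(string fullPath)
    {
        lock (sync) { byLowerPath.Remove(fullPath.ToLowerInvariant()); }
    }

    public bool TryGetPath(string fullPath, out string original)
    {
        lock (sync)
        {
            return byLowerPath.TryGetValue(fullPath.ToLowerInvariant(), out original);
        }
    }
}
```

The index would be filled once by an initial scan through Add, after which Watch keeps it up to date for each root directory.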

Edit: I just noticed your concern about duplicates. If that's a problem I would create a List<fileclass>, where fileclass is a class containing the needed information on the files. Then search the list using LINQ, as that could give you zero, one or more hits. I think that would be more efficient than a dictionary with a list as the value, where the list would contain one or more items (duplicates).
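
A small sketch of what such a search could look like. FileEntry is a hypothetical stand-in for the "fileclass" above, and DuplicateSearchSketch, Find and Group are made-up names. Find is the plain LINQ search over the list, which rescans the list on every call. Group is an alternative, not something the answer proposes: it uses ToLookup to group the list once so that repeated searches do not rescan it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the "fileclass" mentioned above.
public class FileEntry
{
    public string DirectoryName;
    public string Name;
}

public static class DuplicateSearchSketch
{
    // Linear LINQ search: returns zero, one or more entries with the given name.
    public static List<FileEntry> Find(IEnumerable<FileEntry> files, string name)
    {
        return files
            .Where(f => string.Equals(f.Name, name, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }

    // Groups the entries by name once, ignoring case.
    // Indexing the result with an unknown name yields an empty sequence.
    public static ILookup<string, FileEntry> Group(IEnumerable<FileEntry> files)
    {
        return files.ToLookup(f => f.Name, StringComparer.OrdinalIgnoreCase);
    }
}
```

With tens of thousands of files and many lookups, it would be worth measuring both variants before choosing one.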
