
Compare two lists that contain a lot of objects

I need to compare two lists that each contain about 60,000 objects. What would be the most efficient way of doing this? I want to select all the items in the source list that do not exist in the destination list.

I am creating a sync application in which C# scans a directory and places the attributes of each file in a list. So there is one list for the source directory and another list for the destination directory. Then, instead of copying all the files, I just compare the lists and see which files are different. If both lists contain the same file, I will not copy that file. Here is the LINQ query I use. It works when I scan a small folder, but not when I scan a large one.

// s.lstFiles is the list of the source files
// d.lstFiles is the list of the files contained in the destination folder
var q = from a in s.lstFiles
        from b in d.lstFiles
        where
        a.compareName == b.compareName &&
        a.size == b.size &&
        a.dateCreated == b.dateCreated
        select a;

// create a list to hold the items that are the same in both lists; the files to copy are selected from the rest later
List<Classes.MyPathInfo.MyFile> tempList = new List<Classes.MyPathInfo.MyFile>();

foreach (Classes.MyPathInfo.MyFile file in q)
{
    tempList.Add(file);
}

I don't know why this query takes forever. There are also other things I can take advantage of. For example, I know that if a source file matches a destination file, there cannot be another duplicate of that file, because it is not possible to have two files with the same name and the same path.
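The uniqueness observation can be used directly: if a name can appear only once per directory, the destination list can be indexed by name, and each source file then needs a single hash lookup instead of a scan of the whole destination list. A minimal sketch, assuming `compareName` is the file's path relative to the synced root, and using hypothetical value tuples in place of the question's `MyFile` class:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var t = new DateTime(2020, 1, 1);
// Hypothetical stand-ins for MyFile: (compareName, size in bytes, creation time).
var source = new List<(string Name, long Size, DateTime Created)>
{
    ("a.txt", 10, t),
    ("b.txt", 20, t),
    ("c.txt", 30, t),
};
var destination = new List<(string Name, long Size, DateTime Created)>
{
    ("a.txt", 10, t), // unchanged: skip
    ("b.txt", 25, t), // same name, different size: copy
};                    // c.txt is missing here: copy

// A name can occur only once per directory, so it works as a dictionary key
// (ToDictionary throws ArgumentException if that assumption is ever broken).
var destinationByName = destination.ToDictionary(f => f.Name);

// One hash lookup per source file instead of a pass over the whole destination list.
var toCopy = source
    .Where(f => !destinationByName.TryGetValue(f.Name, out var d)
                || d.Size != f.Size
                || d.Created != f.Created)
    .ToList();

foreach (var f in toCopy)
    Console.WriteLine(f.Name); // prints b.txt, then c.txt
```

Building the dictionary is a single pass over the destination list, so the total work grows roughly linearly with the number of files rather than with the product of the two list sizes.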

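LINQ has an Except() method for this purpose. You can just use a.Except(b). Note that Except compares items with the default equality comparer, so for a class such as MyFile it only gives the expected result if the class overrides Equals and GetHashCode, or if you pass an IEqualityComparer<MyFile>; otherwise it falls back to reference equality and treats every object as distinct.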
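When the element type already has value equality, Except needs no extra code. A minimal sketch, using hypothetical value tuples (which compare element by element) instead of a class; the file names and sizes are made up for illustration:

```csharp
using System;
using System.Linq;

var t = new DateTime(2020, 1, 1);
// Hypothetical file records as value tuples: equal when all three elements are equal.
var source = new (string Name, long Size, DateTime Created)[]
{
    ("a.txt", 10, t), ("b.txt", 20, t), ("c.txt", 30, t),
};
var destination = new (string Name, long Size, DateTime Created)[]
{
    ("a.txt", 10, t), ("b.txt", 25, t),
};

// The default comparer is enough here because the element type has value equality.
var missing = source.Except(destination).ToArray();
Console.WriteLine(string.Join(", ", missing.Select(f => f.Name))); // b.txt, c.txt
```

Except is a set operation, so duplicates in the first sequence are collapsed; with unique file paths that makes no difference.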

Create an equality comparer for the type; then you can use it to compare the sets efficiently:

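using System.Collections.Generic;

// Treats two MyFile objects as the same file when name, size and creation date all match.
public class MyFileComparer : IEqualityComparer<MyFile> {

  public bool Equals(MyFile a, MyFile b) {
    if (ReferenceEquals(a, b)) return true;
    if (a is null || b is null) return false;
    return
      a.compareName == b.compareName &&
      a.size == b.size &&
      a.dateCreated == b.dateCreated;
  }

  // Must agree with Equals: combine the hash codes of the same three fields.
  public int GetHashCode(MyFile a) {
    unchecked { // let the arithmetic wrap around instead of throwing in checked builds
      return
        (a.compareName.GetHashCode() * 251 + a.size.GetHashCode()) * 251 +
        a.dateCreated.GetHashCode();
    }
  }

}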

Now you can use it with methods like Intersect, to get all items that exist in both lists, or Except, to get all items that exist in one list but not in the other:

List<MyFile> tempList =
  s.lstFiles.Intersect(d.lstFiles, new MyFileComparer()).ToList();
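For the question's actual goal, the files to copy, the same comparer works with Except: `s.lstFiles.Except(d.lstFiles, new MyFileComparer()).ToList()`. On .NET 6 or later, `ExceptBy` is a comparer-free alternative that compares only a projected key. A minimal sketch, assuming hypothetical records with an extra `FullPath` field that differs between the two directories, so whole-object equality would never match:

```csharp
using System;
using System.Linq;

var t = new DateTime(2020, 1, 1);
// Hypothetical records: FullPath differs per directory; only (Name, Size, Created)
// identifies "the same file", just like MyFileComparer.
var source = new (string FullPath, string Name, long Size, DateTime Created)[]
{
    (@"C:\src\a.txt", "a.txt", 10, t),
    (@"C:\src\b.txt", "b.txt", 20, t),
    (@"C:\src\c.txt", "c.txt", 30, t),
};
var destination = new (string FullPath, string Name, long Size, DateTime Created)[]
{
    (@"D:\dst\a.txt", "a.txt", 10, t),
    (@"D:\dst\b.txt", "b.txt", 25, t),
};

// ExceptBy (.NET 6+) hashes the projected key of each item instead of the whole record.
var toCopy = source
    .ExceptBy(destination.Select(f => (f.Name, f.Size, f.Created)),
              f => (f.Name, f.Size, f.Created))
    .ToList();

Console.WriteLine(string.Join(", ", toCopy.Select(f => f.FullPath))); // C:\src\b.txt, C:\src\c.txt
```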

Because these methods can use the hash codes to divide the items into buckets, far fewer comparisons are needed than in your nested from ... from ... where query, which has to compare every item in one list with every item in the other (about 60,000 × 60,000 = 3.6 billion comparisons for your data).
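To make the bucket argument concrete, here is a small sketch that counts how often each approach has to test an item. The data is hypothetical and synthetic (value tuples instead of MyFile, 1,000 files per side, every 10th destination file with a different size). The nested query evaluates its predicate for every pair, while a HashSet built once from the destination needs one lookup per source item; this writes out by hand the hashing idea the set operators rely on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var t = new DateTime(2020, 1, 1);
const int n = 1000;

// Hypothetical synthetic data: n files per side; every 10th destination file differs in size.
var source = Enumerable.Range(0, n)
    .Select(i => (Name: $"f{i}.txt", Size: (long)i, Created: t))
    .ToList();
var destination = Enumerable.Range(0, n)
    .Select(i => (Name: $"f{i}.txt", Size: i % 10 == 0 ? -1L : i, Created: t))
    .ToList();

// 1) The question's shape: from ... from ... where tests every (a, b) pair.
long pairChecks = 0;
bool SameFile((string Name, long Size, DateTime Created) a,
              (string Name, long Size, DateTime Created) b)
{
    pairChecks++;
    return a.Name == b.Name && a.Size == b.Size && a.Created == b.Created;
}
var sameByNestedQuery = (from a in source
                         from b in destination
                         where SameFile(a, b)
                         select a).ToList();

// 2) Hash-based: index the destination once, then one set lookup per source file.
long lookups = 0;
var destinationSet = destination.ToHashSet();
var sameByHashSet = source.Where(a => { lookups++; return destinationSet.Contains(a); }).ToList();

Console.WriteLine($"nested query: {pairChecks} predicate calls, {sameByNestedQuery.Count} matches");
Console.WriteLine($"hash set: {lookups} lookups, {sameByHashSet.Count} matches");
// nested query: 1000000 predicate calls, 900 matches
// hash set: 1000 lookups, 900 matches
```

Both approaches find the same 900 matching files, but the pair count grows with the product of the list sizes, while the lookup count grows only with the size of the source list.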

Use Except(), and read more about set operations in LINQ and about set operations with HashSet<T>.
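HashSet<T> can also compute the difference in place with `ExceptWith`. A minimal sketch, again with hypothetical value tuples standing in for MyFile: start with every source file and remove whatever the destination already has.

```csharp
using System;
using System.Collections.Generic;

var t = new DateTime(2020, 1, 1);
// Start with every (hypothetical) source file...
var toCopy = new HashSet<(string Name, long Size, DateTime Created)>
{
    ("a.txt", 10, t), ("b.txt", 20, t), ("c.txt", 30, t),
};
var destination = new (string Name, long Size, DateTime Created)[]
{
    ("a.txt", 10, t), ("b.txt", 25, t),
};

// ...then remove everything the destination already contains (modifies the set in place).
toCopy.ExceptWith(destination);

// HashSet<T> has no defined enumeration order, so query it rather than relying on order.
Console.WriteLine(toCopy.Count);                      // 2
Console.WriteLine(toCopy.Contains(("b.txt", 20, t))); // True
```

With the question's class, the comparer goes into the constructor: `new HashSet<MyFile>(s.lstFiles, new MyFileComparer())`, followed by `ExceptWith(d.lstFiles)`.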

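Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.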

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM