
Compare two List<POCO> to find differences being case insensitive

I have two collections:

private void ProcessCollectionsData(List<OrganizationUser> databaseUsers, List<OrganizationUser> importedUsers) { ... }

Each OrganizationUser has a UserIdentifier (String) property and a UserId (Int32) property. This is how I'm currently comparing the two collections, which produces wrong results and is a heavy performance bottleneck:

LogMessage(new LogEntry(" - Generating delta for new users...", true));

Task.WaitAll(Task.Run(() =>
{
    newUsers = databaseUsers.Any() ? importedUsers.Where(x => !databaseUsers.Select(y => y.UserIdentifier.ToLower())
                                                      .ToList()
                                                      .Contains(x.UserIdentifier.ToLower()))
                                                  .ToList()
                                   : importedUsers;

    duplicates = newUsers.OrderByDescending(o1 => o1.UserId)
                         .GroupBy(s => s.UserIdentifier, StringComparer.InvariantCultureIgnoreCase)
                         .Where(y => y.Count() > 1);

    foreach (var item in duplicates)
    {
        newUsers.RemoveAll(s => string.Equals(s.UserIdentifier, item.Key, StringComparison.OrdinalIgnoreCase));
        newUsers.Add(item.First());
    }
}));

LogMessage(new LogEntry(String.Format(" - Done. New users to be imported: {0}", newUsers.Count)));

The data in importedUsers comes from a CSV file and can contain duplicates, including duplicates that differ only in the casing of the UserIdentifier field. On the first run databaseUsers is empty, and the import writes around 100,000 users to the database. On the second and subsequent runs, databaseUsers is loaded with those 100,000 existing users and importedUsers brings in a similar number (for example, between 99,990 and 100,100). I therefore need to generate delta collections so that I know which users to mark as deleted, which to add (new), and which remain (common) and need to be updated.

Can anyone suggest a faster way to do this?

I thought I was making a mistake by using ToLower() when assigning to the newUsers collection.

Correction to the above statement: the resulting newUsers collection retains the original casing, as desired. So performance is the real issue here now.

I think the crux of your performance problem is here

!databaseUsers.Select(y => y.UserIdentifier.ToLower()).ToList()

Given that databaseUsers can contain 100,000 users, you certainly don't want to project and copy that entire list again for every imported user, which is what calling Select / ToList inside the Where predicate does. Getting rid of the Select / ToList calls removes those repeated copies, which should make a difference:

importedUsers.Where(x => !databaseUsers.Any(y =>
    y.UserIdentifier.ToLower() == x.UserIdentifier.ToLower())).ToList()
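
Note that Any still scans databaseUsers once per imported user, so with roughly 100,000 users on each side it remains quadratic. One possible alternative, offered as a sketch rather than a definitive fix, is to build a case-insensitive HashSet<string> of identifiers once and look each user up in it. In the code below the OrganizationUser class is a minimal stand-in for the type in the question, and UserDelta, Deduplicate, FindNewUsers and FindDeletedUsers are hypothetical names introduced only for illustration. It assumes UserIdentifier is never null and that a single comparer (OrdinalIgnoreCase) is acceptable for both the duplicate check and the database check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the OrganizationUser type from the question (assumption).
public class OrganizationUser
{
    public int UserId { get; set; }
    public string UserIdentifier { get; set; }
}

// Hypothetical helper; names are illustrative, not from the original post.
public static class UserDelta
{
    // One comparer everywhere, so "duplicate" and "already in the database"
    // use the same definition of equality.
    private static readonly StringComparer Cmp = StringComparer.OrdinalIgnoreCase;

    // Collapse case-insensitive duplicates, keeping the entry with the
    // highest UserId in each group (as the question's code does).
    public static List<OrganizationUser> Deduplicate(IEnumerable<OrganizationUser> users)
    {
        return users.GroupBy(u => u.UserIdentifier, Cmp)
                    .Select(g => g.OrderByDescending(u => u.UserId).First())
                    .ToList();
    }

    // Imported users whose identifier is not in the database yet.
    public static List<OrganizationUser> FindNewUsers(
        List<OrganizationUser> databaseUsers, List<OrganizationUser> importedUsers)
    {
        // Built once; each Contains call is then a hash lookup, not a list scan.
        var existing = new HashSet<string>(databaseUsers.Select(u => u.UserIdentifier), Cmp);
        return Deduplicate(importedUsers)
            .Where(u => !existing.Contains(u.UserIdentifier))
            .ToList();
    }

    // Database users that no longer appear in the import file.
    public static List<OrganizationUser> FindDeletedUsers(
        List<OrganizationUser> databaseUsers, List<OrganizationUser> importedUsers)
    {
        var imported = new HashSet<string>(importedUsers.Select(u => u.UserIdentifier), Cmp);
        return databaseUsers.Where(u => !imported.Contains(u.UserIdentifier)).ToList();
    }
}
```

The common users (those to update) can be obtained the same way by keeping the imported users whose identifier is in the existing set. Unlike the loop in the question, this version does not modify the list while processing duplicates, and the order of the result follows the first occurrence of each identifier in the import.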
