I have two collections:
private void ProcessCollectionsData(List<OrganizationUser> databaseUsers, List<OrganizationUser> importedUsers) { ... }
Each OrganizationUser has a UserIdentifier (String) and a UserId (Int32) property. This is how I'm comparing the two collections, which is producing wrong results and heavy performance bottlenecks:
LogMessage(new LogEntry(" - Generating delta for new users...", true));
Task.WaitAll(Task.Run(() =>
{
    newUsers = databaseUsers.Any()
        ? importedUsers.Where(x => !databaseUsers.Select(y => y.UserIdentifier.ToLower())
                                                 .ToList()
                                                 .Contains(x.UserIdentifier.ToLower()))
                       .ToList()
        : importedUsers;
    duplicates = newUsers.OrderByDescending(o1 => o1.UserId)
                         .GroupBy(s => s.UserIdentifier, StringComparer.InvariantCultureIgnoreCase)
                         .Where(y => y.Count() > 1);
    foreach (var item in duplicates)
    {
        newUsers.RemoveAll(s => string.Equals(s.UserIdentifier, item.Key, StringComparison.OrdinalIgnoreCase));
        newUsers.Add(item.First());
    }
}));
LogMessage(new LogEntry(String.Format(" - Done. New users to be imported: {0}", newUsers.Count)));
The data in importedUsers comes from a CSV file and can contain duplicates, including duplicates that differ only in the case of the UserIdentifier field. The databaseUsers collection is empty on the first run. After the first run, the import file has dumped about 100,000 users into the database, so on the second and subsequent runs databaseUsers is loaded with 100,000 existing users and importedUsers brings in somewhere between 99,990 and 100,100 users (for example). This requires me to generate delta collections so that I know which users to mark as deleted, which to add (new), and which remain (common) and need to be updated.
Can anyone suggest a faster way to do this?
I can see I'm making a mistake where I'm assigning to the newUsers collection by using ToLower().
Correction to the above statement: the resulting newUsers collection retains case information as desired. So performance is the real issue here now.
I think the crux of your performance problem is here:
!databaseUsers.Select(y => y.UserIdentifier.ToLower()).ToList()
Given that databaseUsers can contain 100,000 users, you certainly don't want to build a lowercased copy of that entire list in memory, and this expression does so once for every element of importedUsers. Getting rid of the Select / ToList calls avoids that repeated copy, which should make a difference:
importedUsers.Where(x => !databaseUsers.Any(y =>
    y.UserIdentifier.ToLower() == x.UserIdentifier.ToLower())).ToList()
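Note that the Any version still compares every imported user against every database user in the worst case. As a rough sketch of one way to avoid that, the identifiers can be loaded once into a HashSet with a case-insensitive comparer, so each lookup takes roughly constant time. The OrganizationUser class below is a stand-in for the type in the question, and UserDelta, Deduplicate, NewUsers and DeletedUsers are hypothetical names, not part of the original code. It also assumes that ordinal case-insensitive comparison is acceptable for UserIdentifier and that duplicates should keep the entry with the highest UserId, as the question's OrderByDescending / First combination does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the OrganizationUser type from the question (assumed shape).
public class OrganizationUser
{
    public int UserId { get; set; }
    public string UserIdentifier { get; set; }
}

// Hypothetical helper class; the names are illustrative only.
public static class UserDelta
{
    // One comparer for every UserIdentifier lookup, so the rules stay consistent.
    private static readonly StringComparer Comparer = StringComparer.OrdinalIgnoreCase;

    // Collapses case-insensitive duplicates, keeping the entry with the highest UserId.
    public static List<OrganizationUser> Deduplicate(IEnumerable<OrganizationUser> users)
    {
        return users
            .GroupBy(u => u.UserIdentifier, Comparer)
            .Select(g => g.OrderByDescending(u => u.UserId).First())
            .ToList();
    }

    // Imported users whose identifier is not in the database yet.
    public static List<OrganizationUser> NewUsers(
        List<OrganizationUser> databaseUsers, List<OrganizationUser> importedUsers)
    {
        // Built once, instead of once per imported user.
        var existing = new HashSet<string>(databaseUsers.Select(u => u.UserIdentifier), Comparer);
        return Deduplicate(importedUsers.Where(u => !existing.Contains(u.UserIdentifier)));
    }

    // Database users that no longer appear in the import.
    public static List<OrganizationUser> DeletedUsers(
        List<OrganizationUser> databaseUsers, List<OrganizationUser> importedUsers)
    {
        var imported = new HashSet<string>(importedUsers.Select(u => u.UserIdentifier), Comparer);
        return databaseUsers.Where(u => !imported.Contains(u.UserIdentifier)).ToList();
    }
}
```

The common users that need updating can be found the same way by keeping the imported users for which existing.Contains returns true. Neither method modifies the lists passed in, so importedUsers is left untouched even when databaseUsers is empty.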