Suppose I have a class:
    class Employee
    {
        public string Id { get; set; }
        public string Type { get; set; }
        public string Identifier { get; set; }
        public object Resume { get; set; }
        public DateTime StartDate { get; set; }
        public DateTime EndDate { get; set; }
    }
    List<Employee> employees = LoadEmployees(); // around 2.5 to 3 million employees
    // ToArray() returns Employee[], so the result goes into a separate variable
    Employee[] sortedEmployees = employees
        .Where(x => x.Identifier != null)
        .OrderBy(x => x.Identifier)
        .ToArray();
I need to load and sort around 2.5 million employees in memory, but the LINQ query gets stuck on the OrderBy clause. Any pointers on this? I created this Employee class just to simplify my problem.
I would apply the .Where(x => x.Identifier != null) clause first, since it filters out some data before the OrderBy runs. Given that you have only ~2.5 million records and that they hold only basic types such as string and DateTime, you should not have any memory problems in this case.
Edit:
I have just run your code as a sample, and it is indeed a matter of seconds (a little over 15 seconds on my machine, which does not have a very powerful CPU). It is slow, but it does not get stuck:
    List<Employee> employees = new List<Employee>();
    for (int i = 0; i < 2500000; i++)
    {
        employees.Add(new Employee
        {
            Id = Guid.NewGuid().ToString(),
            Identifier = Guid.NewGuid().ToString(),
            Type = i.ToString(),
            StartDate = DateTime.MinValue,
            EndDate = DateTime.Now
        });
    }
    var newEmployees = employees
        .Where(x => x.Identifier != null)
        .OrderBy(x => x.Identifier)
        .ToArray();
As a second edit, I have just run some tests, and it seems that an implementation using Parallel LINQ can in some cases be about 1.5 seconds faster than the serial implementation:
    var newEmployees1 = employees.AsParallel()
        .Where(x => x.Identifier != null)
        .OrderBy(x => x.Identifier)
        .ToArray();
And these are the best numbers that I got:
7599 //serial implementation
5752 //parallel linq
But the parallel results can vary from one machine to another, so I suggest running some tests yourself. If you still find a problem, then maybe edit the question or post another one.
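For anyone who wants to repeat the comparison on their own machine, here is a minimal timing sketch. It is not the code that produced the numbers above: SortTiming, TimeMs, MakeKeys and Run are hypothetical names introduced for illustration, and it sorts plain GUID strings rather than Employee objects to keep the example short. A single run like this is noisy, so treat the output as a rough indication only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class SortTiming
{
    // Hypothetical helper: runs the given sort once and returns the elapsed milliseconds.
    public static long TimeMs(Func<string[]> sort)
    {
        var sw = Stopwatch.StartNew();
        sort();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Builds n random GUID strings, similar to the Identifier values in the sample above.
    public static List<string> MakeKeys(int n)
    {
        var keys = new List<string>(n);
        for (int i = 0; i < n; i++)
        {
            keys.Add(Guid.NewGuid().ToString());
        }
        return keys;
    }

    // Times the serial and the PLINQ sort on the same input and prints both results.
    public static void Run(int n)
    {
        List<string> keys = MakeKeys(n);
        long serial = TimeMs(() => keys.OrderBy(k => k).ToArray());
        long parallel = TimeMs(() => keys.AsParallel().OrderBy(k => k).ToArray());
        Console.WriteLine($"serial:   {serial} ms");
        Console.WriteLine($"parallel: {parallel} ms");
    }
}
```

Calling SortTiming.Run(2500000) from a Release build, ideally a few times in a row, should give numbers comparable in spirit to the ones quoted above.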
Using the hint that @Igor proposed in the comments, the parallel implementation with StringComparer.OrdinalIgnoreCase is about three times faster than the simple parallel implementation. The final (fastest) code looks like this:
    // Use a new variable name: "var employees = employees..." does not compile
    var sortedEmployees = employees.AsParallel()
        .Where(x => x.Identifier != null)
        .OrderBy(x => x.Identifier, StringComparer.OrdinalIgnoreCase)
        .ToArray();
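One thing to keep in mind with this change: without an explicit comparer, OrderBy on strings uses the default culture-sensitive comparison, while the ordinal comparers work on character codes. That is cheaper, but it can produce a different order. The sketch below only illustrates the ordering difference on a tiny input; ComparerDemo, SortWith and Run are hypothetical names, not part of the original code.

```csharp
using System;
using System.Linq;

public static class ComparerDemo
{
    // Sorts the keys with the given comparer via the OrderBy overload that accepts one.
    public static string[] SortWith(string[] keys, StringComparer comparer)
    {
        return keys.OrderBy(k => k, comparer).ToArray();
    }

    public static void Run()
    {
        string[] keys = { "b", "C", "a" };

        // Ordinal compares raw character codes, so uppercase 'C' (67) sorts before 'a' (97).
        Console.WriteLine(string.Join(",", SortWith(keys, StringComparer.Ordinal)));           // C,a,b

        // OrdinalIgnoreCase ignores case, so the result is alphabetical.
        Console.WriteLine(string.Join(",", SortWith(keys, StringComparer.OrdinalIgnoreCase))); // a,b,C
    }
}
```

If the identifiers must appear in a culture-aware order for display, it is worth checking that the ordinal result is acceptable before switching comparers.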