What's the best way to sort about 2.5 million records in memory in C#?

Consider that I have a class:

class Employee
{
    public string Id { get; set; }
    public string Type { get; set; }
    public string Identifier { get; set; }
    public object Resume { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
}
List<Employee> employees = LoadEmployees(); // Around 2.5 to 3 million employees
employees = employees
                .Where(x => x.Identifier != null)
                .OrderBy(x => x.Identifier)
                .ToList(); // ToList() rather than ToArray(): an array cannot be assigned back to a List<Employee>

I have a requirement to load and sort around 2.5 million employees in memory, but the LINQ query gets stuck on the OrderBy clause. Any pointers on this? I created this Employee class just to simplify my problem.

I would apply the .Where(x => x.Identifier != null) clause first, so that it filters out some data before the OrderBy runs. Given that you have only ~2.5 million records and that they hold only basic types like string and DateTime, you should not have any memory problems in this case.

Edit:

I have just run your code as a sample, and it is indeed a matter of seconds (over 15 seconds on my machine, which does not have a very powerful CPU, but it still does not get stuck):

List<Employee> employees = new List<Employee>();
for(int i=0;i<2500000;i++)
{
    employees.Add(new Employee
    {
        Id = Guid.NewGuid().ToString(),
        Identifier = Guid.NewGuid().ToString(),
        Type = i.ToString(),
        StartDate = DateTime.MinValue,
        EndDate = DateTime.Now
    });
}

var newEmployees = employees
    .Where(x => x.Identifier != null)
    .OrderBy(x => x.Identifier)
    .ToArray();

As a second edit, I have just run some tests, and it seems that an implementation using Parallel LINQ can in some cases be about 1.5 seconds faster than the serial implementation:

var newEmployees1 = employees.AsParallel()
    .Where(x => x.Identifier != null)
    .OrderBy(x => x.Identifier)
    .ToArray();

And these are the best numbers that I got:

7599 //serial implementation
5752 //parallel linq

But the parallel results can vary from one machine to another, so I suggest running some tests yourself; if you still find a problem, then maybe edit the question or post another one.
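For running such tests yourself, a minimal timing sketch might look like the one below. It is only an illustration: the SortTiming class and its MakeIdentifiers and TimeMs helpers are hypothetical names, plain GUID strings stand in for Employee.Identifier, and a single Stopwatch run is a rough measurement, not a rigorous benchmark. It prints milliseconds; the numbers quoted above do not state their unit.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper class for illustration; not part of the original post.
public static class SortTiming
{
    // Builds n random GUID strings to stand in for Employee.Identifier values.
    public static List<string> MakeIdentifiers(int n)
    {
        var ids = new List<string>(n);
        for (int i = 0; i < n; i++)
        {
            ids.Add(Guid.NewGuid().ToString());
        }
        return ids;
    }

    // Runs the given sort once and returns the elapsed wall-clock milliseconds.
    public static long TimeMs(Func<string[]> sort)
    {
        var stopwatch = Stopwatch.StartNew();
        string[] result = sort();
        stopwatch.Stop();
        GC.KeepAlive(result); // keep the sorted array referenced until timing is done
        return stopwatch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        List<string> ids = MakeIdentifiers(2_500_000);

        long serial = TimeMs(() => ids.OrderBy(x => x).ToArray());
        long parallel = TimeMs(() => ids.AsParallel().OrderBy(x => x).ToArray());

        Console.WriteLine($"serial:   {serial} ms");
        Console.WriteLine($"parallel: {parallel} ms");
    }
}
```

Running each variant several times and discarding the first run (which includes JIT compilation) should give steadier numbers than a single pass.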

Using the hint that @Igor proposed in the comment below, the parallel implementation with StringComparer.OrdinalIgnoreCase is about three times faster than the simple parallel implementation. The final (fastest) code looks like this:

// Use a new variable name: "var employees = employees..." would not compile.
var sortedEmployees = employees.AsParallel()
    .Where(x => x.Identifier != null)
    .OrderBy(x => x.Identifier, StringComparer.OrdinalIgnoreCase)
    .ToArray();
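The choice of comparer affects the resulting order as well as the speed, so it is worth checking that an ordinal ordering is acceptable for your identifiers. With no comparer, OrderBy on strings uses the default, culture-sensitive comparison, whereas the ordinal comparers compare UTF-16 code units directly. The small sketch below uses made-up sample values and a hypothetical ComparerDemo class to show how StringComparer.Ordinal and StringComparer.OrdinalIgnoreCase order the same input:

```csharp
using System;
using System.Linq;

// Hypothetical demo class for illustration; not part of the original post.
public static class ComparerDemo
{
    // Sorts the identifiers with an explicit comparer, using the same
    // OrderBy(keySelector, comparer) overload as the code above.
    public static string[] SortWith(string[] ids, StringComparer comparer)
    {
        return ids.OrderBy(x => x, comparer).ToArray();
    }

    public static void Main()
    {
        string[] ids = { "b", "A", "a", "B" };

        // Ordinal: uppercase letters have lower code units than lowercase ones.
        Console.WriteLine(string.Join(",", SortWith(ids, StringComparer.Ordinal)));
        // prints A,B,a,b

        // OrdinalIgnoreCase: "A"/"a" and "b"/"B" compare equal; the sequential
        // OrderBy is stable, so ties keep their input order.
        Console.WriteLine(string.Join(",", SortWith(ids, StringComparer.OrdinalIgnoreCase)));
        // prints A,a,b,B
    }
}
```

Note that the sequential OrderBy is a stable sort, while the PLINQ documentation describes the parallel OrderBy as not stable, so identifiers that differ only by case may come out in either relative order in the parallel version.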
