
Is there a C# equivalent to C++ std::partial_sort?

I'm trying to implement a paging algorithm for a dataset sortable via many criteria. Unfortunately, while some of those criteria can be implemented at the database level, some must be done at the app level (we have to integrate with another data source). We have a paging (actually infinite scroll) requirement and are looking for a way to minimize the pain of sorting the entire dataset at the app level with every paging call.

What is the best way to do a partial sort, only sorting the part of the list that absolutely needs to be sorted? Is there an equivalent to C++'s std::partial_sort function available in the .NET libraries? How should I go about solving this problem?

EDIT: Here's an example of what I'm going for:

Let's say I need to get elements 21-40 of a 1000-element set, according to some sorting criteria. In order to speed up the sort, and since I have to go through the whole dataset every time anyway (this is a web service over HTTP, which is stateless), I don't need the whole dataset ordered. I only need elements 21-40 to be correctly ordered. It is sufficient to create 3 partitions: elements 1-20, unsorted (but all less than element 21); elements 21-40, sorted; and elements 41-1000, unsorted (but all greater than element 40).

OK. Here's what I would try based on what you said in reply to my comment.

I want to be able to say "4th through 6th" and get something like: 3, 2, 1 (unsorted, but all less than proper 4th element); 4, 5, 6 (sorted and in the same place they would be for a sorted list); 8, 7, 9 (unsorted, but all greater than proper 6th element).

Let's add 10 to our list to make it easier: 10, 9, 8, 7, 6, 5, 4, 3, 2, 1.

So, what you could do is use the quickselect algorithm to find the ith and kth elements. In your case above, i is 4 and k is 6. That will of course return the values 4 and 6. That takes two quickselect runs over your list, each O(n) on average, so the runtime so far is O(2n) = O(n). The next part is easy, of course. We have lower and upper bounds on the data we care about. All we need to do is make another pass through our list looking for any element that lies between our lower and upper bounds, inclusive. If we find such an element we throw it into a new List. Finally, we sort that List, which contains only the ith through kth elements that we care about.

So, I believe the total runtime ends up being O(n) + O((k - i) log(k - i)).

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static void Main(string[] args) {
    // Create a list of 10 million items in random order
    var list = Enumerable.Range(1, 10000000).OrderBy(x => Guid.NewGuid()).ToList();

    // Baseline: sort everything, then take the second page of 10
    var sw = Stopwatch.StartNew();
    var slowOrder = list.OrderBy(x => x).Skip(10).Take(10).ToList();
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
    //Took ~8 seconds on my machine

    sw.Restart();
    var smallVal = Quickselect(list, 11); // 11th smallest element (k is 1-based)
    var largeVal = Quickselect(list, 20); // 20th smallest element
    // ToList() forces the lazy query to run inside the timed region
    var elements = list.Where(el => el >= smallVal && el <= largeVal).OrderBy(el => el).ToList();
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
    //Took ~1 second on my machine
}

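// Shared instance: creating a new Random on every recursive call is wasteful
private static readonly Random Rng = new Random();

// Returns the k-th smallest element of list (k is 1-based), in O(n) time on average
public static T Quickselect<T>(IList<T> list, int k) where T : IComparable<T> {
    T pivot = list[Rng.Next(0, list.Count)];
    var smaller = new List<T>();
    var larger = new List<T>();
    foreach (T element in list) {
        // CompareTo only guarantees the sign of its result, not exactly -1 or 1
        int comparison = element.CompareTo(pivot);
        if (comparison < 0) {
            smaller.Add(element);
        }
        else if (comparison > 0) {
            larger.Add(element);
        }
    }

    if (k <= smaller.Count) {
        return Quickselect(smaller, k);
    }
    else if (k > list.Count - larger.Count) {
        // Skip past the smaller elements and everything equal to the pivot
        return Quickselect(larger, k - (list.Count - larger.Count));
    }
    else {
        return pivot;
    }
}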
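The three-partition layout described in the question can also be produced in place, without allocating new lists at each step. The following is only a sketch of that idea, not the code from the answer above: the names PartialSortSketch, SortRange, Select and Partition are made up for illustration, and it assumes a List<T> and a 0-based start index. It runs an in-place quickselect twice to put the right elements into the requested window, then sorts just that window with List<T>.Sort(int, int, IComparer<T>).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class PartialSortSketch
{
    private static readonly Random Rng = new Random();

    // Rearranges list so that positions [start, start + count) hold, in order,
    // the elements they would hold in a fully sorted list. Elements before the
    // window are <= the window, elements after it are >= the window, and both
    // outer parts are left in unspecified order.
    public static void SortRange<T>(List<T> list, int start, int count, IComparer<T> comparer)
    {
        if (start < 0 || count < 0 || start + count > list.Count)
            throw new ArgumentOutOfRangeException();
        if (count == 0) return;

        int end = start + count - 1;
        // Put the correct element at 'start'; nothing larger remains to its left.
        Select(list, 0, list.Count - 1, start, comparer);
        // Within the right part, put the correct element at 'end'.
        Select(list, start, list.Count - 1, end, comparer);
        // The window now contains exactly the right elements; order them.
        list.Sort(start, count, comparer);
    }

    // Iterative quickselect: moves the element of rank k (0-based, absolute
    // index) into position k, assuming lo <= k <= hi.
    private static void Select<T>(List<T> list, int lo, int hi, int k, IComparer<T> comparer)
    {
        while (lo < hi)
        {
            int p = Partition(list, lo, hi, comparer);
            if (k == p) return;
            if (k < p) hi = p - 1; else lo = p + 1;
        }
    }

    // Lomuto partition around a random pivot. Simple, but it degrades when the
    // data contains many equal keys.
    private static int Partition<T>(List<T> list, int lo, int hi, IComparer<T> comparer)
    {
        Swap(list, lo + Rng.Next(hi - lo + 1), hi);
        T pivot = list[hi];
        int store = lo;
        for (int i = lo; i < hi; i++)
        {
            if (comparer.Compare(list[i], pivot) < 0)
            {
                Swap(list, i, store);
                store++;
            }
        }
        Swap(list, store, hi);
        return store;
    }

    private static void Swap<T>(List<T> list, int a, int b)
    {
        T tmp = list[a];
        list[a] = list[b];
        list[b] = tmp;
    }
}
```

For the question's example, elements 21-40 would correspond to something like PartialSortSketch.SortRange(items, 20, 20, comparer). The average cost is the same O(n) + O((k - i) log(k - i)) as the approach above, but the input list is reordered as a side effect.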

You can use List<T>.Sort(int, int, IComparer<T>):

inputList.Sort(startIndex, count, Comparer<T>.Default);

Array.Sort() has an overload that accepts index and length arguments that lets you sort a subset of an array. The same exists for List<T>.
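To make the behaviour of these overloads concrete, here is a small sketch using the 10-element list from the earlier answer. The class and method names are made up for illustration. Note that both calls sort whatever elements currently sit in the given positions; they do not first move the 4th through 6th smallest elements into that range, so on their own they cover only the final step of a partial sort.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper, for illustration only.
public static class RangeSortExample
{
    // Returns a copy of input with positions [index, index + count) sorted.
    public static List<int> SortListRange(List<int> input, int index, int count)
    {
        var copy = new List<int>(input);
        copy.Sort(index, count, Comparer<int>.Default);
        return copy;
    }

    // Same idea for arrays, via Array.Sort(array, index, length).
    public static int[] SortArrayRange(int[] input, int index, int length)
    {
        var copy = (int[])input.Clone();
        Array.Sort(copy, index, length);
        return copy;
    }
}
```

With the input 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 and index 3, count 3, both methods return 10, 9, 8, 5, 6, 7, 4, 3, 2, 1: only the three middle positions change.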

You cannot sort an IEnumerable directly, of course.
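If all you have is an IEnumerable<T>, there are two usual ways around this, sketched below under the assumption that the sequence is finite and fits in memory. The class and method names are hypothetical. SortedPage leaves the source untouched and lets LINQ's OrderBy produce a new ordered sequence; CopyAndSortRange copies the items into a List<T> so the in-place range overload becomes available.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers, for illustration only.
public static class EnumerablePageExample
{
    // OrderBy buffers the source and yields a new sorted sequence;
    // Skip/Take then cut out the requested page.
    public static List<T> SortedPage<T>(IEnumerable<T> source, int skip, int take, IComparer<T> comparer)
    {
        return source.OrderBy(x => x, comparer).Skip(skip).Take(take).ToList();
    }

    // Materializes the sequence, then sorts positions [index, index + count)
    // of the copy in place.
    public static List<T> CopyAndSortRange<T>(IEnumerable<T> source, int index, int count, IComparer<T> comparer)
    {
        var list = source.ToList();
        list.Sort(index, count, comparer);
        return list;
    }
}
```

SortedPage is the same full sort as the slow baseline in the earlier answer, just wrapped in a method, so it is the simplest option rather than the fastest one.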

 