How to report progress on a long call to .Distinct() in C#

I have an array of custom objects named AnalysisResult. The array can contain hundreds of thousands of objects, and occasionally I need only the Distinct() elements of that array. So I wrote an item comparer class called AnalysisResultDistinctItemComparer and make the call like this:

public static AnalysisResult[] GetDistinct(AnalysisResult[] results)
{
    return results.Distinct(new AnalysisResultDistinctItemComparer()).ToArray();
}

My problem is that this call can take a long time (on the order of minutes) when the array is particularly big (more than 200,000 objects).

I currently call that method in a background worker and display a spinning GIF to show the user that the method is running and that the application has not frozen. This is all well and good, but it gives the user no indication of the current progress.

I really need to be able to show the user the current progress of this operation, but I have been unable to come up with a good approach. I was experimenting with something like this:

public static AnalysisResult[] GetDistinct(AnalysisResult[] results)
{
    var query = results.Distinct(new AnalysisResultDistinctItemComparer());

    List<AnalysisResult> retVal = new List<AnalysisResult>();
    foreach(AnalysisResult ar in query)
    {
        // Show progress here
        retVal.Add(ar);
    }

    return retVal.ToArray();
}

But the problem is that I have no way of knowing what my actual progress is. Thoughts? Suggestions?

Don't call ToArray() at the end of your method; use yield return instead. So do this:

public static IEnumerable<AnalysisResult> Distinct(AnalysisResult[] results)
{
    var query = results.Distinct(new AnalysisResultDistinctItemComparer());

    foreach(AnalysisResult ar in query)
    {
        // Use yield return here, so that the iteration remains lazy.
        yield return ar;
    }
}

Basically, yield return makes the compiler generate an iterator so that the iteration remains lazy: you don't have to wait for a complete new collection to be built before returning to the caller. Instead, as each item is computed, it is returned immediately to the consumer, which can then run its update logic (per item, if necessary). You can use the same technique in your GetDistinct method as well.
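One way to turn the lazy enumeration into a percentage is to count how many source elements Distinct() has pulled so far, since the source length is known up front even though the number of distinct results is not. The following is only a minimal sketch under that assumption and is not part of the answer above: ProgressSketch, DistinctWithProgress and the onProgress callback are hypothetical names, and any element type with a suitable comparer can stand in for AnalysisResult.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProgressSketch
{
    // Hypothetical helper: counts every source element that Distinct() pulls
    // and reports the percentage of the source consumed so far.
    public static IEnumerable<T> DistinctWithProgress<T>(
        IReadOnlyList<T> source,
        IEqualityComparer<T> comparer,
        Action<int> onProgress)
    {
        int consumed = 0;
        int lastPercent = -1;

        // Select is lazy: the lambda runs once per source element,
        // at the moment Distinct() asks for that element.
        var counted = source.Select(item =>
        {
            consumed++;
            int percent = (int)((long)consumed * 100 / source.Count);
            if (percent != lastPercent)
            {
                // Report only when the whole-number percentage changes.
                lastPercent = percent;
                onProgress(percent);
            }
            return item;
        });

        foreach (T item in counted.Distinct(comparer))
        {
            yield return item;
        }
    }
}
```

Because the callback fires per source element rather than per distinct result, progress keeps moving even through long runs of duplicates. In a UI application the callback would still need to marshal to the UI thread, for example through the background worker's own progress reporting.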

Jon Skeet has an implementation that looks like this (see "LINQ's Distinct() on a particular property"):

public static IEnumerable<TSource> DistinctBy<TSource, TKey>
    (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    HashSet<TKey> seenKeys = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        if (seenKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}

Notice that he uses a HashSet, which is built to disallow duplicates. HashSet.Add returns false when the key is already present, so an element is yielded only the first time its key is seen.
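To make the behaviour concrete, here is a small usage sketch of the same HashSet technique. The method is renamed DistinctByKey here (a hypothetical name) so that it does not collide with the Enumerable.DistinctBy method built into .NET 6 and later; FirstWordPerLength and its sample data are likewise illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctByKeySketch
{
    // Same technique as the snippet above: yield an element only the
    // first time its key is added to the set.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }

    // Illustrative caller: keeps the first word seen for each word length.
    public static List<string> FirstWordPerLength(IEnumerable<string> words)
    {
        return words.DistinctByKey(w => w.Length).ToList();
    }
}
```

For the input "one", "two", "three", "four", "five", the keys are 3, 3, 5, 4, 4, so only "one", "three" and "four" are returned, in source order.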

That said, remember that this is an algorithms-and-data-structures type of question. It would be much easier to do something like this:

Dictionary<Key, Value> distinctItems = new Dictionary<Key, Value>(); 

foreach (var item in nonDistinctSetOfItems) {
    if (!distinctItems.ContainsKey(item.KeyProperty)) {
        distinctItems.Add(item.KeyProperty, item);
    }
}

... = distinctItems.Values // This would contain only the distinct items.

That is, a symbol table (Dictionary) is built for just this sort of problem: associating entries with unique keys. If you keep your data stored this way, the problem becomes much simpler. Don't overlook the simple solution!

Given the design of that Distinct method, you iterate over the entire collection every time you call Distinct. Have you considered writing a custom collection that updates an index each time you add an object to the array?
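A minimal sketch of what such a collection might look like, assuming duplicates can simply be rejected at insertion time. DistinctList and its members are hypothetical names, and the comparer passed in would be the existing AnalysisResultDistinctItemComparer or an equivalent:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical collection: a HashSet serves as the index, so duplicates are
// filtered as items arrive and no separate Distinct() pass is needed later.
public class DistinctList<T>
{
    private readonly HashSet<T> _index;
    private readonly List<T> _items = new List<T>();

    public DistinctList(IEqualityComparer<T> comparer)
    {
        _index = new HashSet<T>(comparer);
    }

    // Returns true if the item was new and stored, false if it was a duplicate.
    public bool Add(T item)
    {
        if (!_index.Add(item))
        {
            return false;
        }
        _items.Add(item);
        return true;
    }

    public int Count
    {
        get { return _items.Count; }
    }

    // The distinct items, in insertion order.
    public IReadOnlyList<T> Items
    {
        get { return _items; }
    }
}
```

This spreads the cost of deduplication across the adds, at the price of keeping the index in memory alongside the items.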

Alternatively, you can use ThreadPool and WaitHandle to run your "Distinct" and "DisplayProgress" work on separate threads.

using System;
using System.Threading;

public class Sample
{
    public void Run()
    {
        var state = new State();
        ThreadPool.QueueUserWorkItem(DoWork, state);
        ThreadPool.QueueUserWorkItem(ShowProgress, state);
        WaitHandle.WaitAll(new WaitHandle[] {state.AutoResetEvent});
        Console.WriteLine("Completed");
    }

    public void DoWork(object state)
    {
        //do your work here
        for (int i = 0; i < 10; i++)
        {
            ((State) state).Status++;
            Thread.Sleep(1000);
        }

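        // Mark the work as completed so ShowProgress can leave its loop;
        // Completed() also signals the AutoResetEvent that Run() waits on.
        ((State) state).Completed();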
    }

    public void ShowProgress(object state)
    {
        var s = (State) state;
        while (!s.IsCompleted())
        {

            if (s.PrintedStatus != s.Status)
                Console.WriteLine(s.Status);
            s.PrintedStatus = s.Status;
        }
    }

    public class State
    {
        public State()
        {
            AutoResetEvent = new AutoResetEvent(false);
        }

        public AutoResetEvent AutoResetEvent { get; private set; }
        public int Status { get; set; }
        public int PrintedStatus { get; set; }
        // volatile so the polling thread observes the write made by the worker thread.
        private volatile bool _completed;
        public bool IsCompleted()
        {
            return _completed;
        }
        public void Completed()
        {
            _completed = true;
            AutoResetEvent.Set();
        }
    }
}
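The same split between a worker and a progress display can also be expressed with Task.Run and IProgress<T> instead of raw ThreadPool and WaitHandle calls. The following is a sketch of that alternative, not the approach the answer above describes: BackgroundDistinctSketch and DistinctAsync are hypothetical names, and the deduplication is done with a plain HashSet rather than LINQ's Distinct() so that the loop index can drive the progress value.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BackgroundDistinctSketch
{
    // Hypothetical helper: deduplicates on a thread-pool thread and reports
    // the percentage of the source processed through IProgress<int>.
    public static Task<T[]> DistinctAsync<T>(
        IReadOnlyList<T> source,
        IEqualityComparer<T> comparer,
        IProgress<int> progress)
    {
        return Task.Run(() =>
        {
            var seen = new HashSet<T>(comparer);
            var result = new List<T>();
            int lastPercent = -1;

            for (int i = 0; i < source.Count; i++)
            {
                // Keep only the first occurrence of each element.
                if (seen.Add(source[i]))
                {
                    result.Add(source[i]);
                }

                int percent = (int)((long)(i + 1) * 100 / source.Count);
                if (percent != lastPercent)
                {
                    lastPercent = percent;
                    progress.Report(percent);
                }
            }

            return result.ToArray();
        });
    }
}
```

A Progress<int> instance created on the UI thread of a Windows Forms or WPF application captures that thread's synchronization context, so its callback can update a progress bar directly.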
