Counting duplicates in a Listbox

I am trying to develop a simple application in C# to count the number of duplicates in a listbox. I need to count the duplicates of every element and display a rank suffix on the 3 most duplicated elements. For example, suppose a list has 7 elements called 'apple', 6 elements called 'pear', 4 elements called 'peach' and 3 elements called 'orange'. After processing, it should display the list as:

apple (7)
pear (6)
peach (4)
orange

Since we do not know the data source you are using, here is a generic LINQ example that could get you started.

string[] items = { "apple", "pear", "peach", "apple", "orange", "peach", "apple" };

var ranking = (from item in items
               group item by item into r
               orderby r.Count() descending
               select new { name = r.Key, rank = r.Count() }).Take(3);

This returns a collection of objects containing the name and rank (the occurrence count) of the top 3 items.

Of course, you would replace the items array here with whatever data source you are using to fill the ListBox, and if the items are not just simple strings but more complex items, you would adjust the LINQ query appropriately.
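As a rough illustration of that adjustment, the sketch below groups by a property rather than by the item itself. The Fruit class, its Name and Origin properties, and the FruitRanking.TopThree helper are all hypothetical, since the question does not say what the ListBox actually holds.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; the question does not say what the ListBox holds.
class Fruit
{
    public string Name { get; set; }
    public string Origin { get; set; }
}

static class FruitRanking
{
    // Groups by the Name property instead of by the item itself,
    // then keeps the three most frequent names with their counts.
    public static List<KeyValuePair<string, int>> TopThree(IEnumerable<Fruit> fruits)
    {
        return (from fruit in fruits
                group fruit by fruit.Name into g
                orderby g.Count() descending
                select new KeyValuePair<string, int>(g.Key, g.Count()))
               .Take(3)
               .ToList();
    }
}
```

Only the group key changes compared with the string version; any other property or a computed value could be used as the key in the same way.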

Here is an example of the above that fills a listbox with the data in the form you showed.

  string[] items = { "apple", "pear", "peach", "apple", "orange", "peach", "apple" };

  var ranking = (from item in items
                 group item by item into r
                 orderby r.Count() descending
                 select new { name = r.Key, rank = r.Count() }).ToArray();

  for (int i = 0; i < ranking.Length; ++i)
  {
    var item = ranking[i];
    if (i < 3)
    {
      listBox1.Items.Add(string.Format("{0} ({1})", item.name, item.rank));
    }
    else
    {
      listBox1.Items.Add(item.name);
    }
  }

This does the same as the first example, but it converts the results to an array and populates a listbox with all the items, with the first 3 items showing their rank.

Here is an alternative to using Linq, presented as a timed test to see which performs faster. These are the results I obtained with 1000 iterations (times are in Stopwatch ticks):

Total words: 1324
Min        Max        Mean       Method
5305       22889      5739.182   LinqMethodToArray
5053       11973      5418.355   LinqMethod
3112       6726       3252.457   HashMethod

LinqMethod is only about 1.7 times slower than HashMethod in this case (mean 5418 vs. 3252 ticks). That is not as bad as a lot of the Linq code I have performance tested, but the input was only 1324 words.

Edit #1

The results above were measured before the sort was added to HashMethod. With the sort, you can see that it is comparable with the Linq method. Of course, copying the hash to a list and then sorting the list isn't the most efficient way to do this. We could improve on this. A couple of ways come to mind, but none of them are simple, and they would require writing a lot of custom code.
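One possible direction, sketched here under the assumption that only the top few entries need to be ordered (as with the top 3 in the question), is to pick the k largest counts in a single pass over the dictionary instead of sorting every entry. The TopCounts class and its TopK method are hypothetical names, this is not necessarily what was meant above, and it has not been benchmarked against the methods timed here.

```csharp
using System;
using System.Collections.Generic;

static class TopCounts
{
    // Counts occurrences, then selects the k most frequent entries
    // with a small insertion step instead of sorting every entry.
    // Assumes k >= 0. The order of entries with equal counts is unspecified.
    public static List<KeyValuePair<string, int>> TopK(IEnumerable<string> items, int k)
    {
        var counts = new Dictionary<string, int>();
        foreach (string item in items)
        {
            int current;
            counts.TryGetValue(item, out current); // current is 0 when the key is missing
            counts[item] = current + 1;
        }

        var top = new List<KeyValuePair<string, int>>(k + 1);
        foreach (KeyValuePair<string, int> pair in counts)
        {
            // Find the insertion position that keeps top in descending order.
            int pos = top.Count;
            while (pos > 0 && top[pos - 1].Value < pair.Value)
                pos--;

            if (pos < k)
            {
                top.Insert(pos, pair);
                if (top.Count > k)
                    top.RemoveAt(k); // drop the entry pushed out of the top k
            }
        }
        return top;
    }
}
```

For a small k the selection step touches each distinct word once and shifts at most k entries, so the cost of a full sort is avoided; the remaining items would still have to be listed in some other order.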

Since we want to use what's already available and we want the code to be clear, I have to say that Linq is in fact a very good choice. This has changed my opinion of Linq a little. I've seen far too many other comparisons where Linq ends up disastrously slower (on the order of thousands of times slower) to give a green light to using Linq anywhere and everywhere, but in this one place it certainly shines.

I guess the moral is, as it always has been: test, test, test.

Here are the values with the sort added to HashMethod.

Total words: 1324
Min        Max        Mean       Method
5284       21030      5667.808   LinqMethodToArray
5081       36339      5425.626   LinqMethod
5017       27583      5288.602   HashMethod

Edit #2

A couple of simple optimizations (pre-initializing both the dictionary and the list) make HashMethod noticeably faster.

Total words: 1324
Min        Max        Mean       Method
5287       16299      5686.429   LinqMethodToArray
5081       21813      5440.758   LinqMethod
4588       8420       4710.659   HashMethod

Edit #3

With a larger word set, the methods become much more even. In fact, the Linq method seems to edge ahead every time. Here are the results for the United States Constitution (all seven articles and signatures). This may be due to the fact that the Declaration contains a lot of repeated words ("He has ...").

Total words: 4545
Min        Max        Mean       Method
13363      36133      14086.875  LinqMethodToArray
12917      26532      13668.914  LinqMethod
13601      19435      13836.955  HashMethod

Code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Threading;

class Program
{
    static void Main()
    {
        Thread.CurrentThread.Priority = ThreadPriority.Highest;

        // Declaration.txt is a copy of the Declaration of Independence
        // which can be found here: http://en.wikisource.org/wiki/United_States_Declaration_of_Independence
        string declaration = File.ReadAllText("Declaration.txt");
        string[] items = declaration.ToLower().Split(new char[] { ',', '.', ':', ';', '-', '\r', '\n', '\t', ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // pre-execute outside timing loop
        LinqMethodToArray(items);
        LinqMethod(items);
        HashMethod(items);

        int iterations = 1000;
        long min1 = long.MaxValue, max1 = long.MinValue, sum1 = 0;
        long min2 = long.MaxValue, max2 = long.MinValue, sum2 = 0;
        long min3 = long.MaxValue, max3 = long.MinValue, sum3 = 0;

        Console.WriteLine("Iterations: {0}", iterations);
        Console.WriteLine("Total words: {0}", items.Length);

        Stopwatch sw = new Stopwatch();

        for (int n = 0; n < iterations; n++)
        {
            sw.Reset();
            sw.Start();
            LinqMethodToArray(items);
            sw.Stop();
            sum1 += sw.ElapsedTicks;
            if (sw.ElapsedTicks < min1)
                min1 = sw.ElapsedTicks;
            if (sw.ElapsedTicks > max1)
                max1 = sw.ElapsedTicks;

            sw.Reset();
            sw.Start();
            LinqMethod(items);
            sw.Stop();
            sum2 += sw.ElapsedTicks;
            if (sw.ElapsedTicks < min2)
                min2 = sw.ElapsedTicks;
            if (sw.ElapsedTicks > max2)
                max2 = sw.ElapsedTicks;

            sw.Reset();
            sw.Start();
            HashMethod(items);
            sw.Stop();
            sum3 += sw.ElapsedTicks;
            if (sw.ElapsedTicks < min3)
                min3 = sw.ElapsedTicks;
            if (sw.ElapsedTicks > max3)
                max3 = sw.ElapsedTicks;
        }

        Console.WriteLine("{0,-10} {1,-10} {2,-10} Method", "Min", "Max", "Mean");
        Console.WriteLine("{0,-10} {1,-10} {2,-10} LinqMethodToArray", min1, max1, (double)sum1 / iterations);
        Console.WriteLine("{0,-10} {1,-10} {2,-10} LinqMethod", min2, max2, (double)sum2 / iterations);
        Console.WriteLine("{0,-10} {1,-10} {2,-10} HashMethod", min3, max3, (double)sum3 / iterations);
    }

    static void LinqMethodToArray(string[] items)
    {
        var ranking = (from item in items
                       group item by item into r
                       orderby r.Count() descending
                       select new { Name = r.Key, Rank = r.Count() }).ToArray();
        for (int n = 0; n < ranking.Length; n++)
        {
            var item = ranking[n];
            DoSomethingWithItem(item);
        }
    }

    static void LinqMethod(string[] items)
    {
        var ranking = (from item in items
                       group item by item into r
                       orderby r.Count() descending
                       select new { Name = r.Key, Rank = r.Count() });
        foreach (var item in ranking)
            DoSomethingWithItem(item);
    }

    static void HashMethod(string[] items)
    {
        var ranking = new Dictionary<string, int>(items.Length / 2);
        foreach (string item in items)
        {
            if (!ranking.ContainsKey(item))
                ranking[item] = 1;
            else
                ranking[item]++;
        }
        var list = new List<KeyValuePair<string, int>>(ranking);
        list.Sort((a, b) => b.Value.CompareTo(a.Value));
        foreach (KeyValuePair<string, int> pair in list)
            DoSomethingWithItem(pair);

    }

    static volatile object hold;
    static void DoSomethingWithItem(object item)
    {
        // This method exists solely to prevent the compiler from
        // optimizing use of the item away so that this program
        // can be executed in Release build, outside the debugger.
        hold = item;
    }
}
