
Fastest way to count number of distinct elements in array

I have a class like this:

public class Individual
{
    public double[] Number { get; set; } = new double[2]{ 0.0, 0.0 };
}

I store instances of this class in a list of dictionaries and assign values to Individual.Number:

List<Dictionary<int, Individual>> selection;

Now, I have to count the number of distinct values of Individual.Number (in the whole list). What I've done so far is:

selection.SelectMany(dict => dict.Values).SelectMany(individual => individual.Number).Distinct().Count();

Is this the fastest way to count them? How can I improve the performance?

Thanks,

Internally the Distinct() method creates a new Set<T> without specifying the size.

If you have a rough idea of the number of elements, creating the set with that capacity up front can avoid a number of reallocations (and memory copies) as it grows.

And since you only want the Count(), you can include that directly (credits to @TimSchmelter).

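    using System;
    using System.Collections.Generic;

    public static class EnumerableExtensions {
        // Counts distinct elements with a HashSet pre-sized to the expected number of elements.
        public static int OptimizedDistinctAndCount<TSource>(this IEnumerable<TSource> source, int numberOfElements) {
            if (source == null) throw new ArgumentNullException(nameof(source));
            var set = new HashSet<TSource>(numberOfElements);
            foreach (TSource element in source) {
               set.Add(element);
            }
            return set.Count;
        }
    }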

You could then use:

selection.SelectMany(dict => dict.Values).SelectMany(individual => individual.Number).OptimizedDistinctAndCount(123);
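
The 123 above is only a placeholder. As a minimal sketch, assuming every Individual.Number holds exactly two values as in the question, an upper bound for the capacity can be derived from the dictionary sizes. The names DistinctCountDemo and CountDistinctNumbers are hypothetical, and the HashSet<T>(int) constructor needs .NET Framework 4.7.2 or .NET Core 2.0 or later:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Individual
{
    public double[] Number { get; set; } = new double[2] { 0.0, 0.0 };
}

public static class DistinctCountDemo
{
    // Hypothetical helper: counts distinct values across all dictionaries.
    public static int CountDistinctNumbers(List<Dictionary<int, Individual>> selection)
    {
        // Upper bound on the element count: two values per Individual (assumption).
        int capacity = selection.Sum(dict => dict.Count) * 2;
        var set = new HashSet<double>(capacity);
        foreach (double value in selection.SelectMany(dict => dict.Values)
                                          .SelectMany(individual => individual.Number))
        {
            set.Add(value);
        }
        return set.Count;
    }

    public static void Main()
    {
        var selection = new List<Dictionary<int, Individual>>
        {
            new Dictionary<int, Individual>
            {
                { 1, new Individual { Number = new[] { 1.2, 1.3 } } },
                { 2, new Individual { Number = new[] { 1.2, 4.0 } } },
            },
            new Dictionary<int, Individual>
            {
                { 3, new Individual { Number = new[] { 4.0, 5.0 } } },
            },
        };
        Console.WriteLine(CountDistinctNumbers(selection)); // prints 4
    }
}
```

The estimate only has to be in the right range: a value that is too small still works (the set grows as needed), and one that is too large just reserves extra memory.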

What do you think about this?

using System;
using System.Collections.Generic;

public class Individual
{
  public double[] Numbers { get; set; }
  public Individual()
  {
    Numbers = new double[0];
  }
  public Individual(double[] values)
  {
    Numbers = values/*.ToArray() if a copy must be done*/;
  }
}

class Program
{
  static void Main()
  {
    // Populate data
    var selection = new List<Dictionary<int, Individual>>();
    var dico1 = new Dictionary<int, Individual>();
    var dico2 = new Dictionary<int, Individual>();
    selection.Add(dico1);
    selection.Add(dico2);
    dico1.Add(1, new Individual(new double[] { 1.2, 1.3, 4.0, 10, 40 }));
    dico1.Add(2, new Individual(new double[] { 1.2, 1.5, 4.0, 20, 40 }));
    dico2.Add(3, new Individual(new double[] { 1.7, 1.6, 5.0, 30, 60 }));
    // Count distinct
    var found = new List<double>();
    foreach ( var dico in selection )
      foreach ( var item in dico )
        foreach ( var value in item.Value.Numbers )
          if ( !found.Contains(value) )
            found.Add(value);
    // Must show 12
    Console.WriteLine("Distinct values of the data pool = " + found.Count);
    Console.ReadKey();
  }
}

This approach avoids some of the method-call overhead of the LINQ pipeline.

A further optimization would be to use for loops instead of foreach, and to replace the List with a HashSet<double>, since List.Contains is a linear search (a linked list would not make that lookup any faster).
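
A minimal sketch of the same nested loops with a HashSet<double> in place of the List<double>, so that each membership check is O(1) on average rather than a scan of everything found so far. This is a substitution, not the code from the answer above; the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public class Individual
{
    public double[] Numbers { get; set; }
    public Individual(double[] values) { Numbers = values; }
}

public static class HashSetLoopDemo
{
    // Hypothetical helper: same nested loops, but duplicates are rejected by the set.
    public static int CountDistinct(List<Dictionary<int, Individual>> selection)
    {
        var found = new HashSet<double>();
        foreach (var dico in selection)
            foreach (var item in dico)
                foreach (var value in item.Value.Numbers)
                    found.Add(value); // Add leaves the set unchanged (returns false) for duplicates
        return found.Count;
    }

    public static void Main()
    {
        var dico1 = new Dictionary<int, Individual>
        {
            { 1, new Individual(new double[] { 1.2, 1.3, 4.0, 10, 40 }) },
            { 2, new Individual(new double[] { 1.2, 1.5, 4.0, 20, 40 }) },
        };
        var dico2 = new Dictionary<int, Individual>
        {
            { 3, new Individual(new double[] { 1.7, 1.6, 5.0, 30, 60 }) },
        };
        var selection = new List<Dictionary<int, Individual>> { dico1, dico2 };
        // Same data as above, so this also reports 12 distinct values.
        Console.WriteLine("Distinct values of the data pool = " + CountDistinct(selection));
    }
}
```

Whether this beats the LINQ version on real data is something to measure (for example with System.Diagnostics.Stopwatch) rather than assume.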
