
Custom option to Search a Sorted list faster than Plain Binary Search

The use case is as follows:

  • A sorted List of DateTime values, with millisecond granularity
  • Search for the nearest DateTime that satisfies the supplied predicate delegate
  • Performance is an issue: the List has 100K+ records spanning a total of 10 hours from the minimum to the maximum index, and there are many frequent calls (50+ per run), which impacts performance

What we currently do is a custom binary search, as follows:

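using System;
using System.Collections.Generic;

// Extension method: declare it inside a non-generic static class.
// Returns the index of the last element that satisfies 'predicate', or -1 if there is none.
// Assumes the predicate is monotone over the sorted list: true for a prefix, false afterwards.
public static int BinaryLastOrDefault<T>(this IList<T> list, Predicate<T> predicate)
{
    var lower = 0;
    var upper = list.Count - 1;

    while (lower < upper)
    {
        // Round the midpoint up so that 'lower = mid' always makes progress.
        var mid = lower + ((upper - lower + 1) / 2);
        if (predicate(list[mid]))
        {
            lower = mid;       // mid satisfies: the answer is at mid or later
        }
        else
        {
            upper = mid - 1;   // mid fails: the answer is before mid
        }
    }

    if (lower >= list.Count) return -1;        // empty list
    return !predicate(list[lower]) ? -1 : lower;
}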

Can I use a Dictionary to make it O(1)?

  • My understanding is no, since the input value may not be present, and in that case we need to return the closest value; if the above code returns -1, then the last element in the sorted list is the expected result
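To illustrate why a Dictionary alone does not solve the problem, here is a minimal sketch (the class NearestDateIndex is hypothetical, not from the post) that uses Dictionary<DateTime, int> for O(1) exact hits and falls back to List<T>.BinarySearch when the value is absent, so the miss path stays O(log n). It assumes the list is sorted ascending.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the Dictionary gives O(1) only when the exact DateTime is present;
// otherwise it falls back to an O(log n) binary search for the nearest element.
public sealed class NearestDateIndex
{
    private readonly List<DateTime> _sorted;   // assumed sorted ascending
    private readonly Dictionary<DateTime, int> _exact = new Dictionary<DateTime, int>();

    public NearestDateIndex(List<DateTime> sorted)
    {
        _sorted = sorted;
        for (int i = 0; i < sorted.Count; i++)
        {
            // Keep the first index of each value if the list contains duplicates.
            if (!_exact.ContainsKey(sorted[i])) _exact.Add(sorted[i], i);
        }
    }

    // Returns the index of the element closest to 'date', or -1 for an empty list.
    public int IndexOfNearest(DateTime date)
    {
        if (_sorted.Count == 0) return -1;
        if (_exact.TryGetValue(date, out int hit)) return hit;   // O(1) exact match

        // Miss: ~BinarySearch is the index of the first element greater than 'date'.
        int index = ~_sorted.BinarySearch(date);
        if (index == 0) return 0;
        if (index == _sorted.Count) return _sorted.Count - 1;
        return (date - _sorted[index - 1] < _sorted[index] - date) ? index - 1 : index;
    }
}
```

The dictionary also costs memory for every element, and with arbitrary millisecond-precision query values most lookups would likely take the fallback path, so the overall cost per query remains O(log n).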

The option I am considering is as follows:

  • A data structure like Dictionary<int,SortedDictionary<DateTime,int>>
  • The total duration between the lowest and highest DateTime values is 10 hours ~ 10 * 3600 * 1000 ms = 36 million ms
  • Create buckets of 60 seconds each; the total number of buckets is ~ 36 million / 60,000 = 600
  • For any supplied DateTime value, it is now easy to find the bucket, in which a limited number of values can be stored as a SortedDictionary with the DateTime value as key and the original index as value; if required, the data can then be enumerated to find the closest index
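A minimal sketch of the bucket idea, under stated assumptions: the list is non-empty and sorted ascending, and, as a swapped-in variant of Dictionary<int, SortedDictionary<DateTime, int>>, each bucket stores only the offset of its first element in the original list. A query is then one array lookup plus List<T>.BinarySearch(index, count, item, comparer) over that bucket's slice. The class name BucketedDateIndex is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bucketed index over a non-empty List<DateTime> sorted ascending.
public sealed class BucketedDateIndex
{
    private readonly List<DateTime> _sorted;
    private readonly DateTime _min;
    private readonly long _bucketTicks;
    // _bucketStart[b] is the index of the first element whose bucket is >= b;
    // the extra trailing entry equals _sorted.Count.
    private readonly int[] _bucketStart;

    public BucketedDateIndex(List<DateTime> sorted, TimeSpan bucketSize)
    {
        if (sorted.Count == 0) throw new ArgumentException("The list must not be empty.", nameof(sorted));
        if (bucketSize <= TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(bucketSize));

        _sorted = sorted;
        _min = sorted[0];
        _bucketTicks = bucketSize.Ticks;

        int bucketCount = BucketOf(sorted[sorted.Count - 1]) + 1;
        _bucketStart = new int[bucketCount + 1];
        int i = 0;
        for (int b = 0; b <= bucketCount; b++)
        {
            // Skip every element that falls into an earlier bucket.
            while (i < sorted.Count && BucketOf(sorted[i]) < b) i++;
            _bucketStart[b] = i;
        }
    }

    // Bucket number of a DateTime at or after _min, counted from _min.
    private int BucketOf(DateTime d) => (int)((d - _min).Ticks / _bucketTicks);

    // Index of the element closest to 'date'; ties go to the later element.
    public int IndexOfNearest(DateTime date)
    {
        int last = _sorted.Count - 1;
        if (date <= _min) return 0;
        if (date >= _sorted[last]) return last;

        // Here _min < date < _sorted[last], so 'date' maps to an existing bucket.
        int b = BucketOf(date);
        int start = _bucketStart[b];
        int count = _bucketStart[b + 1] - start;

        // A null comparer means Comparer<DateTime>.Default; only this bucket's slice is searched.
        int pos = _sorted.BinarySearch(start, count, date, null);
        if (pos >= 0) return pos;

        // Earlier buckets hold only smaller values and later buckets only larger ones,
        // so ~pos is also the insertion point in the whole list (1 <= index <= last).
        int index = ~pos;
        DateTime before = _sorted[index - 1];
        DateTime after = _sorted[index];
        return (date - before < after - date) ? index - 1 : index;
    }
}
```

Storing offsets instead of one SortedDictionary per bucket avoids copying the data, and because the insertion point found in the slice is also valid for the whole list, the nearest neighbor in an adjacent bucket is still considered.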

In my understanding, this implementation will make the search much faster than the binary search detailed above, since the amount of data searched is substantially reduced. Any suggestions on what more can be done to further improve the search time in algorithmic terms? I can also try the Parallel options for the various independent calls separately.
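As a rough back-of-the-envelope check of that expectation, assuming the 100K values are spread roughly evenly over the 600 buckets (real data may cluster), the worst-case comparison counts work out as:

```latex
\begin{aligned}
\text{plain binary search:}\quad & \lfloor \log_2 100\,000 \rfloor + 1 = 16 + 1 = 17 \text{ comparisons} \\
\text{elements per bucket:}\quad & 100\,000 / 600 \approx 167 \\
\text{search within one bucket:}\quad & \lfloor \log_2 167 \rfloor + 1 = 7 + 1 = 8 \text{ comparisons}
\end{aligned}
```

Under that assumption the bucket scheme trims the comparisons per query from about 17 to about 8, plus the bucket lookup: a constant-factor saving rather than a change in complexity class.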

I ran some performance tests using the native BinarySearch method of List<T>. The logic for finding the nearest DateTime is shown below:

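using System;
using System.Collections.Generic;

// Returns the element of the sorted list 'source' that is closest to 'date'.
public static DateTime GetNearest(List<DateTime> source, DateTime date)
{
    if (source.Count == 0) throw new ArgumentException("The list is empty.", nameof(source));
    var index = source.BinarySearch(date);
    if (index >= 0) return source[index];          // exact match
    index = ~index;                                // index of the first element greater than 'date'
    if (index == 0) return source[0];              // 'date' precedes every element
    if (index == source.Count) return source[source.Count - 1]; // 'date' follows every element
    var d1 = source[index - 1];
    var d2 = source[index];
    return (date - d1 < d2 - date) ? d1 : d2;      // ties go to the later element
}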

I created a random list of 1,000,000 sorted dates, covering a time span of 10 hours from min to max. Then I created an equally sized list of unsorted random dates to search for, covering a slightly larger time span. Then I changed the build to Release and started the test. The result showed that it is possible to make more than 800,000 searches in less than a second, using only a single core of a relatively slow machine.
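The test setup described above can be sketched as follows. This is not the answerer's actual harness: the class NearestBenchmark, the fixed seed, the start date, and the 5% margin on each side of the query span are assumptions (the post only says "slightly larger"). The nearest-element logic is copied from GetNearest so the block compiles on its own, and the throughput it returns depends entirely on the machine, so no particular number should be expected.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical benchmark harness mirroring the described test.
public static class NearestBenchmark
{
    // Same logic as GetNearest above.
    public static DateTime GetNearest(List<DateTime> source, DateTime date)
    {
        if (source.Count == 0) throw new ArgumentException("The list is empty.", nameof(source));
        var index = source.BinarySearch(date);
        if (index >= 0) return source[index];
        index = ~index;
        if (index == 0) return source[0];
        if (index == source.Count) return source[source.Count - 1];
        var d1 = source[index - 1];
        var d2 = source[index];
        return (date - d1 < d2 - date) ? d1 : d2;
    }

    // 'count' random dates within [start, start + span), sorted ascending.
    public static List<DateTime> RandomSortedDates(Random rng, DateTime start, TimeSpan span, int count)
    {
        var list = new List<DateTime>(count);
        for (int i = 0; i < count; i++)
            list.Add(start.AddTicks((long)(rng.NextDouble() * span.Ticks)));
        list.Sort();
        return list;
    }

    // Returns the measured number of searches per second on the current thread.
    public static double Run(int count, int seed = 42)
    {
        var rng = new Random(seed);
        var start = new DateTime(2020, 1, 1);
        var span = TimeSpan.FromHours(10);
        var source = RandomSortedDates(rng, start, span, count);

        // Unsorted queries over a span 5% wider on each side (assumed margin).
        var margin = TimeSpan.FromTicks(span.Ticks / 20);
        var querySpan = span + margin + margin;
        var queries = new List<DateTime>(count);
        for (int i = 0; i < count; i++)
            queries.Add(start - margin + TimeSpan.FromTicks((long)(rng.NextDouble() * querySpan.Ticks)));

        long checksum = 0; // XOR of results, kept alive so the loop's work is observable
        var sw = Stopwatch.StartNew();
        foreach (var q in queries) checksum ^= GetNearest(source, q).Ticks;
        sw.Stop();
        GC.KeepAlive(checksum);
        return count / sw.Elapsed.TotalSeconds;
    }
}
```

For example, NearestBenchmark.Run(1_000_000) in a Release build approximates the first test above.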

Then I increased the complexity of the test by searching in a List<(DateTime, object)> containing 1,000,000 elements, so that each comparison needs two extra calls to a dateSelector function, which returns the DateTime property of each ValueTuple. The result: 350,000 searches per thread per second.

I increased the complexity even further by using reference types as elements, populating a List<Tuple<DateTime, object>> with 1,000,000 tuples. The performance was still pretty decent: 270,000 searches per thread per second.
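A search through a dateSelector can be sketched as a lower-bound binary search that reads each element's key through the selector. This is an assumed reconstruction, not the answerer's code: the helper NearestBySelector.IndexOfNearest is hypothetical, and it compares each probed key directly against the target date, so it calls the selector once per step rather than twice per comparison as an IComparer<T> over whole elements would.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: nearest-element search over records sorted by a DateTime key.
public static class NearestBySelector
{
    // Returns the index of the record whose key is closest to 'date' (ties go to the
    // later record), or -1 for an empty list. Assumes 'source' is sorted by the key.
    public static int IndexOfNearest<T>(IReadOnlyList<T> source, Func<T, DateTime> dateSelector, DateTime date)
    {
        if (source.Count == 0) return -1;

        // Lower bound: first index whose key is >= date (source.Count if none).
        int lo = 0, hi = source.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (dateSelector(source[mid]) < date) lo = mid + 1;
            else hi = mid;
        }

        if (lo == 0) return 0;
        if (lo == source.Count) return source.Count - 1;
        DateTime before = dateSelector(source[lo - 1]);
        DateTime after = dateSelector(source[lo]);
        return (date - before < after - date) ? lo - 1 : lo;
    }
}
```

Because List<T> implements IReadOnlyList<T>, the same helper accepts both a List<(DateTime, object)> (selector x => x.Item1) and a List<Tuple<DateTime, object>> (selector x => x.Item1).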

My conclusion is that the BinarySearch method is lightning fast, and it would be surprising if it was found to be the bottleneck of an application.
