
Binary search nearest match with last occurrence

I am implementing an efficient algorithm to find the last occurrence of either the key or, if the key is absent, its nearest match (upper bound).

So far, I have this:

long bin_search_closest_match_last_occurance ( long  * lArray, long sizeArray, long lnumber)
{
    long left, right, mid, last_occur;

    left = 0;
    right = sizeArray - 1;
    last_occur = -1;

    while ( left <= right )
    {
        mid = ( left + right ) / 2;

        if ( lArray[mid] == lnumber  )
        {
            last_occur = mid;
            left = mid +1;
        }

        if ( lArray[mid] > lnumber ) 
            right = mid - 1;
        else 
            left = mid + 1;
    }
    return last_occur!=-1?last_occur:mid;
}

Take the array {0,0,1,5,9,9,9,9} and the key 6. The function should return index 7, but mine returns 4.

Please note that I do not want to iterate linearly to the last matching index.

One solution I have in mind is to change the function's parameters (adding start and end indexes) and run a second binary search from the found upper bound to the end of the array, but only when no exact match is found (last_occur == -1).

Is there a better or cleaner way to implement this?

nm's two-search approach will work, and it keeps the optimal time complexity, but it is likely to increase the constant factor by around 2, or by around 1.5 if you begin the second search from where the first search ended.
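For reference, the two-search idea can be sketched roughly as follows. This is only an illustration under assumptions: the helper names lower_bound_idx and upper_bound_idx and the function last_of_key_or_upper are hypothetical, the array is assumed to be sorted in ascending order, and -1 is assumed to be an acceptable result when no element is greater than or equal to the key. The second search starts at the index where the first one ended.

```c
/* Hypothetical helper: first index in [lo, hi) whose element is >= key,
 * or hi if there is no such element. */
static long lower_bound_idx(const long *a, long lo, long hi, long key)
{
    while (lo < hi) {
        long mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Hypothetical helper: first index in [lo, hi) whose element is > key,
 * or hi if there is no such element. */
static long upper_bound_idx(const long *a, long lo, long hi, long key)
{
    while (lo < hi) {
        long mid = lo + (hi - lo) / 2;
        if (a[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Last index of key if present, otherwise last index of the smallest
 * element greater than key. Returns -1 if no element is >= key
 * (an assumption; the question does not define this case). */
long last_of_key_or_upper(const long *a, long n, long key)
{
    long first = lower_bound_idx(a, 0, n, key);
    if (first == n)
        return -1;
    /* Second search: begin where the first search ended. */
    return upper_bound_idx(a, first, n, a[first]) - 1;
}
```

With the array from the question, last_of_key_or_upper(a, 8, 6) evaluates to 7, and a key of 9 also yields 7.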

If instead you take an "ordinary" binary search that finds the first instance of lnumber (or, if it doesn't exist, a lower bound), and change it so that the algorithm logically "reverses" the array by changing every array access lArray[x] to lArray[sizeArray - 1 - x] (for any expression x), and also "reverse" the ordering by changing the > lnumber test to < lnumber, then only a single binary search is needed.

The only array accesses this algorithm actually performs are two lookups of lArray[mid], which an optimising compiler is very likely to evaluate only once if it can prove that nothing will change the value between the accesses. This might require adding restrict to the declaration of long * lArray; alternatively, you could just load the element into a local variable and test it twice instead.

Either way, if only a single array lookup per iteration is needed, then changing the index from mid to sizeArray - 1 - mid will add just 2 extra subtractions per iteration (or just 1 if you --sizeArray before entering the loop), which I expect will not increase the constant nearly as much as nm's approach. Of course, as with anything, if performance is critical then test it; and if it's not, then don't worry too much about saving microseconds.

You will also need to "reverse" the return value:

return last_occur != -1 ? last_occur : sizeArray - 1 - mid;
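As a rough illustration of the index reversal, the sketch below runs a first-match search over the logically reversed array. It is not the answerer's exact code: the function name last_of_key_or_below is hypothetical, the loop is written in half-open lower-bound style (so the flipped comparison shows up as > where the unreversed loop would use <), and returning -1 when every element is greater than the key is an assumption. The element is loaded into a local variable once per iteration, and sizeArray - 1 is computed once before the loop so that each probe costs a single extra subtraction.

```c
/* Hypothetical sketch: search the array as if it were reversed.
 * Returns the last index (in the original array) of the largest element
 * that is <= lnumber, or -1 if every element is greater than lnumber. */
long last_of_key_or_below(const long *lArray, long sizeArray, long lnumber)
{
    long left = 0, right = sizeArray;  /* half-open range over the reversed view */
    long last = sizeArray - 1;         /* hoisted: one subtraction per probe */

    while (left < right) {
        long mid = left + (right - left) / 2;
        long value = lArray[last - mid];  /* single array load per iteration */

        if (value > lnumber)   /* reversed view is descending: skip larger values */
            left = mid + 1;
        else
            right = mid;
    }

    /* left is the first reversed position holding a value <= lnumber;
     * map it back to an index in the original array. */
    return left == sizeArray ? -1 : last - left;
}
```

With the array from the question, a key of 9 gives 7 and a key of 0 gives 1, so the last occurrence of an existing key is found in one pass. For the missing key 6, this particular sketch gives 3, the last index of the nearest smaller element, rather than 7, so that case would need separate handling if the upper bound is what is wanted.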
