Efficiently finding the number of occurrences of an integer in a sorted array

Question

I'm studying for a test, and found this question.

You are given a sorted array of integers, for example:

{-5, -5, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 67, 67, 99}

Write a method:

public static int count(int[] a, int x)

that returns the number of times the number 'x' occurs in the array.

For example:

x = -5, it returns 2
x = 2, it returns 5
x = 8, it returns 0

I need to write it as efficiently as possible. Please don't give me the answer (or write it if you wish, but I won't look). My idea is to do a binary search, then walk outward in both directions (backward and forward) from the value that I find and use the index numbers to return the correct answer. My questions are:

Is it the most efficient way?
Won't it be O(n) in the worst case (when the array is filled with a single number)?

If so, then why should I do a binary search?

Answer 1

Modify your binary search to find the first and last occurrence of the given input; the result is then the difference between those two indexes plus one.

To find the first and last occurrence using binary search, you need to change the usual binary search algorithm a bit. The usual binary search returns as soon as a match is found. Here, you need to continue searching until you find a mismatch.

Helpful links:

finding last occurrence, finding first occurrence

A small update:

After you find the first occurrence, you can use that index as the starting point of the next binary search to find the last.
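The idea in this answer could be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name `OccurrenceCount` and the helpers `firstIndex` and `lastIndexFrom` are made-up names, and the sketch assumes the array is sorted in ascending order.

```java
final class OccurrenceCount {
    // Index of the first element equal to x, or -1 if x is absent.
    static int firstIndex(int[] a, int x) {
        int lo = 0, hi = a.length - 1, ans = -1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] >= x) {
                if (a[mid] == x) ans = mid; // remember the match, keep looking left
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return ans;
    }

    // Index of the last element equal to x, searching only a[from..].
    // Assumes a[from] == x, so every element from `from` onward is >= x.
    static int lastIndexFrom(int[] a, int x, int from) {
        int lo = from, hi = a.length - 1, ans = from;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] == x) {
                ans = mid;    // remember the match, keep looking right
                lo = mid + 1;
            } else {
                hi = mid - 1; // a[mid] > x here
            }
        }
        return ans;
    }

    static int count(int[] a, int x) {
        int first = firstIndex(a, x);
        if (first < 0) return 0;
        return lastIndexFrom(a, x, first) - first + 1;
    }

    public static void main(String[] args) {
        int[] a = {-5, -5, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 67, 67, 99};
        System.out.println(count(a, -5)); // prints 2
        System.out.println(count(a, 2));  // prints 5
        System.out.println(count(a, 8));  // prints 0
    }
}
```

Starting the second search at `first` does not change the O(log n) bound, but it does shrink the range the second search has to cover.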

Answer 2

Two solutions come to mind:

1) Do a binary search, but maintain the invariant that it finds the first occurrence. Then do a linear search. This will be Theta(log n + C), where C is the count.

Programming Pearls by Jon Bentley has a nice write-up, where he mentions that looking for the first occurrence is actually more efficient than looking for any occurrence.

2) You can also do two binary searches, one for the first occurrence and one for the last, and take the difference of the indices (plus one, when both indices are inclusive). This will be Theta(log n).

You can decide which solution to apply based on the expected value of C. If C = o(log n) (yes, small o), then looking for the first occurrence and doing a linear search will be better. Otherwise, do two binary searches.

If you don't know the expected value of C, then you might be better off with solution 2.
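A minimal sketch of solution 1, assuming an ascending sorted array. The names `CountByScan` and `lowerBound` are invented for this example. The binary search keeps the invariant that everything left of `lo` is smaller than x and everything from `hi` onward is at least x, so it ends on the first occurrence if there is one.

```java
final class CountByScan {
    // First index i with a[i] >= x, or a.length if no such element exists.
    // Invariant: a[0..lo) < x and a[hi..) >= x.
    static int lowerBound(int[] a, int x) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] < x) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Theta(log n + C): one binary search, then a scan over the C matches.
    static int count(int[] a, int x) {
        int i = lowerBound(a, x);
        int c = 0;
        while (i < a.length && a[i] == x) {
            c++;
            i++;
        }
        return c;
    }

    public static void main(String[] args) {
        int[] a = {-5, -5, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 67, 67, 99};
        System.out.println(count(a, 2)); // prints 5
        System.out.println(count(a, 8)); // prints 0
    }
}
```

Note that the loop compares `a[mid]` against x only once per iteration, which is one plausible reading of why searching for the first occurrence can be cheaper than the three-way comparison of a search for any occurrence.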

Answer 3

Do a binary search to find the first occurrence. Do a binary search to find the last occurrence. The number of occurrences is equal to the number of elements between the two indices found, inclusive.

Working code:

public class Main {
    public static void main(String[] args){
        int[] arr = {-5, -5, 1, 1, 1, 1, 1, 1, 1, 1,
                     2, 2, 2, 2, 2, 3, 3, 3, 67, 67, 99};
        int lo = getFirst(arr, -5);
        if(lo==arr.length){ // the number is not present in the array.
            System.out.println(0);
        }else{
            int hi = getLast(arr, -5);
            System.out.println((hi-lo+1));
        }
    }

    // Returns the index of the last occurrence of num, or arr.length if num does not exist in arr.
    static int getLast(int[] arr, int num){
        int lo = 0, hi = arr.length-1, ans = arr.length;
        while(lo<=hi){
            int mid = lo + (hi - lo) / 2; // avoids int overflow of lo+hi on very large arrays
            if(arr[mid]==num){
                ans = mid;
                lo = mid+1;
            }else if(arr[mid]<num){
                lo = mid+1;
            }else if(arr[mid]>num){
                hi = mid-1;
            }
        }
        return ans;
    }

    // Returns the index of the first occurrence of num, or arr.length if num does not exist in arr.
    static int getFirst(int[] arr, int num){
        int lo = 0, hi = arr.length-1, ans = arr.length;
        while(lo<=hi){
            int mid = lo + (hi - lo) / 2; // avoids int overflow of lo+hi on very large arrays
            if(arr[mid]==num){
                ans = mid;
                hi = mid-1;
            }else if(arr[mid]<num){
                lo = mid+1;
            }else if(arr[mid]>num){
                hi = mid-1;
            }
        }
        return ans;
    }
}

Answer 4

Actually, there is a slightly better solution than the ones given! It is a combination of two different ways of doing binary search.

First, you do a binary search to get the first occurrence. This is O(log N).

Now, starting from the first index you just found, you do a different kind of binary search: you guess the frequency F of that element, starting with a guess of F = 1, doubling your estimate each time and checking whether the element still repeats. Once a doubling overshoots the run, an ordinary binary search between the last two guesses pins down where the run ends. This is guaranteed to be O(log F) (where F is the frequency).

This way, the algorithm runs in O(log N + log F).

You do not have to worry about the distribution of the numbers!
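The doubling step described above is often called galloping or exponential search. A rough sketch, assuming an ascending sorted array, might look like this. The names `GallopingCount` and `firstAtLeast` are hypothetical. The `long` target and step are a choice of this sketch so that `x + 1` and the doubling cannot overflow.

```java
final class GallopingCount {
    // First index in [lo, hi) whose element is >= target, or hi if there is none.
    static int firstAtLeast(int[] a, long target, int lo, int hi) {
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    static int count(int[] a, int x) {
        // Step 1: ordinary O(log N) search for the first occurrence.
        int first = firstAtLeast(a, x, 0, a.length);
        if (first == a.length || a[first] != x) return 0;

        // Step 2: double the guessed run length until it overshoots.
        long step = 1;
        while (first + step < a.length && a[(int) (first + step)] == x) {
            step *= 2;
        }

        // The run ends somewhere in [first + step/2, min(first + step, length)).
        int lo = first + (int) (step / 2);
        int hi = (int) Math.min(first + step, (long) a.length);

        // Step 3: O(log F) search for the first element greater than x.
        int end = firstAtLeast(a, (long) x + 1, lo, hi);
        return end - first;
    }

    public static void main(String[] args) {
        int[] a = {-5, -5, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 67, 67, 99};
        System.out.println(count(a, -5)); // prints 2
        System.out.println(count(a, 2));  // prints 5
        System.out.println(count(a, 99)); // prints 1
        System.out.println(count(a, 8));  // prints 0
    }
}
```

The benefit over two full binary searches shows up when F is much smaller than N, since the second phase only ever touches about 2F elements' worth of range.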

Answer 5

IMHO this is the most efficient solution. Others may have mentioned a similar approach, but I think this one is the easiest to explain and the easiest to understand. It also has a modification that will speed up the process in practice.

Basically, the idea is to find the smallest and largest index of the occurrence. Finding the smallest is O(log N) using binary search (using Newton's method to actually increase the performance in the average case is a possible improvement). If you don't know how to modify binary search to find the smallest index, the trivial modification is to look for the element with value (p - 0.5), where p is the number being counted. Obviously you will not find that value in the integer array, but when the binary search terminates, the index will be the one next to where the recursion stops. You will just need to check whether p exists there at all. This will give you the smallest index.

Now, in order to find the largest index, you again have to start a binary search, this time using the smallest index as the lower bound and (p + 0.5) as the search target. This is guaranteed to be O(log N); in the average case it will be O(log N/2). Using Newton's method and taking the values of the upper and lower bounds into account will in practice improve the performance.

Once you have found the largest and smallest index, the difference between them, plus one, is the result.

For evenly distributed numbers, using the Newton modification will drastically improve the runtime (in the case of a continuous equidistant sequence of numbers, finding the smallest and greatest values will be O(1), two or three steps), although the theoretical complexity is still O(log N) for arbitrary input.
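The half-step targets can be sketched as below, assuming an ascending sorted array. The names `HalfStepCount` and `insertionPoint` are made up for illustration. The sketch returns the exclusive end of the run rather than the largest index, so the count is a plain difference. The speed-up this answer attributes to Newton's method resembles what is usually called interpolation search; it is not attempted here.

```java
final class HalfStepCount {
    // First index in [from, a.length) whose element is greater than target,
    // or a.length if there is none. The target always ends in .5, so it can
    // never equal an array element and no tie handling is needed.
    static int insertionPoint(int[] a, double target, int from) {
        int lo = from, hi = a.length;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    static int count(int[] a, int p) {
        // Searching for p - 0.5 lands on the smallest index holding p, if any.
        int first = insertionPoint(a, p - 0.5, 0);
        if (first == a.length || a[first] != p) return 0; // p is not present

        // Searching for p + 0.5, starting at `first`, lands just past the run.
        int end = insertionPoint(a, p + 0.5, first);
        return end - first;
    }

    public static void main(String[] args) {
        int[] a = {-5, -5, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 67, 67, 99};
        System.out.println(count(a, 1));  // prints 8
        System.out.println(count(a, 67)); // prints 2
        System.out.println(count(a, 8));  // prints 0
    }
}
```

Every `int` plus or minus 0.5 is exactly representable as a `double`, so the comparisons behave as if the array held real numbers.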

Answer 1
15 2014-04-02 14:37:31

Answer 2
5 2014-04-02 14:35:33

Answer 3
5 2014-04-02 14:38:00

Answer 4
1 2014-04-02 19:18:22

Answer 5
0 2014-04-03 06:46:09
