
What is the most efficient (fastest) way to find an N number of the largest integers in an array in C?

Let's have an array of size 8, and let N be 3.

With an array: 1 3 2 17 19 23 0 2

Our output should be: 23, 19, 17

Explanation: The three largest numbers from the array, listed in descending order.

I have tried this:

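int array[8];
int largest[N] = {0, 0, 0};

for (int i = 0; i < N; i++) {
    // Find the index of the largest remaining element.
    int max_index = 0;
    for (int j = 1; j < SIZE_OF_ARRAY; j++) {
        if (array[j] > array[max_index]) {
            max_index = j;
        }
    }
    largest[i] = array[max_index];
    array[max_index] = -1; // mark as taken; valid values are >= 0
}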

Additionally, let the constraint be as such:

integers in the array should be 0 <= i <= 1 000

N should be 1 <= N <= SIZE_OF_ARRAY - 1

SIZE_OF_ARRAY should be 2 <= SIZE_OF_ARRAY <= 1 000 000

My way of implementing it is very inefficient, as it scans the entire array N times. With huge arrays, this can take several minutes to do.

What would be the fastest and most efficient way to implement this in C?

You should look at the histogram algorithm. Since the values have to be between 0 and 1000, you just allocate an array with one counter for each of those values:

#define MAX_VALUE 1000
int occurrences[MAX_VALUE+1];
int largest[N];
int i, j;

// Mark every slot as unused; -1 is never a valid value.
for (i=0; i<N; i++)
    largest[i] = -1;

for (i=0; i<=MAX_VALUE; i++)
    occurrences[i] = 0;

// Count how many times each value occurs.
for (i=0; i<SIZE_OF_ARRAY; i++)
    occurrences[array[i]]++;

// Step through the occurrences array backward to find the N largest values.
for (i=MAX_VALUE, j=0; i>=0 && j<N; i--)
    if (occurrences[i] > 0)
        largest[j++] = i;

Note that this will yield only one element in largest for each unique value; modify the insertion accordingly if you want all occurrences to appear in largest. Because of that, some elements may be left at -1 if there weren't enough unique values to fill the largest array. Finally, the results in largest will be sorted from largest to smallest. If you want them in ascending order instead, that is easy to change: just fill the largest array from right to left.
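If you do want duplicates to appear in the result, one way to modify the insertion is to emit each value as many times as it was counted. The following is only a sketch under the question's constraints (values between 0 and 1000); the function name top_n_histogram and its parameters are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VALUE 1000  /* upper bound on the values, taken from the question */

/*
 * Writes the n largest values of values[0..count-1] into out[0..n-1],
 * in descending order, keeping duplicates.
 * Returns how many values were written (at most n, at most count).
 */
size_t top_n_histogram(const int *values, size_t count, int *out, size_t n)
{
    size_t occurrences[MAX_VALUE + 1] = {0};
    size_t written = 0;

    /* Pass 1: count how often each value occurs. */
    for (size_t i = 0; i < count; i++) {
        assert(values[i] >= 0 && values[i] <= MAX_VALUE);
        occurrences[values[i]]++;
    }

    /* Pass 2: walk from the largest value down, emitting each value as
       many times as it occurred, until n results have been written. */
    for (int v = MAX_VALUE; v >= 0 && written < n; v--) {
        for (size_t k = occurrences[v]; k > 0 && written < n; k--) {
            out[written++] = v;
        }
    }
    return written;
}
```

Both passes are linear, so the total work is proportional to SIZE_OF_ARRAY + MAX_VALUE, independent of N.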

One method I can think of is to sort the array in descending order and return the first N numbers. Since the array is sorted, the N numbers we return are the N largest numbers in the array. This method has a time complexity of O(n log n), where n is the number of elements in the given array. I think this is a reasonably good time complexity for this problem.
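A minimal sketch of the sorting approach, using the standard library's qsort with a descending comparison. The names compare_descending and top_n_sorted are invented for this example, and note that it reorders the input array in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort: orders ints from largest to smallest. */
static int compare_descending(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x < y) - (x > y);  /* avoids the overflow risk of y - x */
}

/* Sorts values in place (descending) and copies the first n into out. */
void top_n_sorted(int *values, size_t count, int *out, size_t n)
{
    assert(n <= count);
    qsort(values, count, sizeof values[0], compare_descending);
    for (size_t i = 0; i < n; i++) {
        out[i] = values[i];
    }
}
```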

Another approach with similar time complexity would be to use a max-heap. Build a max-heap from the given array and, N times, use pop() (or extract, or whatever you call it) to get the top-most element, which is the largest element remaining in the heap at each pop.

The time complexity of this approach could be considered even better than the first one: O(n + N log n), where n is the number of elements in the array and N is the number of largest elements to be found. Building the heap takes O(n), and each of the N pops takes O(log n), which sums to O(n + N log n), slightly better than O(n log n).
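Here is a rough sketch of the heap approach, assuming it is acceptable to reorder the input array. The helper names sift_down and top_n_heap are made up for this example; the heap is built bottom-up in place, and each popped maximum is swapped to the end of the shrinking heap so no element is lost.

```c
#include <assert.h>
#include <stddef.h>

/* Restores the max-heap property for the subtree rooted at index root,
   looking only at heap[0..size-1]. */
static void sift_down(int *heap, size_t size, size_t root)
{
    for (;;) {
        size_t largest = root;
        size_t left = 2 * root + 1;
        size_t right = left + 1;

        if (left < size && heap[left] > heap[largest]) largest = left;
        if (right < size && heap[right] > heap[largest]) largest = right;
        if (largest == root) return;

        int tmp = heap[root];
        heap[root] = heap[largest];
        heap[largest] = tmp;
        root = largest;
    }
}

/* Turns values into a max-heap, then pops the maximum n times,
   writing the results to out in descending order. */
void top_n_heap(int *values, size_t count, int *out, size_t n)
{
    assert(n <= count);

    /* Bottom-up heap construction: O(count). */
    for (size_t i = count / 2; i > 0; i--) {
        sift_down(values, count, i - 1);
    }

    size_t size = count;
    for (size_t k = 0; k < n; k++) {
        out[k] = values[0];            /* current maximum */
        values[0] = values[size - 1];  /* move the last leaf to the root */
        values[size - 1] = out[k];     /* park the popped value at the end */
        size--;
        sift_down(values, size, 0);    /* O(log count) per pop */
    }
}
```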

The fastest way is to recognize that data doesn't just appear (it either exists at compile time; or arrives by IO - from files, from network, etc); and therefore you can find the 3 highest values when the data is created (at compile time; or when you're parsing and sanity checking and then storing data received by IO - from files, from network, etc). This is likely to be the fastest possible way (because you're either doing nothing at run-time, or avoiding the need to look at all the data a second time).

However, if the data is modified after it was created, then you'd need to update the "3 highest values" at the same time as the data is modified. That is easy if a lower value is replaced by a higher value (you just check if the new value becomes one of the 3 highest values), but it involves a search if a "previously highest" value is being replaced with a lower value.

If you need to search, it can be done with a single loop, like:

    #include <limits.h>   /* for INT_MIN */

    int firstHighest = INT_MIN;
    int secondHighest = INT_MIN;
    int thirdHighest = INT_MIN;

    /* Visit every element once, starting at index 0. */
    for (int i = 0; i < SIZE_OF_ARRAY; i++) {
        if(array[i] > thirdHighest) {
            if(array[i] > secondHighest) {
                if(array[i] > firstHighest) {
                    /* New maximum: shift the other two down. */
                    thirdHighest = secondHighest;
                    secondHighest = firstHighest;
                    firstHighest = array[i];
                } else {
                    thirdHighest = secondHighest;
                    secondHighest = array[i];
                }
            } else {
                thirdHighest = array[i];
            }
        }
    }

Note: The exact code will depend on what you want to do with duplicates. As written, equal values are counted separately, so the numbers 1, 2, 3, 4, 4, 4, 4 give the answer 4, 4, 4. If you want only distinct values (4, 3, 2 for that input), skip any array[i] that is equal to firstHighest or secondHighest before doing the comparisons.
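The loop above hard-codes three variables. For an arbitrary N, one possible generalization (a sketch, not the only way to do it) keeps a small buffer sorted in descending order and inserts each candidate into it. The function name top_n_single_pass is invented for this example. It still reads the array only once, but each insertion can cost up to N shifts, so it suits small N best.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Keeps best[0..n-1] sorted in descending order while scanning values once.
 * Duplicates are counted separately. Requires 1 <= n <= count.
 */
void top_n_single_pass(const int *values, size_t count, int *best, size_t n)
{
    assert(n >= 1 && n <= count);
    size_t filled = 0;  /* number of valid entries in best */

    for (size_t i = 0; i < count; i++) {
        int v = values[i];

        /* Once the buffer is full, skip values that cannot enter it. */
        if (filled == n && v <= best[n - 1]) continue;

        /* Start at the next free slot, or at the last slot when the
           buffer is full (which drops the current smallest entry). */
        size_t pos = (filled < n) ? filled++ : n - 1;

        /* Shift smaller entries right to make room for v. */
        while (pos > 0 && best[pos - 1] < v) {
            best[pos] = best[pos - 1];
            pos--;
        }
        best[pos] = v;
    }
}
```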

For large amounts of data it can be accelerated with SIMD and/or multiple threads. For example; if SIMD can do "bundles of 8 integers" and you have 4 CPUs (and 4 threads); then you can split it into quarters then treat each quarter as columns of 8 elements; find the highest 3 values in each column in each quarter; then determine the highest 3 values from the "highest 3 values in each column in each quarter". In this case you will probably want to add padding (dummy values set to INT_MIN ) to the end of the array to ensure that the array's total size is a multiple of SIMD width and number of CPUs.
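The split-and-merge structure can be shown without any real threads or SIMD. In the sketch below the slices are scanned one after another in a plain loop; in a threaded version each slice would be handed to its own thread and only the merge would run at the end. The names offer and top3_chunked are made up for this example, and the slice boundaries are just one simple way to divide the array.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Inserts v into top[0..2] (kept in descending order) if it is large enough. */
static void offer(int top[3], int v)
{
    if (v > top[0])      { top[2] = top[1]; top[1] = top[0]; top[0] = v; }
    else if (v > top[1]) { top[2] = top[1]; top[1] = v; }
    else if (v > top[2]) { top[2] = v; }
}

/* Finds the three largest values by scanning `chunks` slices independently
   and merging the per-slice results. Runs sequentially in this sketch. */
void top3_chunked(const int *values, size_t count, size_t chunks, int result[3])
{
    assert(chunks >= 1);
    result[0] = result[1] = result[2] = INT_MIN;

    for (size_t c = 0; c < chunks; c++) {
        /* Slice boundaries; each slice could be given to its own thread. */
        size_t begin = count * c / chunks;
        size_t end = count * (c + 1) / chunks;
        int local[3] = {INT_MIN, INT_MIN, INT_MIN};

        for (size_t i = begin; i < end; i++) {
            offer(local, values[i]);
        }

        /* Merge step: only three candidates per slice need to be checked. */
        for (int k = 0; k < 3; k++) {
            offer(result, local[k]);
        }
    }
}
```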

For small amounts of data the extra overhead of setting up SIMD and/or coordinating multiple threads is going to cost more than it saves; and the "simple loop" version is likely to be as fast as it gets.

For unknown/variable amounts of data you could provide multiple alternatives (simple loop, SIMD with single thread, and SIMD with a variable number of threads) and decide which method to use (and how many threads to use) at run-time based on the amount of data.


 