简体   繁体   中英

Find largest mode in huge data set without timing out

Description In statistics, there is a measure of the distribution called the mode. The mode is the data that appears the most in a data set. A data set may have more than one mode, that is, when there is more than one data with the same number of occurrences.

Mr. Dengklek gives you N integers. Find the greatest mode of the numbers.

Input Format The first line contains an integer N. The next line contains N integers.

Output Format A row contains an integer which is the largest mode.

Input Example 6 1 3 2 4 1 4

Example Output 4

Limits 1 ≤ N ≤100,000

1≤(every integer on the second line)≤1000

#include <iostream>
#include <string>

using namespace std;

#define ll long long

int main() {

    unsigned int N;
    while(true){
        cin >> N;
        if(N > 0 && N <= 1000){
            break;
        }
    }
    int arr[N];
    int input;
    for (int k = 0; k < N; k++)
    {
        cin >> input;
        if(input > 0 && input <=1000){
             arr[k] = input;
        }
        else{
            k -= 1;
        }
    }
    
    int number;
    int mode;
    int position;
    int count = 0;
    int countMode = 1;

    for (int i = 0; i < N; i++)
    {
        number = arr[i];
        for (int j = 0; j < N; j++)
        {
            if(arr[j] == number){
                ++count;
            }
        }
        if(count > countMode){
            countMode = count;
            mode = arr[i];
            position = i;
        }
        else if(count == countMode){
            if(arr[i] > arr[position]){
                mode = arr[i];
                position = i;
            }
        }
        count = 0;
    }
    cout << mode << endl;
    
    return 0;
}

I got a "RTE" (run time error) and 70 pts.

Here is the code which I got 80 pts but got "TLE" (time limit exceeded):

#include <bits/stdc++.h>
using namespace std;

#define ll long long

int main() {

    unsigned int N;
    while(true){
        cin >> N;
        if(N > 0 && N <= 100000){
            break;
        }
    }
    int arr[N];
    int input;
    for (int k = 0; k < N; k++)
    {
        cin >> input;
        if(input > 0 && input <=1000){
             arr[k] = input;
        }
        else{
            k -= 1;
        }
    }
    
    int number;
    vector<int> mode;
    int count = 0;
    int countMode = 1;

    for (int i = 0; i < N; i++)
    {
        number = arr[i];
        for (int j = 0; j < N; j++)
        {
            if(arr[j] == number){
                ++count;
            }
        }
        if(count > countMode){
            countMode = count;
            mode.clear();
            mode.push_back(arr[i]);
        }
        else if(count == countMode){
             mode.push_back(arr[i]);
        }
        count = 0;
    }
    sort(mode.begin(), mode.end(), greater<int>());
    cout << mode.front() << endl;
    
    return 0;
}

How can I accelerate the program?

As already noted, the algorithm implemented in both of the posted snippets has O(N 2 ) time complexity, while there exists an O(N) alternative.

You can also take advantage of some of the algorithms in the Standard Library, like std::max_element , which returns an

iterator to the greatest element in the range [first, last). If several elements in the range are equivalent to the greatest element, returns the iterator to the first such element .

#include <algorithm>
#include <array>
#include <iostream>

int main()
{
    constexpr long max_N{ 100'000L };
    long N;
    if ( !(std::cin >> N) or  N < 1  or  N > max_N  )
    {
        std::cerr << "Error: Unable to read a valid N.\n";
        return 1;
    }

    constexpr long max_value{ 1'000L };
    std::array<long, max_value> counts{};
    for (long k = 0; k < N; ++k)
    {
        long value;
        if ( !(std::cin >> value)  or  value < 1  or  value > max_value )
        {
            std::cerr << "Error: Unable to read value " << k + 1 << ".\n";
            return 1;
        }
        ++counts[value - 1];
    }
    
    auto const it_max_mode{ std::max_element(counts.crbegin(), counts.crend()) };
    // If we start from the last...                 ^^                ^^
    std::cout << std::distance(it_max_mode, counts.crend()) << '\n';
    // The first is also the greatest.
    return 0;
}

Compiler Explorer demo


I got a "RTE" (run time error)

Consider this fragment of the first snippet:

int number;
int mode;
int position;            //   <---     Note that it's uninitialized
int count = 0;
int countMode = 1;

for (int i = 0; i < N; i++)
{
    number = arr[i];
    // [...] Evaluate count.
    if(count > countMode){
        countMode = count;
        mode = arr[i];
        position = i;   //  <---       Here it's assigned a value, but...
    }
    else if(count == countMode){    // If this happens first...
        if(arr[i] > arr[position]){
        //          ^^^^^^^^^^^^^      Position may be indeterminate, here  
            mode = arr[i];
            position = i;
        }
    }
    count = 0;
}

Finally, some resources worth reading:

Why is “using namespace std;” considered bad practice?

Why should I not #include <bits/stdc++.h>?

Using preprocessing directive #define for long long

Why aren't variable-length arrays part of the C++ standard?

You're overcomplicating things. Competitive programming is a weird beast were solutions assume limited resources, whaky amount of input data. Often those tasks are balanced that way that they require use of constant time alternate algorithms, summ on set dynamic programming. Size of code is often taken in consideration. So it's combination of math science and dirty programming tricks. It's a game for experts, "brain porn" if you allow me to call it so: it's wrong, it's enjoyable and you're using your brain. It has little in common with production software developing.

You know that there can be only 1000 different values, but there are huge number or repeated instances. All that you need is to find the largest one. What's the worst case of finding maximum value in array of 1000? O(1000) and you check one at the time. And you already have to have a loop on N to input those values.

Here is an example of dirty competitive code (no input sanitation at all) to solve this problem:

#include <bits/stdc++.h>
using namespace std;

using in = unsigned short;

array<int, 1001> modes;
in               biggest;
int              big_m;
int              N;

int main()
{   
    cin >> N;

    in val;
    while(N --> 0){
       cin >> val;
       if(val < 1001) { 
           modes[val]++; 
       }
       else 
           continue;
       if( modes[val] == big_m) {
           if( val > biggest )
               biggest  = val; 
       }
       else
       if( modes[val] > big_m) { 
           biggest  = val; 
           big_m =  modes[val];
       } 
    }
    
    cout << biggest;
    return 0;
}

No for loops if you don't need them, minimalistic ids, minimalistic data to store. Avoid dynamic creation and minimize automatic creation of objects if possible, those add execution time. Static objects are created during compilation and are materialized when your executable is loaded.

modes is an array of our counters, biggest stores largest value of int for given maximum mode, big_m is current maximum value in modes . As they are global variables, they are initialized statically.

PS. NB. The provided example is an instance of stereotype and I don't guarantee it's 100% fit for that particular judge or closed test cases it uses. Some judges use tainted input and some other things that complicate life of challengers, there is always a factor of unknown. Eg this example would faithfully output "0" if judge would offer that among input values even if value isn't in range.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM