
Find largest mode in huge data set without timing out

Description: In statistics, there is a measure of a distribution called the mode. The mode is the value that appears most often in a data set. A data set may have more than one mode, namely when several values share the same highest number of occurrences.

Mr. Dengklek gives you N integers. Find the greatest mode of the numbers.

Input Format: The first line contains an integer N. The next line contains N integers.

Output Format: A single line containing one integer, the largest mode.

Input Example:
6
1 3 2 4 1 4

Example Output:
4

Limits: 1 ≤ N ≤ 100,000

1 ≤ (every integer on the second line) ≤ 1000

#include <iostream>
#include <string>

using namespace std;

#define ll long long

int main() {

    unsigned int N;
    while(true){
        cin >> N;
        if(N > 0 && N <= 1000){
            break;
        }
    }
    int arr[N];
    int input;
    for (int k = 0; k < N; k++)
    {
        cin >> input;
        if(input > 0 && input <=1000){
             arr[k] = input;
        }
        else{
            k -= 1;
        }
    }
    
    int number;
    int mode;
    int position;
    int count = 0;
    int countMode = 1;

    for (int i = 0; i < N; i++)
    {
        number = arr[i];
        for (int j = 0; j < N; j++)
        {
            if(arr[j] == number){
                ++count;
            }
        }
        if(count > countMode){
            countMode = count;
            mode = arr[i];
            position = i;
        }
        else if(count == countMode){
            if(arr[i] > arr[position]){
                mode = arr[i];
                position = i;
            }
        }
        count = 0;
    }
    cout << mode << endl;
    
    return 0;
}

I got a "RTE" (run time error) and 70 pts.

Here is the code with which I got 80 pts but also a "TLE" (time limit exceeded):

#include <bits/stdc++.h>
using namespace std;

#define ll long long

int main() {

    unsigned int N;
    while(true){
        cin >> N;
        if(N > 0 && N <= 100000){
            break;
        }
    }
    int arr[N];
    int input;
    for (int k = 0; k < N; k++)
    {
        cin >> input;
        if(input > 0 && input <=1000){
             arr[k] = input;
        }
        else{
            k -= 1;
        }
    }
    
    int number;
    vector<int> mode;
    int count = 0;
    int countMode = 1;

    for (int i = 0; i < N; i++)
    {
        number = arr[i];
        for (int j = 0; j < N; j++)
        {
            if(arr[j] == number){
                ++count;
            }
        }
        if(count > countMode){
            countMode = count;
            mode.clear();
            mode.push_back(arr[i]);
        }
        else if(count == countMode){
             mode.push_back(arr[i]);
        }
        count = 0;
    }
    sort(mode.begin(), mode.end(), greater<int>());
    cout << mode.front() << endl;
    
    return 0;
}

How can I accelerate the program?

As already noted, the algorithm implemented in both of the posted snippets has O(N^2) time complexity, while an O(N) alternative exists.

You can also take advantage of some of the algorithms in the Standard Library, like std::max_element, which returns an "iterator to the greatest element in the range [first, last). If several elements in the range are equivalent to the greatest element, returns the iterator to the first such element."

#include <algorithm>
#include <array>
#include <iostream>
#include <iterator>   // std::distance

int main()
{
    constexpr long max_N{ 100'000L };
    long N;
    if ( !(std::cin >> N) or  N < 1  or  N > max_N  )
    {
        std::cerr << "Error: Unable to read a valid N.\n";
        return 1;
    }

    constexpr long max_value{ 1'000L };
    std::array<long, max_value> counts{};
    for (long k = 0; k < N; ++k)
    {
        long value;
        if ( !(std::cin >> value)  or  value < 1  or  value > max_value )
        {
            std::cerr << "Error: Unable to read value " << k + 1 << ".\n";
            return 1;
        }
        ++counts[value - 1];
    }
    
    auto const it_max_mode{ std::max_element(counts.crbegin(), counts.crend()) };
    // If we start from the last...                 ^^                ^^
    std::cout << std::distance(it_max_mode, counts.crend()) << '\n';
    // The first is also the greatest.
    return 0;
}

Compiler Explorer demo
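To see why iterating in reverse matters, here is a minimal sketch that isolates the tie-breaking step. The helper name last_max_position is hypothetical (it is not part of the answer's code); it only assumes the documented behaviour of std::max_element quoted above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical helper: 1-based position of the LAST maximum in counts.
// With counts[v - 1] holding the occurrences of v, this is the largest mode.
template <std::size_t Size>
long last_max_position(const std::array<long, Size>& counts)
{
    static_assert(Size > 0, "counts must not be empty");
    // max_element returns the first maximum it meets; walking the reverse
    // range means "first" is the one with the highest index.
    auto const it = std::max_element(counts.crbegin(), counts.crend());
    assert(it != counts.crend());
    return static_cast<long>(std::distance(it, counts.crend()));
}
```

For the sample input restricted to the values 1 to 4, the counters are {2, 1, 1, 2}. A forward search would stop at position 1, while the reverse search lands on position 4, which is the expected answer.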


"I got a "RTE" (run time error)"

Consider this fragment of the first snippet:

int number;
int mode;
int position;            //   <---     Note that it's uninitialized
int count = 0;
int countMode = 1;

for (int i = 0; i < N; i++)
{
    number = arr[i];
    // [...] Evaluate count.
    if(count > countMode){
        countMode = count;
        mode = arr[i];
        position = i;   //  <---       Here it's assigned a value, but...
    }
    else if(count == countMode){    // If this happens first...
        if(arr[i] > arr[position]){
        //          ^^^^^^^^^^^^^      Position may be indeterminate, here  
            mode = arr[i];
            position = i;
        }
    }
    count = 0;
}
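One possible way to remove the indeterminate read, sketched under the assumption that the values are already stored in a std::vector (the function name largest_mode_quadratic is made up for illustration). It keeps the original O(N^2) counting, so it would still be too slow for the largest inputs; it only shows the initialization fix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: same quadratic scan as the question's first snippet,
// but every variable holds a valid value before it is read.
int largest_mode_quadratic(const std::vector<int>& arr)
{
    assert(!arr.empty());   // the problem statement guarantees N >= 1
    int mode = arr[0];      // always a valid candidate
    int countMode = 0;      // 0, so the first element wins the ">" test
    for (std::size_t i = 0; i < arr.size(); ++i)
    {
        int count = 0;
        for (std::size_t j = 0; j < arr.size(); ++j)
        {
            if (arr[j] == arr[i]) ++count;
        }
        // Compare against the stored mode directly; no position index needed.
        if (count > countMode || (count == countMode && arr[i] > mode))
        {
            countMode = count;
            mode = arr[i];
        }
    }
    return mode;
}
```

Starting countMode at 0 instead of 1 guarantees that the first branch runs on the first iteration, so the tie branch never sees an unset value.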

Finally, some resources worth reading:

Why is "using namespace std;" considered bad practice?

Why should I not #include <bits/stdc++.h>?

Using preprocessing directive #define for long long

Why aren't variable-length arrays part of the C++ standard?
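As a rough illustration of what those links suggest, the input step could be written with explicit headers, qualified names and a std::vector instead of a variable-length array. The function read_values is a hypothetical name, and reading from a generic std::istream is an assumption made here so the sketch can be exercised without a console.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // only needed by callers that feed a std::istringstream
#include <vector>

// Hypothetical helper: reads N, then N integers, into a std::vector.
// The vector's storage is sized at run time and lives on the heap, which is
// standard C++, unlike "int arr[N];" with a non-constant N.
std::vector<int> read_values(std::istream& in)
{
    assert(in.good());              // assumed precondition: a usable stream
    std::size_t n = 0;
    in >> n;                        // n stays 0 if the extraction fails
    std::vector<int> values(n);
    for (int& v : values)
    {
        in >> v;
    }
    return values;
}
```

Calling read_values(std::cin) would read the judge's input; a std::istringstream holding "6\n1 3 2 4 1 4\n" yields the six sample values.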

You're overcomplicating things. Competitive programming is a weird beast where solutions assume limited resources and wacky amounts of input data. Often those tasks are balanced in such a way that they require constant-time alternative algorithms or some form of dynamic programming. Code size is often taken into consideration. So it's a combination of math and dirty programming tricks. It's a game for experts, "brain porn" if you allow me to call it so: it's wrong, it's enjoyable and you're using your brain. It has little in common with production software development.

You know that there can be only 1000 different values, but there is a huge number of repeated instances. All you need is to find the largest count and, on ties, the largest value. What's the worst case of finding the maximum value in an array of 1000? O(1000), and you check one at a time. And you already need a loop over N to input those values.
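The idea in the paragraph above can be sketched as a two-pass function: one pass over the N values to fill 1000 counters, then one pass over the counters. The name largest_mode_counting and the vector parameter are assumptions for illustration, not part of the original answer.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical helper: O(N + 1000) counting approach.
int largest_mode_counting(const std::vector<int>& values)
{
    constexpr int max_value = 1000;            // upper limit from the problem statement
    assert(!values.empty());                   // N >= 1 is guaranteed
    std::array<int, max_value + 1> counts{};   // zero-initialized, index 0 unused
    for (int v : values)
    {
        assert(v >= 1 && v <= max_value);      // assumes input respects the limits
        ++counts[v];
    }
    int best = 1;
    for (int v = 2; v <= max_value; ++v)
    {
        // ">=" lets a larger value replace a smaller one with the same count.
        if (counts[v] >= counts[best]) best = v;
    }
    return best;
}
```

With N up to 100,000 this performs about 101,000 simple steps, compared with up to 10^10 comparisons in the nested loops of the question.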

Here is an example of dirty competitive code (no input sanitization at all) to solve this problem:

#include <bits/stdc++.h>
using namespace std;

using in = unsigned short;

array<int, 1001> modes;
in               biggest;
int              big_m;
int              N;

int main()
{   
    cin >> N;

    in val;
    while(N --> 0){
       cin >> val;
       if(val < 1001) { 
           modes[val]++; 
       }
       else 
           continue;
       if( modes[val] == big_m) {
           if( val > biggest )
               biggest  = val; 
       }
       else
       if( modes[val] > big_m) { 
           biggest  = val; 
           big_m =  modes[val];
       } 
    }
    
    cout << biggest;
    return 0;
}

No for loops if you don't need them, minimalistic identifiers, minimalistic data to store. Avoid dynamic creation and minimize automatic creation of objects if possible; those add execution time. Static objects are created during compilation and are materialized when your executable is loaded.

modes is the array of our counters, biggest stores the largest value seen so far that has the current maximum count, and big_m is the current maximum count in modes. As they are global variables, they are statically zero-initialized.

PS. NB. The provided example is an instance of a stereotype, and I don't guarantee it's a 100% fit for that particular judge or the closed test cases it uses. Some judges use tainted input and other things that complicate the life of challengers; there is always an unknown factor. E.g. this example would faithfully output "0" if the judge supplied 0 as the most frequent input value, even though that value isn't in range.
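If out-of-range values such as 0 should be ignored rather than counted, the single-pass loop could be guarded on both ends. This is only a sketch under that assumption: the function name largest_mode_streaming and the std::istream parameter are hypothetical, and it uses local variables instead of the globals above so it can be called more than once.

```cpp
#include <array>
#include <cassert>
#include <istream>
#include <sstream>   // only needed by callers that feed a std::istringstream

// Hypothetical variant of the single-pass code: values outside [1, 1000]
// are skipped, so a stray 0 can never become the reported mode.
int largest_mode_streaming(std::istream& in)
{
    assert(in.good());                  // assumed precondition: a usable stream
    std::array<int, 1001> modes{};      // counters, zero-initialized
    int biggest = 0;                    // stays 0 only if no valid value was read
    int big_m = 0;                      // highest count seen so far
    int n = 0;
    in >> n;
    for (int val = 0; n-- > 0 && (in >> val); )
    {
        if (val < 1 || val > 1000) continue;   // ignore tainted input
        int const c = ++modes[val];
        if (c > big_m || (c == big_m && val > biggest))
        {
            big_m = c;
            biggest = val;
        }
    }
    return biggest;
}
```

Whether skipping or rejecting bad values is the right choice depends on the judge; the problem statement itself promises that every value is within range.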
