How to input millions of integers quite fast in C++?

I'm doing a data structures programming assignment about stacks in C++.

In this assignment, I have to read a lot of integers (1,600,000 in the worst case) and finally output some strings.

As a student, I submit my cpp source file, and the website judges and scores it. I got 100%, but I want to do better. The time limit for this assignment is 2 seconds, and my code runs in 128 milliseconds. However, the top student needed only 52 milliseconds to complete the task, so I want to know how to make my code faster.

My source code has three main parts:

  1. Use cin to read lots of integers from the OnlineJudge system (up to 1,600,000 integers).
  2. Try to find the solution and store it in a char array.
  3. Use cout to output the char array.

OnlineJudge tells me the execution time of my code. The 1st part takes 100 milliseconds, the 2nd part takes 20 milliseconds, and the 3rd part takes 12 milliseconds. So if I want to make my code faster, I should improve the input speed.

The input from OnlineJudge looks like this:

5 2
1 2 3 5 4

The 1st line contains two integers n and m; the 2nd line contains n integers separated by spaces. The constraints are 1<=n<=1,600,000 and 0<=m<=1,600,000. To read more than 1 million integers, my code looks like this:

#include <iostream>
using namespace std;
int main()
{
    std::ios::sync_with_stdio(false);
    cin.tie(NULL);
    int n, m;
    int *exit = new int[1600000];
    cin>>n>>m;
    for (int i=0;i<n;++i)
        cin>>exit[i];
    return 0;
}

If n is small, OnlineJudge says the execution time is 0 milliseconds. If n is very large, e.g. 1,600,000, OnlineJudge says this code takes 100 milliseconds. If I delete

std::ios::sync_with_stdio(false);
cin.tie(NULL);

then the code takes 424 milliseconds. However, reading the integers is necessary in this assignment, so I'm really curious how the top student can finish "cin, find the solution, cout" within only 52 milliseconds.

Do you have any ideas on improving input speed?

2019.4.17: Someone suggests using vector or std::from_chars, but these are banned in this assignment. If I write

#include <vector>

or

#include <charconv>

or

#include <array>

then OnlineJudge says "Compilation error".

Someone suggests using scanf. My code is like this:

for (int i=0;i<n;++i)
    scanf("%d", &exit[i]);

But the execution time is 120 milliseconds. By the way, I don't think scanf is faster than cin; see "Using scanf() in C++ programs is faster than using cin?"

Someone suggests using getline. I seldom use this function; my code is like this:

stringstream ss;
string temp;
getline(cin, temp);
ss<<temp;ss>>n;ss>>m;
ss.clear();temp.clear();
getline(cin, temp);ss<<temp;
for (int i=0;i<n;++i)
    ss>>exit[i];

The execution time is also 120 milliseconds.

Someone suggests using mmap. I've never heard of this function before. It seems it is only available on Unix? But I'm using Visual Studio 2010. My code is like this:

#include <unistd.h>
#include <sys/mman.h>
    //to load 1,600,000 integers
    int *exit = static_cast<int*>(mmap(NULL,1600*getpagesize(),PROT_READ,MAP_ANON|MAP_SHARED,0,0));
    for (int i=0;i<n;++i)
        cin>>*(exit+i);

OnlineJudge says "Runtime error (signal 11)" instead of "Compilation error". Signal 11 means "Invalid memory reference": this signal is sent to a process when it makes an invalid virtual memory reference, i.e. a segmentation fault. I don't know if there's anything wrong with my mmap. I hope you can tell me.

2019.4.22: Thanks for all your help. I have now solved this problem successfully. The key function is mmap. The code is like this:

#include <sys/mman.h>
    cin.tie(NULL);
    std::ios::sync_with_stdio(false);
    string temp;

    int n,m;
    int *exit = new int[1600000];

    const int input_size = 13000000;
    void *mmap_void = mmap(0,input_size,PROT_READ,MAP_PRIVATE,0,0);
    char *mmap_input = (char *)mmap_void;
    int r=0,s=0;
    while (mmap_input[s]<'0' || mmap_input[s]>'9') ++s;
    while (mmap_input[s]>='0' && mmap_input[s]<='9')
    { r=r*10+(mmap_input[s]-'0');++s; }
    n=r;r=0;
    while (mmap_input[s]<'0' || mmap_input[s]>'9') ++s;
    while (mmap_input[s]>='0' && mmap_input[s]<='9')
    { r=r*10+(mmap_input[s]-'0');++s; }
    m=r;r=0;
    while (mmap_input[s]<'0' || mmap_input[s]>'9') ++s;
    for (int i=0;i<n;++i)
    {
        while (mmap_input[s]>='0' && mmap_input[s]<='9')
        { r=r*10+(mmap_input[s]-'0');++s; }
        ++s;
        exit[i]=r;r=0;
    }

Calling mmap and converting the chars to integers takes 8 milliseconds. The total execution time of this homework is now 40 milliseconds, faster than 52 milliseconds.

A few ideas:

  1. Read integers using std::scanf, not std::istream. The latter is known to be slower for multiple reasons, even after calling std::ios::sync_with_stdio(false).
  2. Read the file by mapping it into memory.
  3. Parse integers faster than scanf and strtol.

Example:

#include <cstdio>

int main() {
    int n, m, a[1600000];
    if(2 != std::scanf("%d %d", &n, &m))
        throw;
    for(int i = 0; i < n; ++i)
        if(1 != std::scanf("%d", a + i))
            throw;
}

You can also unroll that scanf loop to read multiple integers in one call. E.g.:

#include <cstdio>

constexpr int step = 64;
char const fmt[step * 3] =
    "%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d "
    "%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d "
    "%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d "
    "%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d"
    ;
int main() {
    int a[1600000];
    int n, m;
    if(2 != std::scanf("%d %d", &n, &m))
        throw;

    for(int i = 0; i < n; i += step) {
        int expected = step < n - i ? step : n - i;
        int* b = a + i;
        int read = scanf(fmt + 3 * (step - expected),
                         b + 0x00, b + 0x01, b + 0x02, b + 0x03, b + 0x04, b + 0x05, b + 0x06, b + 0x07,
                         b + 0x08, b + 0x09, b + 0x0a, b + 0x0b, b + 0x0c, b + 0x0d, b + 0x0e, b + 0x0f,
                         b + 0x10, b + 0x11, b + 0x12, b + 0x13, b + 0x14, b + 0x15, b + 0x16, b + 0x17,
                         b + 0x18, b + 0x19, b + 0x1a, b + 0x1b, b + 0x1c, b + 0x1d, b + 0x1e, b + 0x1f,
                         b + 0x20, b + 0x21, b + 0x22, b + 0x23, b + 0x24, b + 0x25, b + 0x26, b + 0x27,
                         b + 0x28, b + 0x29, b + 0x2a, b + 0x2b, b + 0x2c, b + 0x2d, b + 0x2e, b + 0x2f,
                         b + 0x30, b + 0x31, b + 0x32, b + 0x33, b + 0x34, b + 0x35, b + 0x36, b + 0x37,
                         b + 0x38, b + 0x39, b + 0x3a, b + 0x3b, b + 0x3c, b + 0x3d, b + 0x3e, b + 0x3f);
        if(read != expected)
            throw;
    }
}

Another option is to parse integers manually (mapping the file into memory would help here, and there are much faster algorithms for parsing integers than this one and the standard atoi/strtol; see Fastware - Andrei Alexandrescu):

#include <cctype>
#include <cstdio>

int main() {
    int n, m, a[1600000];
    if(2 != std::scanf("%d %d", &n, &m))
        throw;

    for(int i = 0; i < n; ++i) {
        int r = std::getchar();
        while(std::isspace(r))
            r = std::getchar();
        bool neg = false;
        if('-' == r) {
            neg = true;
            r = std::getchar();
        }
        r -= '0';
        for(;;) {
            int s = std::getchar();
            if(!std::isdigit(s))
                break;
            r = r * 10 + (s - '0');
        }
        a[i] = neg ? -r : r;
    }
}

Yet another option is to map the file into memory and parse it faster:

#include <boost/iostreams/device/mapped_file.hpp>
#include <cctype>

inline int find_and_parse_int(char const*& begin, char const* end) {
    while(begin != end && std::isspace(*begin))
        ++begin;
    if(begin == end)
        throw;
    bool neg = *begin == '-';
    begin += neg;
    int r = 0;
    do {
        unsigned c = *begin - '0';
        if(c >= 10)
            break;
        r = r * 10 + static_cast<int>(c);
    } while(++begin != end);
    return neg ? -r : r;
}

int main() {
    boost::iostreams::mapped_file f("random-1600000.txt", boost::iostreams::mapped_file::readonly);
    char const* begin = f.const_data();
    char const* end = begin + f.size();
    int n = find_and_parse_int(begin, end);
    int m = find_and_parse_int(begin, end);

    int a[1600000];
    for(int i = 0; i < n; ++i)
        a[i] = find_and_parse_int(begin, end);
}
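
If Boost is not available (the judge in the question rejects several headers), roughly the same mapping can be done with plain POSIX calls. This is only a minimal sketch, assuming a POSIX system and that the input is a regular file, e.g. stdin redirected from a file; map_whole_fd is a hypothetical helper name, not part of any library. It asks fstat for the real size instead of guessing a length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sys/mman.h>
#include <sys/stat.h>

// Hypothetical helper: map the whole file behind fd as read-only memory.
// Returns nullptr if fd is not a non-empty regular file or if mmap fails.
inline char const* map_whole_fd(int fd, std::size_t& size) {
    assert(fd >= 0);
    struct stat st;
    if(fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) || st.st_size <= 0)
        return nullptr; // pipes and terminals cannot be mapped this way
    std::size_t const len = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if(p == MAP_FAILED) {
        std::perror("mmap");
        return nullptr;
    }
    size = len;
    return static_cast<char const*>(p);
}
```

With that, begin = map_whole_fd(0, size) and end = begin + size can be passed to find_and_parse_int above. If the helper returns nullptr, fall back to one of the stdio-based readers.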

Benchmark source code.

Note that the results may differ considerably across different versions of compilers and standard libraries:

  • CentOS release 6.10, g++-6.3.0, Intel Core i7-4790 CPU @ 3.60GHz
---- Best times ----
seconds,    percent, method
0.167985515,  100.0, getchar
0.147258495,   87.7, scanf
0.137161991,   81.7, iostream
0.118859546,   70.8, scanf-multi
0.034033769,   20.3, mmap-parse-faster
  • Ubuntu 18.04.2 LTS, g++-8.2.0, Intel Core i7-7700K CPU @ 4.20GHz
---- Best times ----
seconds,    percent, method
0.133155952,  100.0, iostream
0.102128208,   76.7, scanf
0.082469185,   61.9, scanf-multi
0.048661004,   36.5, getchar
0.025320109,   19.0, mmap-parse-faster
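
The benchmark source itself is not reproduced here. As a rough illustration of how "best time" figures like the ones above could be collected, here is a minimal sketch, assuming C++11 and a callable that performs one complete read of the input; best_time_seconds is a hypothetical name, not the code used for the tables.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: call f() `runs` times and return the shortest
// wall-clock duration of a single call, in seconds.
template<class F>
double best_time_seconds(F f, int runs) {
    assert(runs > 0);
    double best = -1;
    for(int i = 0; i < runs; ++i) {
        auto const t0 = std::chrono::steady_clock::now();
        f();
        auto const t1 = std::chrono::steady_clock::now();
        double const s = std::chrono::duration<double>(t1 - t0).count();
        if(best < 0 || s < best)
            best = s;
    }
    return best;
}
```

Taking the minimum over several runs filters out some of the noise from process scheduling and cold caches, which matters when the differences are a few tens of milliseconds.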

time of my source code is 128 milliseconds. However, the top student only used 52 milliseconds

For running an entire program, this is getting into margin-of-error territory. Setting up a process on a modern OS takes some time, as does whatever is feeding the input data, and if the server is a shared resource there may be resource contention as well. How much does the time vary when you submit the exact same code?

int *exit = new int[1600000];

Memory allocations have a cost. In high-performance loops and the like they are often avoided entirely, although a single allocation is unlikely to make a major overall difference.

Input of OnlineJudge is like this:

5 2
1 2 3 5 4

The 1st line is two integers n and m, the 2nd line is n integers separated by spaces. Restrictions are: 1<=n<=1,600,000 and 0<=m<=1,600,000. In order to read more than 1 million integers, my code is like this:

I have found that std::cin and friends can be slow, and in some cases so can the number-parsing functions. If you can read, say, the entire line in one go and then parse it, that may be faster. For parsing, the gains generally come from parsing in unsafe ways, which is possible if you can guarantee properties of the input, e.g.:

  • Is ' ' always the delimiter? It looks like it is, and you can special-case the end, e.g. read the entire "line" into a buffer, then replace the '\n' with ' '.
  • Is the number of digits known? Is it always 1, or some other small number such as fewer than 5?
  • Are the numbers always in the valid range?
  • Is the input always a valid number, with no random chars to check for?
  • Are there ever negative numbers?

Knowing these things, you might write, say:

/* 1 or 2 digit int, space delimiter. Advances *p by the number of digits consumed;
   the caller still has to step over the delimiter. */
int parse_small_int(char **p)
{
    int v = (*p)[0] - '0';
    char c2 = (*p)[1];
    if (c2 == ' ') // 1 digit
    {
        (*p) += 1;
        return v;
    }
    else // assume 2 digits
    {
        v *= 10;
        v += (c2 - '0');
        (*p) += 2;
        return v;
    }
}
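
The read-everything-then-parse idea can also be sketched portably with the C library alone. This is a minimal sketch, assuming the input contains only non-negative integers that fit in an int and that every non-digit character is a separator; read_stream and parse_uints are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: read up to cap bytes from `in` with a single fread call.
inline std::size_t read_stream(std::FILE* in, char* buf, std::size_t cap) {
    assert(in && buf);
    return std::fread(buf, 1, cap, in);
}

// Hypothetical helper: parse up to max_count non-negative decimal integers
// from buf[0..len). No overflow or sign handling. Returns how many were stored.
inline std::size_t parse_uints(char const* buf, std::size_t len,
                               int* out, std::size_t max_count) {
    assert(buf && out);
    std::size_t count = 0, i = 0;
    while(count < max_count) {
        while(i < len && (buf[i] < '0' || buf[i] > '9'))
            ++i; // skip separators
        if(i == len)
            break;
        int r = 0;
        while(i < len && buf[i] >= '0' && buf[i] <= '9')
            r = r * 10 + (buf[i++] - '0');
        out[count++] = r;
    }
    return count;
}
```

A caller would fill a large static char buffer with read_stream(stdin, buf, sizeof buf) and then hand it to parse_uints; the first two values are n and m, the rest are the data.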

Do you have any ideas on improving input speed?

The same goes for output. You don't show that code, but std::cout can be similarly slow. If you know some things about the numbers and the allowed output format, you can easily beat <<, std::to_string, itoa, etc.

  • Are leading zeros valid? If they are, you could write a condition-less formatter for the maximum allowed value.
  • Do such formatting into a pre-allocated buffer, then print the entire line.

e.g.

// always write 2 chars to p
void format_int_2_digit(int i, char *p)
{
    p[0] = '0' + (i / 10);
    p[1] = '0' + (i % 10);
}
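
For values with a variable number of digits, the same buffer-then-print approach could look like the following. It is a minimal sketch, assuming unsigned is at most 32 bits wide and that the caller provides a large enough buffer; append_uint and write_buffer are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: write the decimal digits of v at p (no terminator)
// and return the pointer just past the last digit written.
inline char* append_uint(char* p, unsigned v) {
    assert(p);
    char tmp[10]; // a 32-bit unsigned value has at most 10 decimal digits
    int n = 0;
    do {
        tmp[n++] = static_cast<char>('0' + v % 10); // least significant first
        v /= 10;
    } while(v != 0);
    while(n > 0)
        *p++ = tmp[--n]; // copy back in the right order
    return p;
}

// Hypothetical helper: emit the whole formatted range with one fwrite call.
inline void write_buffer(char const* begin, char const* end) {
    std::fwrite(begin, 1, static_cast<std::size_t>(end - begin), stdout);
}
```

The caller appends numbers and separators into one pre-allocated buffer and calls write_buffer once at the end, so the per-number cost is a few divisions and stores rather than a stream operation.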

Another possibility is to bypass the C++ and even the C library, although that may not be allowed in your assignment.

For example, on Linux you could use the read and write functions with STDIN_FILENO and STDOUT_FILENO. I never actually compared these to the CRT versions myself, but there may be a noticeable difference. On Windows there are ReadConsole, WriteConsole, etc., or you can use GetStdHandle and then ReadFile, WriteFile, etc. Again, I never measured these.
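
On the Linux side, that could look roughly like this. It is a minimal sketch, assuming a POSIX system; read_fd and write_fd are hypothetical names, and interrupted calls (EINTR) are simply treated as errors. Like the suggestion itself, it is unmeasured.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: read from fd until EOF or until buf is full.
// Returns the number of bytes stored in buf.
inline std::size_t read_fd(int fd, char* buf, std::size_t cap) {
    assert(buf);
    std::size_t total = 0;
    while(total < cap) {
        ssize_t const n = read(fd, buf + total, cap - total);
        if(n <= 0)
            break; // 0 means EOF, a negative value means an error
        total += static_cast<std::size_t>(n);
    }
    return total;
}

// Hypothetical helper: write all len bytes to fd. Returns false on error.
inline bool write_fd(int fd, char const* buf, std::size_t len) {
    assert(buf);
    while(len > 0) {
        ssize_t const n = write(fd, buf, len);
        if(n <= 0)
            return false;
        buf += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}
```

A program would call read_fd(STDIN_FILENO, buf, sizeof buf) once, parse the buffer by hand, and finish with a single write_fd(STDOUT_FILENO, out, out_len).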
