Updating console output while slowing the program down as little as possible

Question

I have a single-threaded program that performs some operations on a large file (~16GB) in a loop, with a variable count that is incremented on each iteration. I want to see how far count has progressed by printing it to the console every so often, but I don't want this to slow the program down significantly. Which would be fastest: using modulo on count and printing when the result is 0 ( if (count % 1'000'000 == 0) { std::cout << count << std::endl; } ), checking the system time and printing once more than 0.25 seconds have passed, or simply printing on every iteration?

The loop will run approximately a few billion times.

Answer 1

count % 1'000'000 is a slow operation, even though the compiler optimizes it into a multiplication by an inverse. If you use a power of 2, on the other hand, the operation becomes much simpler. For example, here is x % n == 0 for 1'000'000 and for 1 << 20 == 1'048'576 with int:

mod_1_000_000(int):
        imul    edi, edi, 1757569337
        add     edi, 137408
        ror     edi, 6
        cmp     edi, 4294
        seta    al
        ret

mod_1_048_576(int):
        and     edi, 1048575
        setne   al
        ret

If count is uint64_t, the difference gets much more pronounced.
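As a rough illustration of the two checks being compared, they could be written as below for a uint64_t counter. The function names are hypothetical, and the exact instructions generated will depend on the compiler and flags.

```cpp
#include <cstdint>

// Hypothetical helper names, chosen for illustration only.

// General divisor: the compiler has to emit a multiply-based divisibility test.
constexpr bool is_multiple_of_million(std::uint64_t count) {
    return count % 1'000'000 == 0;
}

// Power-of-two divisor: for unsigned values the test is a single bit mask,
// because count % (1 << 20) keeps exactly the low 20 bits of count.
constexpr bool is_multiple_of_pow2(std::uint64_t count) {
    return (count & ((std::uint64_t{1} << 20) - 1)) == 0;
}

// Compile-time sanity checks.
static_assert(is_multiple_of_pow2(1'048'576), "2^20 is a multiple of 2^20");
static_assert(!is_multiple_of_pow2(1'000'000), "10^6 is not a multiple of 2^20");
static_assert(is_multiple_of_million(3'000'000), "3 * 10^6 is a multiple of 10^6");
```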

An if (count % 1'048'576 == 0) will be cheap to compute, and the branch predictor will only get about 1 miss in a million. So this would be cheap. You can probably make it even better by marking the branch as unlikely, so that the code for printing console output gets put into a cold path.

Getting the system time and printing every .25 seconds sounds great. But if you get the system time inside the loop, that is one function call per iteration, billions of calls in total. Those will be expensive, far more than count % (1 << 20).
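A possible middle ground, sketched here as an assumption rather than something benchmarked, is to read the clock only when the cheap power-of-two check passes. The class ProgressTimer and its member names are hypothetical; std::chrono::steady_clock is the standard monotonic clock.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: decides whether enough time has passed to print again.
class ProgressTimer {
public:
    using clock = std::chrono::steady_clock;

    explicit ProgressTimer(std::chrono::milliseconds interval)
        : interval_(interval), last_(clock::now()) {}

    // Cheap on most iterations: the clock is read only when the low 20 bits
    // of count are zero, i.e. once every 1'048'576 iterations.
    bool should_print(std::uint64_t count) {
        if ((count & kMask) != 0) return false;
        const auto now = clock::now();
        if (now - last_ < interval_) return false;
        last_ = now;  // remember when we last reported
        return true;
    }

private:
    static constexpr std::uint64_t kMask = (std::uint64_t{1} << 20) - 1;
    std::chrono::milliseconds interval_;
    clock::time_point last_;
};
```

Inside the loop this would be used as if (timer.should_print(count)) std::cout << count << std::endl; with timer constructed from std::chrono::milliseconds(250). The trade-off is that output can lag by up to 1'048'576 iterations beyond the chosen interval.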

Unfortunately, you can't use alarm to interrupt the code periodically, because you can't safely print to the console with std::cout in a signal handler. But you could use multithreading, having one thread do the work while the other prints updates and sleeps in a loop.

The problem there is how to get count from one thread to the other. The compiler has probably optimized it into a register, so the other thread reading the memory location where count is stored won't see the actual count. You would have to make the variable atomic, and that would increase the cost of using it.

Best bet would be using

if (count % (1 << 20) == 0) atomic_count = count;

and update a shared atomic variable every so often. But is all that overhead of multithreading worth it? You aren't avoiding the if in the inner loop, just reducing the amount of code executed once in a blue moon.

Answer 1: 1 accepted, 2022-07-13 15:26:21
