Are C/C++ library functions and operators the most optimal ones?

So, in the divide & conquer course we were taught:

  1. Karatsuba multiplication
  2. Fast exponentiation

Now, given two positive integers a and b, is operator* faster than karatsuba(a, b), or is pow(a, b) faster than

int fast_expo(int base, int exp)
{
    if (exp == 0) {
        return 1;
    }
    if (exp == 1) {
        return base;
    }
    // Compute the half power once; calling fast_expo twice here would make
    // the number of multiplications linear in exp instead of logarithmic.
    int half = fast_expo(base, exp / 2);
    if (exp % 2 == 0) {
        return half * half;
    }
    else {
        return base * half * half;
    }
}

I ask this because I wonder whether they serve only a teaching purpose or are already built into the C/C++ language.

Karatsuba multiplication is a special technique for large integers. It is not comparable to the built-in C++ * operator, which multiplies together operands of basic types like int and double.

To take advantage of Karatsuba, you have to be using multi-precision integers made up of at least roughly 8 machine words (512 bits, if these are 64-bit words). The break-even point at which Karatsuba becomes advantageous is somewhere between 8 and 24 machine words, according to the accepted answer to this question.
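To make the idea concrete, here is a minimal sketch of a single Karatsuba step, applied to 32-bit operands split into 16-bit halves. The name karatsuba32 is hypothetical, and at this size the hardware multiply is certainly faster; the sketch only shows the trick of replacing four half-size multiplications with three, which is what pays off once the operands span many machine words.

```cpp
#include <cstdint>

// Hypothetical illustration: one Karatsuba step on 32-bit inputs.
// x = x1 * 2^16 + x0 and y = y1 * 2^16 + y0, so
// x * y = z2 * 2^32 + z1 * 2^16 + z0, using three multiplications.
std::uint64_t karatsuba32(std::uint32_t x, std::uint32_t y)
{
    const std::uint64_t x1 = x >> 16, x0 = x & 0xFFFFu;
    const std::uint64_t y1 = y >> 16, y0 = y & 0xFFFFu;

    const std::uint64_t z2 = x1 * y1;                          // high halves
    const std::uint64_t z0 = x0 * y0;                          // low halves
    const std::uint64_t z1 = (x1 + x0) * (y1 + y0) - z2 - z0;  // cross terms

    return (z2 << 32) + (z1 << 16) + z0;
}
```

A real multi-precision implementation would apply the same split recursively to arrays of words and fall back to schoolbook multiplication below a tuned threshold.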

The pow function, which works with a pair of floating-point operands of type double, is not comparable to your fast_expo, which works with operands of type int. They are different functions with different requirements. With pow, you can calculate the cube root of 5: pow(5, 1/3.0). If that's what you would like to calculate, then fast_expo is of no use, no matter how fast.
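As a small sketch of that cube-root case (the wrapper name cube_root_via_pow is an assumption made for illustration):

```cpp
#include <cmath>

// Hypothetical wrapper: a fractional exponent is something an
// integer-only routine such as fast_expo cannot express.
double cube_root_via_pow(double x)
{
    return std::pow(x, 1.0 / 3.0);
}
```

For positive x the result should agree closely with std::cbrt(x). For negative x, std::pow with a non-integer exponent returns NaN, whereas std::cbrt handles negative inputs.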

There is no guarantee that your compiler or C library's pow is absolutely the fastest way for your machine to exponentiate two double-precision floating-point numbers.

Optimization claims in floating point can be tricky, because multiple implementations of the "same" function often do not give exactly the same results down to the last bit. You can probably write a fast my_pow that is only good to five decimal digits of precision, and in your application that approximation might be more than adequate. Have you beaten the library? Hardly; your fast function doesn't meet the requirements that would qualify it as a replacement for the pow in the library.

operator* and the other standard operators usually map to the primitives provided by the hardware. In case such primitives don't exist (e.g. 64-bit long long on IA32), the compiler emulates them at a performance penalty (gcc does that in libgcc).
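To give a feel for what such emulation involves, here is a sketch of a 64-bit multiply built from 32-bit pieces. This is not libgcc's actual code; the function name mul64_from_32 is hypothetical, and the sketch assumes a platform where int is at most 32 bits wide, so that the 32-bit products wrap as unsigned values.

```cpp
#include <cstdint>

// Hypothetical sketch: low 64 bits of a * b using only 32x32 multiplies.
// With a = ah * 2^32 + al and b = bh * 2^32 + bl:
//   a * b mod 2^64 = al * bl + ((ah * bl + al * bh) mod 2^32) * 2^32
std::uint64_t mul64_from_32(std::uint64_t a, std::uint64_t b)
{
    const std::uint32_t al = static_cast<std::uint32_t>(a);
    const std::uint32_t ah = static_cast<std::uint32_t>(a >> 32);
    const std::uint32_t bl = static_cast<std::uint32_t>(b);
    const std::uint32_t bh = static_cast<std::uint32_t>(b >> 32);

    // Full 64-bit product of the low halves.
    const std::uint64_t low = static_cast<std::uint64_t>(al) * bl;
    // Only the low 32 bits of the cross terms survive the shift by 32.
    const std::uint32_t cross = ah * bl + al * bh;

    return low + (static_cast<std::uint64_t>(cross) << 32);
}
```

Three multiplications and a few shifts and adds replace what would be one instruction on a 64-bit machine, which is where the performance penalty comes from.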

The same goes for std::pow. It is part of the standard library and isn't mandated to be implemented in a certain way. GNU libc implements pow(a,b) as exp(log(a) * b). exp and log are quite long and written for optimal performance with IEEE754 floating point in mind.
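The identity itself can be sketched in one line. The helper below (naive_pow, a hypothetical name) is only an illustration of the formula for a > 0, not the library's implementation: any rounding error in log(a) * b is magnified by the final exp, so a production pow has to carry extra precision and handle special cases such as zero, negative bases, infinities and NaN.

```cpp
#include <cmath>

// Hypothetical helper: the textbook identity a^b = exp(b * ln a).
// Only meaningful for a > 0; no special-case handling.
double naive_pow(double a, double b)
{
    return std::exp(std::log(a) * b);
}
```

For moderate inputs the result should be close to std::pow(a, b), though not necessarily identical in the last bits.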


As for your suggestions:

Karatsuba multiplication for smaller numbers isn't worth it. The multiply machine instruction provided by the processor is already optimized for speed and power usage for the standard data types in use. With bigger numbers, 10-20 times the register capacity, it starts to pay off:

In the GNU MP Bignum Library, there used to be a default KARATSUBA_THRESHOLD as high as 32 for non-modular multiplication (that is, Karatsuba was used when n>=32w with typically w=32); the optimal threshold for modular exponentiation tending to be significantly higher. On modern CPUs, Karatsuba in software tends to be non-beneficial for things like ECDSA over P-256 (n=256, w=32 or w=64), but conceivably useful for much wider modulus as used in RSA.

Here is a list of the multiplication algorithms GNU MP uses and their respective thresholds.

Fast exponentiation doesn't apply to non-integer powers, so it's not really comparable to pow.
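For integer exponents, exponentiation by squaring can also be written iteratively. The following is a minimal sketch; the name ipow and the choice of unsigned 64-bit arithmetic (which wraps silently on overflow) are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical sketch: iterative exponentiation by squaring.
// Uses O(log exp) multiplications; results wrap modulo 2^64 on overflow.
std::uint64_t ipow(std::uint64_t base, unsigned exp)
{
    std::uint64_t result = 1;
    while (exp > 0) {
        if (exp & 1u) {
            result *= base;  // this bit of the exponent is set
        }
        base *= base;        // square for the next bit
        exp >>= 1;
    }
    return result;
}
```

The exponent is consumed one bit at a time, which is why a fractional exponent such as 1/3.0 has no place in this scheme.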

A good way to check the speed of an operation is to measure it. If you run the calculation a billion or so times and see how long it takes to execute, you have your answer.
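A rough sketch of such a measurement, using std::chrono. The helper name time_ns is hypothetical, and the volatile sink is only there to discourage the compiler from optimizing the loop away; numbers from a loop like this are indicative at best and depend heavily on compiler flags and hardware.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: calls f(i) for i in [0, iterations) and returns
// the elapsed wall-clock time in nanoseconds.
template <typename F>
std::int64_t time_ns(F f, std::int64_t iterations)
{
    volatile std::uint64_t sink = 0;  // keeps the results "used"
    const auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < iterations; ++i) {
        sink = sink + f(i);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

One would pass each candidate as a callable, for example a lambda wrapping the built-in multiplication and another wrapping a hand-written routine, and compare the two timings over the same iteration count.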

One thing to note: I'm led to believe that % is fairly expensive. There is a much faster way to check if something is divisible by 2:

bool check_div_two(int number)
{
    // The lowest bit is 0 exactly when the number is even.
    return (number & 0x01) == 0;
}

This way you've just applied a mask and compared the result against zero. I'd assume it's a less expensive op.

The * operator for built-in types will almost certainly be implemented as a single CPU multiplication instruction. So ultimately this is a hardware question, not a language question. Longer code sequences, perhaps function calls, might be generated in cases where there's no direct hardware support.

It's safe to assume that chip manufacturers (Intel, AMD, et al.) expend a great deal of effort making arithmetic operations as efficient as possible.

