Optimize integer and floating point multiplication

I am trying to optimize the following operation, where I have a large number of unsigned short inputs that need to be scaled down by a certain factor. Is there a way to optimize it so that it does not use floating point operations?

unsigned short val = 65523U;
val = val * 0.943;

Note

I will be running the above operation on a DSP, where floating point operations are costly.

The simplest way is to just use a 32 bit type that can hold the result:

uint16_t val = 65523U;
val = (uint_fast32_t)val * 943 / 1000;

Or, if you want more type correctness and portability, while at the same time allowing the compiler to use the best possible integer type for the task:

#include <stdint.h>

uint_fast16_t val = UINT16_C(65523);
val = (uint_fast16_t) ( (uint_fast32_t)val * (uint_fast32_t)943 / (uint_fast32_t)1000 );
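As a minimal sketch of the same multiply-then-divide idea wrapped in a function (the name `scale_div1000` is only illustrative, and a fixed-width `uint32_t` intermediate is assumed to be available on the target):

```c
#include <stdint.h>

/* Hypothetical helper: scales a 16-bit value by 943/1000 with integer
   arithmetic only. The product is at most 65535 * 943 = 61799505, which
   fits easily in 32 bits. */
uint16_t scale_div1000(uint16_t val)
{
    uint32_t product = (uint32_t)val * 943u;  /* widen before multiplying */
    return (uint16_t)(product / 1000u);       /* integer division truncates */
}
```

For example, `scale_div1000(65523)` gives 61788, the same value as truncating 65523 * 0.943.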

You can multiply by an integer approximation of 0.943 * 2^16, then divide by 2^16, which your compiler should transform into a right shift. Assuming 16-bit shorts and at least 32-bit ints:

val = ((unsigned)val * 61800) / 65536;

Depending on your exact requirements, you might get more accurate results by rounding to the nearest integer:

val = ((unsigned)val * 61800 + 32768) / 65536;

Any other power of two will work. On a 64-bit platform, you could use 2^48 for more precision.
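A rough sketch of the 2^48 variant, assuming a 64-bit unsigned type is cheap on the target (the names `Q48_MULT` and `scale_q48` are made up for illustration). The multiplier is derived with integer arithmetic, so no floating point is needed even at compile time:

```c
#include <stdint.h>

/* 0.943 * 2^48 rounded to the nearest integer: (943 * 2^48 + 500) / 1000.
   943 < 2^10, so 943 << 48 stays below 2^58 and cannot overflow. */
#define Q48_MULT ((((uint64_t)943 << 48) + 500u) / 1000u)

/* Hypothetical helper: multiply by the 2^48-scaled factor, then shift back. */
uint16_t scale_q48(uint16_t val)
{
    /* val * Q48_MULT stays below 2^64 for every 16-bit val. */
    return (uint16_t)(((uint64_t)val * Q48_MULT) >> 48);
}
```

With this much precision the approximation error is far smaller than 1/1000, so for 16-bit inputs the result should match `val * 943 / 1000`; for instance `scale_q48(65523)` is 61788.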

The multiply / divide approach is good. But even better is that you can avoid the divide.

unsigned short has a range of 0 ... 65535.

On a platform with a 32 bit int, C promotes operands narrower than int before doing arithmetic, so the calculation is carried out in 32 bits. The result is converted back to 16 bits when it is stored in a short, which truncates the value if it does not fit. You want to avoid that happening part-way through when you are multiplying a short by a large number. So I put casts in to show what's going on and to make sure no implicit conversions are left to the compiler.

unsigned short val = 65523U;

const unsigned int mult = 65536 * 0.943; // expressed as a fraction of 2^16

unsigned short output = (unsigned short)(((unsigned int)val * mult) >> 16);

So this casts the value to a 32 bit unsigned int (to guarantee control of the types), multiplies it by the original fraction scaled up to 2^16, then right-shifts it by 16 to put it back in the correct scale.
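If you would rather not rely on the compiler folding `65536 * 0.943` at build time, the multiplier can be derived with integer arithmetic. This is only a sketch; `Q16_MULT` and `scale_q16` are illustrative names, not part of the code above:

```c
#include <stdint.h>

/* 0.943 as a fraction of 2^16, rounded to nearest:
   (943 * 65536 + 500) / 1000 = 61800. */
#define Q16_MULT ((943UL * 65536UL + 500UL) / 1000UL)

/* Hypothetical helper: one 32-bit multiply and one shift, no division. */
uint16_t scale_q16(uint16_t val)
{
    uint32_t wide = (uint32_t)val * (uint32_t)Q16_MULT;  /* at most 65535 * 61800 < 2^32 */
    return (uint16_t)(wide >> 16);                       /* back to the original scale */
}
```

Because the shift truncates, `scale_q16(65523)` gives 61787 rather than 61788.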

You could multiply by 943 then divide by 1000. You'd avoid the floating point arithmetic (but you'd do an integer multiplication plus a Euclidean division).

unsigned short val = 65523U;
val = (val*943UL)/1000;

I get: 61788

It works (even on systems where int is 16 bits wide) as long as val*943 is within unsigned long capacity (unsigned long long could be used to extend the limit even further).

EDIT: You could even avoid the division by computing the ratio times a power of 2 ahead of time. I chose 2^16:

So .943*(1<<16), which is 61800.448

and you could do one multiplication and one shift operation (very fast). Here the intermediate result still fits in an unsigned long (at most 65535 * 61800 = 4050063000, just under 2^32), but it gets very large, so switch to unsigned long long if you use a bigger multiplier:

val = (val*61800UL)>>16;

to get roughly the same result: 61787. Use 61801 and you get 61788.
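Neither multiplier is exact for every input, so it can be worth checking the whole 16-bit range before picking one. A small sketch of such a check, with illustrative helper names:

```c
#include <stdint.h>

/* Hypothetical helper: multiply-and-shift with a caller-supplied multiplier.
   mult must not exceed 65536, or the 32-bit product can overflow. */
uint16_t scale_shift16(uint16_t val, uint32_t mult)
{
    return (uint16_t)(((uint32_t)val * mult) >> 16);
}

/* Counts the 16-bit inputs whose shifted result differs from val * 943 / 1000. */
uint32_t count_mismatches(uint32_t mult)
{
    uint32_t mismatches = 0;
    for (uint32_t v = 0; v <= 65535u; ++v) {
        uint16_t exact = (uint16_t)(v * 943u / 1000u);
        if (scale_shift16((uint16_t)v, mult) != exact)
            ++mismatches;
    }
    return mismatches;
}
```

Comparing `count_mismatches(61800)` with `count_mismatches(61801)` shows which constant disagrees with the divide-by-1000 version less often for your inputs.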

On a platform that uses a 32 bit int or wider, using

int val = 65523U;
val = val * 943 / 1000;

would be hard to beat. Convert the truncation to German rounding by altering the coefficients. If your system has a 16 bit int then you could use a long (note that the multiplication by 943 and the division by 1000 would then take place in long arithmetic), but that solution would require profiling.
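The answer does not spell out how to alter the coefficients. One common way to round to nearest, assumed here, is to add half the divisor before dividing (`scale_div1000_rounded` is an illustrative name):

```c
#include <stdint.h>

/* Hypothetical helper: scales by 943/1000 and rounds halves up by adding
   500 (half of 1000) before the truncating division. */
uint16_t scale_div1000_rounded(uint16_t val)
{
    uint32_t product = (uint32_t)val * 943u;      /* at most 61799505 */
    return (uint16_t)((product + 500u) / 1000u);  /* at most 61800, fits in 16 bits */
}
```

For instance, an input of 500 scales to 471.5, which this version rounds up to 472 where plain truncation would give 471.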

Dividing by 1000 first would cause truncation issues, so multiply first; a larger type is then required to accommodate the larger intermediate value.
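To make the ordering issue concrete, here is a small sketch comparing the two orders (both function names are illustrative):

```c
#include <stdint.h>

/* Multiply first: needs a 32-bit intermediate, keeps the precision. */
uint16_t multiply_first(uint16_t val)
{
    return (uint16_t)((uint32_t)val * 943u / 1000u);
}

/* Divide first: stays small, but val / 1000 discards the low three digits. */
uint16_t divide_first(uint16_t val)
{
    return (uint16_t)(val / 1000u * 943u);
}
```

With val = 65523, `multiply_first` gives 61788 while `divide_first` gives 65 * 943 = 61295, and any input below 1000 collapses to 0.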
