
32-bit multiplication through 16-bit shifting

I am writing a soft-multiplication function using shifts and additions. The existing function looks like this:

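unsigned long __mulsi3 (unsigned long a, unsigned long b) {

    unsigned long answer = 0;

    while(b)
    {
        if(b & 1) {
            answer += a;    /* add the shifted multiplicand for each set bit of b */
        }

        a <<= 1;            /* the next bit of b is worth twice as much */
        b >>= 1;
    }
    return answer;
}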

Although my hardware does not have a multiplier, it does have a hardware shifter. The shifter can shift by up to 16 bits at a time.

I want to make full use of my 16-bit shifter. Any suggestions on how I can adapt the code above to reflect my hardware's capabilities? The given code shifts by only 1 bit per iteration.

The 16-bit shifter can shift 32-bit unsigned long values by up to 16 places at a time. An unsigned long is 32 bits wide on this platform.

The ability to shift multiple bits is not going to help much, unless you have a hardware multiply, say 8-bit x 8-bit, or you can afford some RAM/ROM to do (say) a 4-bit by 4-bit multiply by lookup.

The straightforward shift and add (as you are doing) can be helped by swapping the arguments so that the multiplier is the smaller of the two.
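A minimal sketch of that swap, assuming a 32-bit unsigned type from <stdint.h>; the name mul_swapped is made up for illustration. The loop runs once per significant bit of b, so putting the smaller value in b shortens it:

```c
#include <stdint.h>

/* Shift-and-add multiply (result modulo 2^32) that first makes sure
 * the loop iterates over the smaller operand. */
uint32_t mul_swapped(uint32_t a, uint32_t b)
{
    uint32_t answer = 0;

    if (b > a) {                /* keep the smaller value as the multiplier */
        uint32_t t = a;
        a = b;
        b = t;
    }

    while (b) {
        if (b & 1u) {
            answer += a;
        }
        a <<= 1;
        b >>= 1;
    }
    return answer;
}
```

With this, mul_swapped(3, 100000) loops 2 times instead of 17.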

If your machine is generally faster at 16-bit operations, then by treating your 32-bit 'a' as 'a1:a0', 16 bits at a time, and similarly 'b', you just might be able to save some cycles. Your result is only 32 bits, so you don't need to do 'a1 * b1' -- though one or both of those may be zero, so the win may not be big! Also, you only need the least significant 16 bits of 'a0 * b1', so that can be done entirely in 16 bits -- but if b1 (assuming b <= a) is generally zero this is not a big win, either. For 'a * b0', you need a 32-bit 'a' and 32-bit adds into 'answer', but your multiplier is only 16 bits... which may or may not help.
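Here is one way the split described above might look, as a sketch only. It assumes the fixed-width types from <stdint.h>, and mul32_split is a hypothetical name. The first loop computes a * b0 with 32-bit adds and a 16-bit multiplier; the second computes just the low 16 bits of a0 * b1 in 16-bit arithmetic, and the single 16-bit shift at the end puts that partial product in place:

```c
#include <stdint.h>

/* a * b modulo 2^32, computed as a*b0 + ((a0*b1 mod 2^16) << 16). */
uint32_t mul32_split(uint32_t a, uint32_t b)
{
    uint16_t a0 = (uint16_t)(a & 0xFFFFu);
    uint16_t b0 = (uint16_t)(b & 0xFFFFu);
    uint16_t b1 = (uint16_t)(b >> 16);

    /* a * b0: 32-bit multiplicand, 16-bit multiplier */
    uint32_t answer = 0;
    uint32_t addend = a;
    while (b0) {
        if (b0 & 1u) {
            answer += addend;
        }
        addend <<= 1;
        b0 = (uint16_t)(b0 >> 1);
    }

    /* low 16 bits of a0 * b1: everything stays 16 bits wide */
    uint16_t acc = 0;
    uint16_t add16 = a0;
    while (b1) {
        if (b1 & 1u) {
            acc = (uint16_t)(acc + add16);
        }
        add16 = (uint16_t)(add16 << 1);
        b1 = (uint16_t)(b1 >> 1);
    }

    return answer + ((uint32_t)acc << 16);
}
```

When b fits in 16 bits the second loop does not run at all, which is the "b1 is generally zero" case mentioned above.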

Skipping runs of zeros in the multiplier could help -- depending on the processor and on any properties of the multiplier.
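This is probably the place where a multi-bit shifter is of most use. A rough sketch, assuming shifts of 1 to 16 places each cost about the same; mul_skip_zeros is an invented name. Each test costs a mask and a compare, so whether it pays off depends on how often the multiplier really contains runs of zeros:

```c
#include <stdint.h>

/* Shift-and-add multiply (result modulo 2^32) that jumps over runs of
 * zero bits at the bottom of b with one multi-bit shift per test. */
uint32_t mul_skip_zeros(uint32_t a, uint32_t b)
{
    uint32_t answer = 0;

    while (b) {
        /* b is non-zero here, so after these tests its lowest bit is 1 */
        if ((b & 0xFFFFu) == 0) { a <<= 16; b >>= 16; }
        if ((b & 0x00FFu) == 0) { a <<= 8;  b >>= 8;  }
        if ((b & 0x000Fu) == 0) { a <<= 4;  b >>= 4;  }
        if ((b & 0x0003u) == 0) { a <<= 2;  b >>= 2;  }
        if ((b & 0x0001u) == 0) { a <<= 1;  b >>= 1;  }

        answer += a;            /* lowest bit of b is set */
        a <<= 1;
        b >>= 1;
    }
    return answer;
}
```

For a multiplier such as 0x00010000 this does one add and two shifts instead of 17 iterations of the original loop.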

FWIW: doing the magic 'a1*b1', '(a1-a0)*(b0-b1)', 'a0*b0' and combining the results by shifts, adds and subtracts is, in my limited experience, an absolute nightmare... the signs of '(a1-a0)', '(b0-b1)' and their product have to be respected, which makes a bit of a mess of what looks like a cute trick. By the time you have finished with that and the adds and subtracts, you have to have a mighty slow multiply to make it all worthwhile! When multiplying very, very long integers this may help... but there the memory issues may dominate... when I tried it, it was something of a disappointment.

Having 16-bit shifts can give you a minor speed improvement using the following approach:

(U1 * P + U0) * (V1 * P + V0) =
= U1 * V1 * P * P + U1 * V0 * P + U0 * V1 * P + U0 * V0 =
= U1 * V1 * (P*P+P) + (U1-U0) * (V0-V1) * P + U0 * V0 * (1+P)

provided P is a convenient power of 2 (for example, 2^16 or 2^32), so that multiplying by it is a fast shift. This reduces the work from 4 multiplications of smaller numbers to 3 and, applied recursively, gives O(N^1.58) instead of O(N^2) for very long numbers.

This method is called Karatsuba multiplication. More advanced versions of it have also been described.
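To make the three-multiplication form concrete, here is a sketch with P = 2^16 that produces a full 64-bit product from 16-bit halves. It assumes a 64-bit integer type is available for the result; mul16 and karatsuba32 are illustrative names, and mul16 is just the plain shift-and-add loop. The sign of (U1-U0) * (V0-V1) is tracked in a separate flag so that the multiply itself stays unsigned:

```c
#include <stdint.h>

/* Hypothetical helper: 16 x 16 -> 32 bit shift-and-add multiply. */
static uint32_t mul16(uint16_t x, uint16_t y)
{
    uint32_t acc = 0;
    uint32_t addend = x;
    while (y) {
        if (y & 1u) {
            acc += addend;
        }
        addend <<= 1;
        y = (uint16_t)(y >> 1);
    }
    return acc;
}

/* 32 x 32 -> 64 bit product using three 16 x 16 multiplies. */
uint64_t karatsuba32(uint32_t u, uint32_t v)
{
    uint16_t u1 = (uint16_t)(u >> 16), u0 = (uint16_t)(u & 0xFFFFu);
    uint16_t v1 = (uint16_t)(v >> 16), v0 = (uint16_t)(v & 0xFFFFu);

    uint64_t hi = mul16(u1, v1);            /* U1 * V1 */
    uint64_t lo = mul16(u0, v0);            /* U0 * V0 */

    /* |U1-U0| * |V0-V1|, with the sign of the product kept in neg */
    int neg = 0;
    uint16_t du, dv;
    if (u1 >= u0) { du = (uint16_t)(u1 - u0); }
    else          { du = (uint16_t)(u0 - u1); neg ^= 1; }
    if (v0 >= v1) { dv = (uint16_t)(v0 - v1); }
    else          { dv = (uint16_t)(v1 - v0); neg ^= 1; }
    uint64_t mid = mul16(du, dv);

    /* U1*V0 + U0*V1 = U1*V1 + U0*V0 + (U1-U0)*(V0-V1) */
    uint64_t cross = hi + lo;
    cross = neg ? cross - mid : cross + mid;

    return (hi << 32) + (cross << 16) + lo;
}
```

The extra compares, subtracts and sign handling are exactly the overhead that makes this questionable at such a small size; it is shown here only to illustrate the identity.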

For small numbers (e.g. 8 by 8 bits), the following method is fast, provided you have enough fast ROM:

a * b = square(a+b)/4 - square(a-b)/4

If you tabulate int(square(x)/4), you'll need 1022 bytes for unsigned multiplication and 510 bytes for signed multiplication.
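A sketch of the unsigned 8 by 8 case, with some assumptions: the table is filled at start-up here only so the example is self-contained (on real hardware it would presumably be a constant array in ROM), and the names quarter_sq, init_quarter_sq and mul8 are made up. The table has 511 entries of 16 bits, which is the 1022 bytes mentioned above. Because a+b and a-b are either both even or both odd, the fractions dropped by the two integer divisions cancel:

```c
#include <stdint.h>

/* quarter_sq[x] = floor(x*x / 4) for x = 0 .. 510 (511 * 2 = 1022 bytes) */
static uint16_t quarter_sq[511];

/* Fill the table without multiplying, using (x+1)^2 = x^2 + 2x + 1. */
void init_quarter_sq(void)
{
    uint32_t sq = 0;
    unsigned x;
    for (x = 0; x < 511; x++) {
        quarter_sq[x] = (uint16_t)(sq >> 2);
        sq += (uint32_t)(x << 1) + 1u;
    }
}

/* 8 x 8 -> 16 bit unsigned multiply: two lookups and one subtraction. */
uint16_t mul8(uint8_t a, uint8_t b)
{
    unsigned sum  = (unsigned)a + (unsigned)b;
    unsigned diff = (a >= b) ? (unsigned)(a - b) : (unsigned)(b - a);
    return (uint16_t)(quarter_sq[sum] - quarter_sq[diff]);
}
```

init_quarter_sq() has to run once before the first call to mul8().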

The basic approach is (assuming shifting left by 1):

  • Shift the top 16 bits
  • Set the bottom bit of the top 16 bits to the top bit of the bottom 16 bits
  • Shift the bottom 16 bits

It depends a bit on your hardware...

but you could try:

  • assuming unsigned long is 32 bits
  • assuming big-endian byte order

then:

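union Data32
{
    unsigned long l;        /* assumed to be 32 bits */
    unsigned short s[2];    /* s[0] is the top half on a big-endian machine */
};

/* Shift a 32-bit value left by 1..15 bits using only 16-bit shifts. */
unsigned long shiftleft32(unsigned long valueToShift, unsigned short bitsToShift)
{
    union Data32 u;
    u.l = valueToShift;
    u.s[0] <<= bitsToShift;                     /* shift the top half */
    u.s[0] |= (u.s[1] >> (16 - bitsToShift));   /* carry bits up from the bottom half */
    u.s[1] <<= bitsToShift;                     /* shift the bottom half */

    return u.l;
}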

Then do the same in reverse for shifting right.
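A sketch of the right-shift counterpart. To avoid depending on byte order it keeps the two halves in an explicit struct rather than a union, which is a change from the code above; Pair16 and shiftright_pair are hypothetical names. The casts to a 32-bit type are there so that shift counts of 0 and 16 are also well defined in C:

```c
#include <stdint.h>

/* Hypothetical representation: a 32-bit value held as two 16-bit halves. */
typedef struct {
    uint16_t hi;
    uint16_t lo;
} Pair16;

/* Shift the 32-bit value hi:lo right by 0..16 bits. */
Pair16 shiftright_pair(Pair16 v, unsigned bits)
{
    /* bits that fall off the bottom of hi move into the top of lo */
    uint16_t carried = (uint16_t)((uint32_t)v.hi << (16u - bits));

    v.lo = (uint16_t)(((uint32_t)v.lo >> bits) | carried);
    v.hi = (uint16_t)((uint32_t)v.hi >> bits);
    return v;
}
```

For example, shifting hi = 0x1234, lo = 0x5678 right by 4 gives hi = 0x0123, lo = 0x4567.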

The code above multiplies in the traditional way, the way we learnt in primary school:

Example:

    0101
  * 0111
  -------
    0101
   0101.
  0101..
 --------
  100011

Of course, you cannot approach it like that if you have neither a multiply operation nor a 1-bit shifter! You can still do it in other ways, though, for example with a loop:

unsigned long _mult(unsigned long a, unsigned long b)
{
    unsigned long res =0;

    while (a > 0)
    {
        res += b;
        a--;
    }

    return res;
} 

It is costly, but it meets your needs. You can consider other approaches if you have more constraints (such as computation time).
