
How do I use MMX mulH and mulL on two 64-bit integers to get one 128-bit integer?

Hello, I'm working on yet another arbitrary-precision integer library. I wanted to implement multiplication, but I got stuck when _m_pmulhw from <mmintrin.h> didn't do what I expected. There is very little documentation on MMX instructions. When I test it by multiplying two UINT64_MAX values, it just gives me gibberish.

#include <bitset>
#include <cstdint>
#include <iostream>
#include <mmintrin.h>

uint_fast64_t mulH(const uint_fast64_t &a, const uint_fast64_t &b) {
    return (uint_fast64_t)_m_pmulhw((__m64)a, (__m64)b);
}
uint_fast64_t mulL(const uint_fast64_t &a, const uint_fast64_t &b) {
    return (uint_fast64_t)_m_pmullw((__m64)a, (__m64)b);
}
int main() {
    uint64_t a = UINT64_MAX;
    uint64_t b = UINT64_MAX;
    std::cout << std::bitset<64>(mulH(a, b)) << std::bitset<64>(mulL(a, b));
}

output: 00000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000100000000000000010000000000000001 

I don't know why it's not working. I have an A6-4400M APU.

coreinfo's output: MMX * Supports MMX instruction set

So I think it's safe to say MMX is supported. If anyone can give me some tips on how to make this work, thanks.

Compiler: gcc

IDE: Visual Studio Code

I think you misunderstood what _m_pmulhw does. It is clearly documented in Intel's Intrinsics Guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_m_pmulhw&expand=4340 . The corresponding instruction is pmulhw, which is also clearly documented in, e.g., Felix Cloutier's x86 instruction reference: https://www.felixcloutier.com/x86/pmulhw

It multiplies four pairs of signed 16-bit integers that are packed inside the two operands, and then produces the high half of each of the four products (Packed Multiply High - Word). This means that, for inputs 0x12345678abcdef01 and 0x9876543210fedcba, it would compute 0x1234 * 0x9876, 0x5678 * 0x5432, 0xabcd * 0x10fe, and 0xef01 * 0xdcba, and pack the high 16 bits of each 32-bit result into the output.

For your example, you're multiplying 0xffff * 0xffff four times, each producing the 32-bit result 0x00000001 (-1 * -1, since this is a signed 16-bit multiply). You therefore get 0x0000000000000000 from pmulhw (the high halves) and 0x0001000100010001 from pmullw (the low halves), which is exactly what you see in the bitset output.
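The lane-by-lane behaviour described above can be modelled in plain C++. This is only a sketch for checking results by hand, not how the instructions are implemented; the function name packed_mul16 and its `high` flag are hypothetical and do not come from any header.

```cpp
#include <cstdint>

// Hypothetical scalar model of pmulhw (high = true) and pmullw (high = false).
// Each 64-bit operand is treated as four independent signed 16-bit lanes.
inline std::uint64_t packed_mul16(std::uint64_t a, std::uint64_t b, bool high) {
    std::uint64_t out = 0;
    for (int lane = 0; lane < 4; ++lane) {
        // Extract the lane and reinterpret it as a signed 16-bit value.
        auto x = static_cast<std::int16_t>(static_cast<std::uint16_t>(a >> (16 * lane)));
        auto y = static_cast<std::int16_t>(static_cast<std::uint16_t>(b >> (16 * lane)));
        // Full signed 16x16 -> 32-bit product; it always fits in 32 bits.
        std::int32_t product = static_cast<std::int32_t>(x) * static_cast<std::int32_t>(y);
        std::uint32_t bits = static_cast<std::uint32_t>(product);
        // Keep either the high or the low 16 bits of the product.
        std::uint16_t half = static_cast<std::uint16_t>(high ? (bits >> 16) : (bits & 0xFFFFu));
        out |= static_cast<std::uint64_t>(half) << (16 * lane);
    }
    return out;
}
```

Calling packed_mul16(UINT64_MAX, UINT64_MAX, true) yields 0, and the same call with `false` yields 0x0001000100010001, matching the output in the question.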


If you're looking for a 64x64=>128-bit multiply, there is no portable intrinsic for it. _mulx_u64 exists, but it uses the mulx instruction from the BMI2 extension, which older CPUs do not support. Microsoft's compiler has the non-standard _mul128 (signed) and _umul128 (unsigned) intrinsics; with gcc and clang you can just use the __int128 type (or the local equivalent) to get a 64x64=>128-bit multiply.
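As a rough sketch of how the two compiler-specific options could sit behind one function: the struct Product128 and the wrapper name mul_64x64 are hypothetical, the MSVC branch assumes an x64 target where the unsigned intrinsic _umul128 from <intrin.h> is available, and the other branch assumes a compiler that provides unsigned __int128 (gcc and clang on 64-bit targets).

```cpp
#include <cstdint>
#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>  // _umul128
#endif

// Hypothetical result type: the two 64-bit halves of a 128-bit product.
struct Product128 {
    std::uint64_t high;
    std::uint64_t low;
};

// Hypothetical wrapper: unsigned 64x64 -> 128-bit multiply.
inline Product128 mul_64x64(std::uint64_t a, std::uint64_t b) {
#if defined(_MSC_VER) && defined(_M_X64)
    Product128 r;
    r.low = _umul128(a, b, &r.high);  // returns the low half, stores the high half
    return r;
#elif defined(__SIZEOF_INT128__)
    unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
    return {static_cast<std::uint64_t>(p >> 64), static_cast<std::uint64_t>(p)};
#else
#error "No 128-bit multiply known for this compiler"
#endif
}
```

For UINT64_MAX * UINT64_MAX this gives high = 0xFFFFFFFFFFFFFFFE and low = 1.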

Also, I'd seriously recommend using SSE2 or a later instruction set rather than the older MMX set. The newer instructions are faster in most cases and let you operate on much wider vector types (128-bit with SSE, 256-bit with AVX/AVX2, and 512-bit with AVX-512), which can provide a significant speed boost.

I'm not an expert on this, but according to https://www.felixcloutier.com/x86/pmulhw , these instructions don't do a 64x64->128 multiply; they do four 16x16->32 multiplies. Note the word "packed" in the description. Moreover it's a signed multiply.

So your 64-bit UINT64_MAX values are interpreted as four words of 0xffff, which is to say -1. So you are multiplying -1 by -1, four times. Of course the numerical answer to each one is 1. The result of the pmulhw instruction is the high halves of the results (i.e. four words of 0x0000), and the result of pmullw is the low halves (i.e. four words of 0x0001).

Which is exactly what you got, so it seems to me the instructions are working perfectly.

If you want to do an unsigned multiply of two 64-bit integers, the plain old-fashioned mul instruction will serve your purpose. The easiest way to get gcc to generate it is probably to cast the inputs to __uint128_t and multiply with the usual * operator.
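A minimal sketch of that approach, reusing the mulH and mulL names from the question. It assumes gcc or clang on a 64-bit target, where __uint128_t is available; taking the arguments by value instead of by const reference is a choice made here, not something the question requires.

```cpp
#include <cstdint>

// High 64 bits of the unsigned 128-bit product a * b.
inline std::uint64_t mulH(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::uint64_t>((static_cast<__uint128_t>(a) * b) >> 64);
}

// Low 64 bits of the unsigned 128-bit product a * b.
inline std::uint64_t mulL(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::uint64_t>(static_cast<__uint128_t>(a) * b);
}
```

With a = b = UINT64_MAX, mulH returns 0xFFFFFFFFFFFFFFFE and mulL returns 1, which are the two halves of (2^64 - 1)^2 = 2^128 - 2^65 + 1.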
