GCC can't vectorize 64-bit multiplication. Can a 64-bit x 64-bit -> 128-bit widening multiplication be vectorized on AVX2?

Question

I am trying to vectorize a CBRNG (counter-based random number generator) that uses 64-bit widening multiplication.

#include <stdint.h>

/* Returns the low 64 bits of a*b and stores the high 64 bits in *hip. */
static __inline__ uint64_t mulhilo64(uint64_t a, uint64_t b, uint64_t* hip) {
    __uint128_t product = ((__uint128_t)a)*((__uint128_t)b);
    *hip = product>>64;
    return (uint64_t)product;
}

Does such a multiplication exist in vectorized form in AVX2?

Answer 1

No. There's no 64 x 64 -> 128-bit arithmetic as a vector instruction. Nor is there a vector mulhi-type instruction for 64-bit elements (one that returns the high word of the product).

[V]PMULUDQ can do 32 x 32 -> 64-bit multiplication: it takes only every second 32-bit unsigned element (unsigned doubleword) of each source as input, and writes each 64-bit product across two result elements, combined as an unsigned quadword.
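As a rough sketch of how those 32 x 32 -> 64-bit products could be combined into full 64 x 64 -> 128-bit results, the following uses the `_mm256_mul_epu32` intrinsic (which maps to VPMULUDQ) four times per vector and adds up the partial products by hand. The function name `mulhilo64x4` and its array-based interface are assumptions made for illustration and do not come from the original post. Whether this is faster than four scalar multiplies has not been measured here.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: four 64 x 64 -> 128-bit products at once, built from
   32 x 32 -> 64-bit VPMULUDQ partial products. Requires an AVX2-capable CPU. */
__attribute__((target("avx2")))
void mulhilo64x4(const uint64_t a[4], const uint64_t b[4],
                 uint64_t lo[4], uint64_t hi[4])
{
    const __m256i mask = _mm256_set1_epi64x(0xFFFFFFFFLL);
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* Move the upper 32 bits of each 64-bit lane down to the low half. */
    __m256i a_hi = _mm256_srli_epi64(va, 32);
    __m256i b_hi = _mm256_srli_epi64(vb, 32);

    /* VPMULUDQ reads only the low 32 bits of each 64-bit lane. */
    __m256i ll = _mm256_mul_epu32(va, vb);     /* a_lo * b_lo */
    __m256i lh = _mm256_mul_epu32(va, b_hi);   /* a_lo * b_hi */
    __m256i hl = _mm256_mul_epu32(a_hi, vb);   /* a_hi * b_lo */
    __m256i hh = _mm256_mul_epu32(a_hi, b_hi); /* a_hi * b_hi */

    /* Sum of the three terms that land in bits 32..63 of the product.
       Each term is below 2^32, so the sum cannot overflow a 64-bit lane. */
    __m256i mid = _mm256_add_epi64(
        _mm256_srli_epi64(ll, 32),
        _mm256_add_epi64(_mm256_and_si256(lh, mask),
                         _mm256_and_si256(hl, mask)));

    /* Low 64 bits: low half of ll, plus the low half of mid shifted up. */
    __m256i vlo = _mm256_or_si256(_mm256_and_si256(ll, mask),
                                  _mm256_slli_epi64(mid, 32));

    /* High 64 bits: hh, the upper halves of the cross terms, and the carry. */
    __m256i vhi = _mm256_add_epi64(
        _mm256_add_epi64(hh, _mm256_srli_epi64(mid, 32)),
        _mm256_add_epi64(_mm256_srli_epi64(lh, 32),
                         _mm256_srli_epi64(hl, 32)));

    _mm256_storeu_si256((__m256i *)lo, vlo);
    _mm256_storeu_si256((__m256i *)hi, vhi);
}
```

The `target("avx2")` attribute is a GCC/Clang extension that lets the function compile without `-mavx2`; callers should still check for AVX2 support at run time, for example with `__builtin_cpu_supports("avx2")`.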

The best you can probably hope for right now is Haswell's scalar MULX instruction (part of BMI2), which has more flexible register use and does not affect the flags register, eliminating some stalls.
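For reference, a minimal sketch of the question's function written with the `_mulx_u64` intrinsic from `<immintrin.h>`, which GCC and Clang provide for MULX on x86-64. The name `mulhilo64_mulx` is hypothetical. A compiler targeting BMI2 may already emit MULX for the plain `__uint128_t` version, so treat this as an illustration of the instruction's interface, not a guaranteed speedup.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical variant of mulhilo64 using the BMI2 MULX intrinsic.
   Returns the low 64 bits of a*b and stores the high 64 bits in *hip. */
__attribute__((target("bmi2")))
uint64_t mulhilo64_mulx(uint64_t a, uint64_t b, uint64_t *hip)
{
    unsigned long long hi;
    unsigned long long lo = _mulx_u64(a, b, &hi);
    *hip = hi;
    return lo;
}
```

As with any BMI2 code, the caller should confirm support at run time (for example with `__builtin_cpu_supports("bmi2")`) before taking this path.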

Answer 1
3, accepted, 2014-07-04 13:37:13
