
Trying to understand Intel Intrinsics Guide explanation for _mm256_permute2x128_si256

I'm trying to understand _mm256_permute2x128_si256. Are all 256 bits of register a read into the CASE first, and then all 256 bits of register b? Or are the inputs read 32 bits at a time, interleaved between vector a and vector b? In other words, which 32 bits of which vector are selected by which bits of imm8, and in what order? Thanks!

DEFINE SELECT4(src1, src2, control) {
    CASE(control[1:0]) OF
    0:  tmp[127:0] := src1[127:0]
    1:  tmp[127:0] := src1[255:128]
    2:  tmp[127:0] := src2[127:0]
    3:  tmp[127:0] := src2[255:128]
    ESAC
    IF control[3]
        tmp[127:0] := 0
    FI
    RETURN tmp[127:0]
}
dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0])
dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4])
dst[MAX:256] := 0

Please see this page, an HTML copy of Intel's instruction reference for vperm2i128. It is more informative than the Intrinsics Guide entry:

https://www.felixcloutier.com/x86/vperm2i128

It's a shuffle that selects two 128-bit lanes from the 4 total lanes of 2 input vectors.
The control integer operand has two 2-bit fields that each index one of 4 lanes. You could look at it as concatenating both input vectors and then indexing into that 4-lane array.

Alternatively, if the high bit of an index nibble is set (imm8[3] for the low lane, imm8[7] for the high lane), that lane of the result is zeroed instead.

There's nothing involving 32-bit granularity. The pseudo-code from the intrinsics guide defines a helper function, and passes all 256 bits of each input to that helper function twice. All the [hi:lo] ranges are in bits, not bytes.
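To make the "concatenate both inputs and index a 4-lane array" view concrete, here is a minimal scalar sketch in C. It is not the real intrinsic: the `vec256` type and the names `select_lane` and `permute2x128_model` are hypothetical, chosen only for illustration. A vector is modelled as eight 32-bit elements, so you can see that the four elements of a lane always move together.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 256-bit vector: eight 32-bit elements.
   Elements 0..3 form the low 128-bit lane, elements 4..7 the high lane. */
typedef struct { uint32_t e[8]; } vec256;

/* Model of SELECT4: pick one 128-bit lane out of {a.lo, a.hi, b.lo, b.hi}
   using control[1:0], or produce zero if control[3] is set. */
static void select_lane(uint32_t out[4], const vec256 *a, const vec256 *b,
                        unsigned control)
{
    /* View a and b as one concatenated array of four lanes. */
    const uint32_t *lanes[4] = { &a->e[0], &a->e[4], &b->e[0], &b->e[4] };

    if (control & 0x8) {
        memset(out, 0, 4 * sizeof out[0]);
    } else {
        memcpy(out, lanes[control & 0x3], 4 * sizeof out[0]);
    }
}

/* Scalar stand-in for _mm256_permute2x128_si256(a, b, imm8). */
static vec256 permute2x128_model(vec256 a, vec256 b, unsigned imm8)
{
    vec256 dst;
    select_lane(&dst.e[0], &a, &b, imm8 & 0xF);        /* imm8[3:0] -> dst[127:0]   */
    select_lane(&dst.e[4], &a, &b, (imm8 >> 4) & 0xF); /* imm8[7:4] -> dst[255:128] */
    return dst;
}
```

In this model, with a = {0,1,2,3,4,5,6,7} and b = {8,9,10,11,12,13,14,15}, imm8 = 0x21 yields {4,5,6,7,8,9,10,11}: the high lane of a followed by the low lane of b, with no reordering inside either lane.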

Intel's asm documentation for the corresponding instruction (vperm2i128) has more comprehensible pseudo-code that separates the zeroing:

CASE IMM8[1:0] of
    0: DEST[127:0]←SRC1[127:0]
    1: DEST[127:0]←SRC1[255:128]
    2: DEST[127:0]←SRC2[127:0]
    3: DEST[127:0]←SRC2[255:128]
ESAC

CASE IMM8[5:4] of
    0: DEST[255:128]←SRC1[127:0]
    1: DEST[255:128]←SRC1[255:128]
    2: DEST[255:128]←SRC2[127:0]
    3: DEST[255:128]←SRC2[255:128]
ESAC

IF (imm8[3])
    DEST[127:0] ← 0
FI
IF (imm8[7])
    DEST[255:128] ← 0
FI
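As a reading aid for the pseudo-code above, the following sketch decodes an imm8 value into a description of what each half of the result receives. The helper names `lane_name` and `describe_imm8`, and labels such as "a.lo", are assumptions made up for this example; they are not part of any Intel API. The sketch reads only the bits the pseudo-code reads: [1:0] and [3] for the low half, [5:4] and [7] for the high half.

```c
#include <stdio.h>

/* Hypothetical helper: name the source lane that one 4-bit control
   field selects. Labels such as "a.lo" exist only in this sketch. */
static const char *lane_name(unsigned control)
{
    static const char *const names[4] = { "a.lo", "a.hi", "b.lo", "b.hi" };
    if (control & 0x8)            /* bit 3 of the nibble: zero the lane */
        return "zero";
    return names[control & 0x3];  /* bits 1:0 of the nibble: lane index */
}

/* Writes e.g. "dst.lo = a.lo, dst.hi = b.lo" into buf. */
static int describe_imm8(char *buf, size_t n, unsigned imm8)
{
    return snprintf(buf, n, "dst.lo = %s, dst.hi = %s",
                    lane_name(imm8 & 0xF), lane_name((imm8 >> 4) & 0xF));
}
```

For example, 0x20 decodes to "dst.lo = a.lo, dst.hi = b.lo" (the two low lanes), 0x31 to "dst.lo = a.hi, dst.hi = b.hi" (the two high lanes), and 0x01 to "dst.lo = a.hi, dst.hi = a.lo" (the halves of a swapped).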
