I'm trying to understand _mm256_permute2x128_si256. Are all 256 bits of register a read into the CASE first, and then all 256 bits of register b? Or are 32-bit chunks read in, interleaved between vector a and vector b? In other words, which bits of which vector are selected by which bits of imm8, and in what order? Thanks!
The pseudo-code from the intrinsics guide is:
DEFINE SELECT4(src1, src2, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src1[127:0]
1: tmp[127:0] := src1[255:128]
2: tmp[127:0] := src2[127:0]
3: tmp[127:0] := src2[255:128]
ESAC
IF control[3]
tmp[127:0] := 0
FI
RETURN tmp[127:0]
}
dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0])
dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4])
dst[MAX:256] := 0
Please see this website, it's more informative than Intel's documentation:
It's a shuffle that selects two 128-bit lanes from the 4 total lanes of 2 input vectors.
The control integer operand has two 2-bit fields that each index one of 4 lanes. You could look at it as concatenating both input vectors and then indexing into that 4-lane array.
If the high bit of an index nibble is set (imm8[3] for the low lane, imm8[7] for the high lane), that lane of the result is zeroed instead.
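The "concatenate both inputs and index into a 4-lane array" view can be sketched as a plain scalar model in C. This is only an illustration, not the real intrinsic: the vec256_model type and the select_lane and permute2x128_model names are hypothetical, and each 128-bit lane is represented as two 64-bit halves for convenience.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 256-bit vector: two 128-bit lanes,
   each stored as two 64-bit halves. lane[0] models bits 127:0,
   lane[1] models bits 255:128. */
typedef struct { uint64_t lane[2][2]; } vec256_model;

/* Model of SELECT4: pick one of the 4 lanes of {a, b}, or zero. */
static void select_lane(uint64_t out[2], const vec256_model *a,
                        const vec256_model *b, unsigned control)
{
    /* View a and b as one 4-lane array: a.lo, a.hi, b.lo, b.hi. */
    const uint64_t *lanes[4] = { a->lane[0], a->lane[1],
                                 b->lane[0], b->lane[1] };
    if (control & 0x8) {
        /* Bit 3 of the nibble set: zero this lane of the result. */
        out[0] = 0;
        out[1] = 0;
    } else {
        /* Bits 1:0 of the nibble index the 4-lane array. */
        memcpy(out, lanes[control & 0x3], 2 * sizeof(uint64_t));
    }
}

/* Model of the whole shuffle: the low nibble of imm8 picks dst[127:0],
   the high nibble picks dst[255:128]. */
static vec256_model permute2x128_model(vec256_model a, vec256_model b,
                                       unsigned imm8)
{
    vec256_model dst;
    select_lane(dst.lane[0], &a, &b, imm8 & 0xF);
    select_lane(dst.lane[1], &a, &b, (imm8 >> 4) & 0xF);
    return dst;
}
```

In this model, imm8 = 0x20 produces the low lane of a followed by the low lane of b, and imm8 = 0x31 produces the high lane of a followed by the high lane of b.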
There's nothing involving 32-bit granularity. The pseudo-code from the intrinsics guide defines a helper function, and passes all 256 bits of each input to that helper function twice. All the [hi:lo] ranges are in bits, not bytes.
Intel's asm documentation for the corresponding instruction (vperm2i128) has more comprehensible pseudo-code that separates the zeroing:
CASE IMM8[1:0] of
0: DEST[127:0]←SRC1[127:0]
1: DEST[127:0]←SRC1[255:128]
2: DEST[127:0]←SRC2[127:0]
3: DEST[127:0]←SRC2[255:128]
ESAC
CASE IMM8[5:4] of
0: DEST[255:128]←SRC1[127:0]
1: DEST[255:128]←SRC1[255:128]
2: DEST[255:128]←SRC2[127:0]
3: DEST[255:128]←SRC2[255:128]
ESAC
IF (imm8[3])
DEST[127:0] ← 0
FI
IF (imm8[7])
DEST[255:128] ← 0
FI
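To check a particular immediate against the pseudo-code, it can help to decode it into words. The helper below is a rough sketch with hypothetical names (lane_source, describe_imm8); it only formats a description of what each half of the destination would receive and does not touch any vector registers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: name the source selected by one 4-bit control field. */
static const char *lane_source(unsigned nibble)
{
    static const char *const names[4] = {
        "a[127:0]", "a[255:128]", "b[127:0]", "b[255:128]"
    };
    /* Bit 3 zeroes the lane; otherwise bits 1:0 select the source lane. */
    return (nibble & 0x8) ? "0" : names[nibble & 0x3];
}

/* Hypothetical helper: write a description of imm8 into buf. */
static int describe_imm8(char *buf, size_t n, unsigned imm8)
{
    return snprintf(buf, n, "dst[127:0] = %s, dst[255:128] = %s",
                    lane_source(imm8 & 0xF),
                    lane_source((imm8 >> 4) & 0xF));
}
```

For example, describe_imm8 with imm8 = 0x20 yields "dst[127:0] = a[127:0], dst[255:128] = b[127:0]", and imm8 = 0x81 yields a zeroed high lane with a[255:128] in the low lane.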