C/C++ intrinsics for non-temporal loads of 32- and 64-bit values on x86_64?
Are there C/C++ intrinsics for non-temporal loads (i.e. loads without caching, directly from DRAM) of 32- and 64-bit values on x86_64?
My compiler is MSVC++ 2017, toolset v141, but intrinsics for other compilers are welcome, as well as references to the underlying assembly instructions.
At the time of writing (August 2017) there are no non-temporal loads to GP registers.
The only available non-temporal instructions are:
Integer domain
(v)movntdqa (load): despite the name, this instruction moves 128/256/512 bits, aligned on their natural boundary, into xmm/ymm/zmm registers respectively.
(v)movntdq (store): despite the name, this instruction moves xmm/ymm/zmm registers into a 128/256/512-bit memory location aligned on its natural boundary.
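Since (v)movntdqa is the only non-temporal load in the list, one way to get a 32- or 64-bit value is to load the enclosing 16-byte block and extract the low part. The following is only a minimal sketch under assumptions: the helper names load_low64_nt and load_low32_nt are hypothetical, the block must be 16-byte aligned, and the CPU must support SSE4.1. Also note that, as far as I know, the processor may treat the non-temporal hint as an ordinary load on normal write-back memory, so this does not guarantee a cache bypass.

```cpp
#include <cassert>
#include <cstdint>
#include <smmintrin.h>  // SSE4.1: _mm_stream_load_si128 (movntdqa)

#if defined(__GNUC__) || defined(__clang__)
#define NT_TARGET __attribute__((target("sse4.1")))  // build without -msse4.1
#else
#define NT_TARGET  // MSVC needs no extra annotation
#endif

// Hypothetical helper: non-temporal 128-bit load, then extract the low 64 bits.
// `block` must be 16-byte aligned; movntdqa faults on unaligned addresses.
NT_TARGET inline std::uint64_t load_low64_nt(void* block) {
    assert(reinterpret_cast<std::uintptr_t>(block) % 16 == 0);
    __m128i v = _mm_stream_load_si128(static_cast<__m128i*>(block));
    return static_cast<std::uint64_t>(_mm_cvtsi128_si64(v));  // movq r64, xmm
}

// Hypothetical helper: same load, but extract only the low 32 bits.
NT_TARGET inline std::uint32_t load_low32_nt(void* block) {
    assert(reinterpret_cast<std::uintptr_t>(block) % 16 == 0);
    __m128i v = _mm_stream_load_si128(static_cast<__m128i*>(block));
    return static_cast<std::uint32_t>(_mm_cvtsi128_si32(v));  // movd r32, xmm
}
```

A caller would pass the address of a buffer declared with alignas(16); values at other offsets inside the block would need a shift or shuffle before extraction.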
GP registers
movnti (store): stores a 32/64-bit GP register into a DWORD/QWORD in memory.
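For the store direction, movnti is exposed through the SSE2 intrinsics _mm_stream_si32 and _mm_stream_si64. A small sketch, assuming an x86_64 target; the wrapper name store_nt is hypothetical, and the trailing _mm_sfence is there because non-temporal stores are weakly ordered:

```cpp
#include <emmintrin.h>  // SSE2: _mm_stream_si32, _mm_stream_si64; pulls in _mm_sfence

// Hypothetical wrapper: write a 32-bit and a 64-bit value with movnti.
inline void store_nt(int* dst32, long long* dst64, int a, long long b) {
    _mm_stream_si32(dst32, a);  // movnti m32, r32
    _mm_stream_si64(dst64, b);  // movnti m64, r64 (x86_64 only)
    _mm_sfence();               // make the streaming stores globally visible in order
}
```

There is no matching load intrinsic, which is consistent with the statement above that no non-temporal load to GP registers exists.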
MMX registers
movntq (store): stores an MMX register into a QWORD in memory.
Floating point domain
(v)movntpd/s (store, legacy and VEX encoded): stores an xmm/ymm/zmm register into an aligned 128/256/512-bit memory location. Like movntdq but in the FP domain.
(v)movntpd/s (store, EVEX encoded): stores an xmm/ymm/zmm register into an aligned 512-bit memory location, clearing the upper unused bits. Like movntdq but in the FP domain. Intel manuals are contradictory on this.
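As an illustration of the FP-domain stores, the 128-bit form of movntpd is reachable through _mm_stream_pd. This is a sketch only: the helper name store_pair_nt is hypothetical, and the destination is assumed to be 16-byte aligned.

```cpp
#include <emmintrin.h>  // SSE2: _mm_stream_pd, _mm_set_pd; pulls in _mm_sfence

// Hypothetical helper: stream two doubles to a 16-byte aligned destination.
inline void store_pair_nt(double* dst, double lo, double hi) {
    _mm_stream_pd(dst, _mm_set_pd(hi, lo));  // movntpd m128, xmm; lo lands in dst[0]
    _mm_sfence();                            // order the streaming store
}
```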
Masked movs
(v)maskmovdqu (store): stores the bytes of an xmm register according to the mask in another xmm register.
maskmovq (store): stores the bytes of an MMX register according to the mask in another MMX register.
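The masked store can be tried out with _mm_maskmoveu_si128, which maps to maskmovdqu: a byte is written only when the most significant bit of the corresponding mask byte is set. A minimal sketch; the helper name store_selected_bytes_nt and its pointer-based interface are assumptions made for illustration.

```cpp
#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_maskmoveu_si128; pulls in _mm_sfence

// Hypothetical helper: copy 16 bytes from src to dst with a non-temporal hint,
// but only the bytes whose mask byte has its top bit (0x80) set.
inline void store_selected_bytes_nt(char* dst,
                                    const unsigned char* src,
                                    const unsigned char* mask) {
    __m128i data = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    __m128i sel  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(mask));
    _mm_maskmoveu_si128(data, sel, dst);  // maskmovdqu; dst need not be aligned
    _mm_sfence();                         // order the streaming store
}
```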
Take a look here: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=temporal
void _mm_stream_pi (__m64* mem_addr, __m64 a)
void _mm_stream_si32 (int* mem_addr, int a)
and some others.
See also https://msdn.microsoft.com/en-us/library/hh977023.aspx
It is actually the VS2015 documentation, but the VS2017 one is (at least for me) strange and disorganised, and I can't find anything there :).
For non-temporal loads, at least as far as I know, void _mm_prefetch (char const* p, int i) is used, with the _MM_HINT_NTA hint.
Loads of this size are short enough that it is sufficient to tell the processor not to evict other data from the cache, which carries no performance penalty. So even with a non-temporal load, the data will be cached if there is room in the cache, but it will not evict any other data.
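The prefetch approach can be sketched as follows. This is not a definitive implementation: the function name sum_with_nta_prefetch and the prefetch distance are assumptions chosen for illustration, and the hint only influences cache placement; it never changes the result of the loads.

```cpp
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE: _mm_prefetch, _MM_HINT_NTA

// Hypothetical example: sum an array with ordinary 64-bit loads while asking
// the CPU, via prefetchnta, to fetch upcoming cache lines with minimal pollution.
inline std::uint64_t sum_with_nta_prefetch(const std::uint64_t* data, std::size_t n) {
    constexpr std::size_t kPerLine = 8;   // 8 x 8 bytes = one 64-byte cache line (assumed)
    constexpr std::size_t kAhead = 64;    // prefetch distance in elements (assumed, tune by measurement)
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Issue one hint per cache line, a fixed distance ahead of the read position.
        if (i % kPerLine == 0 && i + kAhead < n) {
            _mm_prefetch(reinterpret_cast<const char*>(data + i + kAhead), _MM_HINT_NTA);
        }
        sum += data[i];  // plain load into a GP register
    }
    return sum;
}
```

Whether the hint helps depends on the microarchitecture and the access pattern, so it is worth measuring against the same loop without the prefetch.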