
C/C++ intrinsics for non-temporal loads of 32- and 64-bit values on x86_64?

Are there C/C++ intrinsics for non-temporal loads (i.e. loads that bypass the cache and read directly from DRAM) of 32- and 64-bit values on x86_64?

My compiler is MSVC++ 2017, toolset v141, but intrinsics for other compilers are welcome, as well as references to the underlying assembly instructions.

At the time of writing (August 2017) there are no non-temporal loads to GP registers.


The only available non-temporal instructions are:

Integer domain

(v)movntdqa (load): despite the name, this instruction moves 128/256/512 bits, aligned on their natural boundary, from memory into an xmm/ymm/zmm register respectively.
(v)movntdq (store): despite the name, this instruction moves an xmm/ymm/zmm register into a 128/256/512-bit memory location aligned on its natural boundary.
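
As a rough illustration of the load case: there is no GP-register form, but a 64-bit value can be pulled out of a vector loaded with (v)movntdqa through the SSE4.1 intrinsic _mm_stream_load_si128. The helper name nt_load_u64, its lane parameter and the NT_TARGET_SSE41 macro are hypothetical and exist only for this sketch. As far as I know, the non-temporal hint only takes effect on write-combining (WC) memory; on ordinary write-back memory the instruction behaves like a normal aligned load.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_unpackhi_epi64, _mm_cvtsi128_si64
#include <smmintrin.h>  // SSE4.1: _mm_stream_load_si128

// GCC/Clang need SSE4.1 enabled for the function that uses the intrinsic;
// MSVC does not. This macro is an assumption made for portability of the sketch.
#if defined(__GNUC__) || defined(__clang__)
#define NT_TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define NT_TARGET_SSE41
#endif

// Hypothetical helper: loads a 16-byte-aligned block with movntdqa and
// returns one of its two 64-bit lanes (lane 0 = low half, lane 1 = high half).
NT_TARGET_SSE41
inline std::uint64_t nt_load_u64(const std::uint64_t* block16, unsigned lane) {
    assert(reinterpret_cast<std::uintptr_t>(block16) % 16 == 0);
    assert(lane < 2);
    // The intrinsic takes a non-const pointer in some headers, hence the cast.
    __m128i v = _mm_stream_load_si128(
        reinterpret_cast<__m128i*>(const_cast<std::uint64_t*>(block16)));
    if (lane == 1) {
        v = _mm_unpackhi_epi64(v, v);  // move the high 64 bits into the low half
    }
    return static_cast<std::uint64_t>(_mm_cvtsi128_si64(v));
}
```

The whole 16-byte block is always read, so the caller has to own all 16 bytes even when only one lane is wanted.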

GP registers

movnti (store): stores a 32/64-bit GP register into a DWORD/QWORD in memory.

MMX registers

movntq (store): stores an MMX register into a QWORD in memory.
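
For the store direction, movnti is reachable from C/C++ through the SSE2 intrinsics _mm_stream_si32 and _mm_stream_si64 (the 64-bit one exists only on x86_64, and as far as I recall some MSVC headers also spell it _mm_stream_si64x). The helper names below are hypothetical; this is a minimal sketch, not a tuned routine.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2: _mm_stream_si32, _mm_stream_si64
#include <xmmintrin.h>  // SSE: _mm_sfence

// Hypothetical helper: one non-temporal 32-bit store (movnti m32, r32).
inline void nt_store_i32(int* dst, int value) {
    assert(dst != nullptr);
    _mm_stream_si32(dst, value);
    _mm_sfence();  // order the weakly-ordered store before later stores
}

// Hypothetical helper: fill a buffer with non-temporal 64-bit stores
// (movnti m64, r64), fencing once at the end.
inline void nt_fill_i64(long long* dst, std::size_t count, long long value) {
    assert(dst != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        _mm_stream_si64(dst + i, value);
    }
    _mm_sfence();
}
```

Non-temporal stores are weakly ordered, which is why each helper ends with _mm_sfence before other code relies on the written data being visible in order.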

Floating point domain

(v)movntpd/s (store) (legacy and VEX encoded): stores an xmm/ymm register into an aligned 128/256-bit memory location. Like movntdq but in the FP domain.

(v)movntpd/s (store) (EVEX encoded): stores an xmm/ymm/zmm register into an aligned 512-bit memory location, clearing the upper unused bits. Like movntdq but in the FP domain.
The Intel manuals are contradictory on this.
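
A minimal sketch of the legacy 128-bit form, using the SSE intrinsic _mm_stream_ps (movntps). The helper name nt_copy_floats and its preconditions are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_stream_ps, _mm_sfence

// Hypothetical helper: copies `count` floats into a 16-byte-aligned
// destination with non-temporal stores. `count` must be a multiple of 4.
inline void nt_copy_floats(float* dst, const float* src, std::size_t count) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    assert(count % 4 == 0);
    for (std::size_t i = 0; i < count; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);  // ordinary (cached) unaligned load
        _mm_stream_ps(dst + i, v);         // movntps: non-temporal aligned store
    }
    _mm_sfence();
}
```

Only the store side is non-temporal here; the source is still read through the cache.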

Masked movs

(v)maskmovdqu (store): stores the bytes of an xmm register to memory according to the mask in another xmm register.

maskmovq (store): stores the bytes of an MMX register to memory according to the mask in another MMX register. (There is no VEX-encoded form of this one.)
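
The masked store can be sketched with the SSE2 intrinsic _mm_maskmoveu_si128 (maskmovdqu), which writes only those bytes whose mask byte has its most significant bit set. The helper name nt_store_low4_bytes is hypothetical. It writes just the low 4 bytes of the vector, which is one way to get a 32-bit non-temporal store out of an xmm register.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2: _mm_set_epi32, _mm_maskmoveu_si128
#include <xmmintrin.h>  // SSE: _mm_sfence

// Hypothetical helper: writes bytes 0..3 of `value` to dst[0..3] with a
// non-temporal masked store and leaves dst[4..15] untouched.
// Assumption: dst points to at least 16 accessible bytes, to stay clear of
// implementation-specific behaviour for the masked-out bytes.
inline void nt_store_low4_bytes(char* dst, __m128i value) {
    assert(dst != nullptr);
    // _mm_set_epi32 takes the highest element first, so -1 lands in bytes 0..3.
    const __m128i mask = _mm_set_epi32(0, 0, 0, -1);
    _mm_maskmoveu_si128(value, mask, dst);
    _mm_sfence();
}
```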

Take a look here: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=temporal

void _mm_stream_pi (__m64* mem_addr, __m64 a)
void _mm_stream_si32 (int* mem_addr, int a)

and some others

and

https://msdn.microsoft.com/en-us/library/hh977023.aspx

It is actually the VS2015 documentation, but the VS2017 one (at least for me) is strange and disorganised, and I can't find anything there :).

For loads, at least as far as I know,

void _mm_prefetch (char const* p, int i)

is used, with the non-temporal hint _MM_HINT_NTA.

Those loads are short enough that the prefetch merely tells the CPU not to evict other data from the cache, without a performance penalty. So even for a non-temporal load, if there is room in the cache the data will be cached, but it will not evict any other data.
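
A minimal sketch of that prefetch approach: issue _mm_prefetch with _MM_HINT_NTA a little ahead of ordinary 64-bit loads. The function name sum_with_nta_prefetch and the look-ahead distance of 8 elements (one 64-byte line) are assumptions made for illustration, not tuned values. How the hint is honoured depends on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE: _mm_prefetch, _MM_HINT_NTA

// Hypothetical helper: sums an array with ordinary loads, while hinting that
// upcoming data is non-temporal so it should pollute the caches as little
// as possible.
inline std::uint64_t sum_with_nta_prefetch(const std::uint64_t* data,
                                           std::size_t count) {
    assert(data != nullptr || count == 0);
    constexpr std::size_t kAhead = 8;  // assumed distance: one 64-byte line
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (i + kAhead < count) {
            _mm_prefetch(reinterpret_cast<const char*>(data + i + kAhead),
                         _MM_HINT_NTA);
        }
        sum += data[i];  // plain 64-bit load into a GP register
    }
    return sum;
}
```

The prefetch is only a hint, so the result is the same with or without it; only the cache behaviour may differ.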
