
Does Visual Studio C/C++ support automatic software prefetching?

Compilers such as Intel's and IBM's xlc can insert data prefetch instructions automatically.

I have some code that could benefit from prefetching, but only at the cost of readability. That is, there is a natural grouping of code like:

void foo(...) {  // foo gets called frequently
    ...
    char *myPtr = allocate(medium_size);
    memset(myPtr, 0, medium_size);  // Cache miss here. medium_size is ~1 cache line.
                                    // The miss occurs on memset's first access, and there
                                    // is too little data for any hardware prefetching
                                    // triggered by memset to ameliorate it.
    ...
}

The cost of the cache miss incurred by the memset could be mitigated by moving the allocation further up in the procedure and issuing a prefetch instruction immediately after it, leaving enough instructions between the prefetch and the memset for the data to be brought into cache. In my case, the code that calculates medium_size gets a little messy when moved further up in the procedure, which makes it less readable.
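The manual restructuring described above could look roughly like the following. This is only a sketch under stated assumptions: allocate() is stubbed with malloc, unrelated_work() is a hypothetical placeholder for whatever foo() does before it needs the buffer, and the PREFETCH_LINE macro is an illustrative wrapper that falls back to a no-op on non-x86 targets. Whether the prefetch actually hides the miss depends on how much work sits between it and the memset, so it would need to be measured.

```c
#include <stdlib.h>
#include <string.h>

#if defined(_M_IX86) || defined(_M_X64) || defined(__i386__) || defined(__x86_64__)
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */
#define PREFETCH_LINE(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#define PREFETCH_LINE(p) ((void)(p))  /* assumption: no-op on non-x86 targets */
#endif

/* Hypothetical stand-in for the question's allocate(). */
static char *allocate(size_t n)
{
    return (char *)malloc(n);
}

/* Hypothetical stand-in for the work foo() does before it needs the buffer. */
static unsigned unrelated_work(unsigned seed)
{
    unsigned acc = seed;
    for (unsigned i = 0; i < 64; ++i)
        acc = acc * 1664525u + 1013904223u;
    return acc;
}

/* Returns a zeroed buffer of medium_size bytes (caller frees), or NULL. */
char *foo(size_t medium_size, unsigned seed, unsigned *work_out)
{
    char *myPtr = allocate(medium_size);  /* hoisted to the top of foo() */
    if (myPtr == NULL)
        return NULL;
    PREFETCH_LINE(myPtr);                 /* a hint only; it never faults */

    *work_out = unrelated_work(seed);     /* gives the line time to arrive */

    memset(myPtr, 0, medium_size);        /* ideally hits in cache now */
    return myPtr;
}
```

The readability cost mentioned in the question shows up here as well: medium_size has to be known at the top of foo(), before the code that would naturally compute it.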

If the compiler could reschedule the code for me to make the prefetch worthwhile (perhaps with the support of PGO), I would get the best of both worlds.

So far it appears that Visual Studio supports only the intrinsics, i.e., manual placement of prefetch instructions. Am I wrong?


Clarification update in response to questions:

Q: How might the compiler improve the above code? A: The above code was just to give a flavour of what's involved. The actual code is more complex, but it boils down to an allocation and a store into it. The read is done by the memset as it writes to the memory. On some architectures this may not trigger a cache miss, but on x86 it apparently does, according to VTune (answered by markgz below).

Q: Won't just using memset be sufficient? memset's memory access pattern is highly predictable, and the hardware prefetching mechanism should handle it. A: Yes, in general this is true, and I did a poor job of explaining the context. The routine (foo) containing the memset gets called very frequently, and it is the first memory access by the memset that triggers the cache miss. There is not enough data for memset to ameliorate this miss through hardware prefetching, so I need a prefetch to be issued before the memset is called.

Yes, you can use the _mm_prefetch intrinsic, declared in <xmmintrin.h>:

void _mm_prefetch(char const *p, int hint);
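As a rough illustration of calling the intrinsic, the sketch below requests every cache line that a small buffer touches. It assumes an x86/x64 target and a 64-byte cache line; prefetch_range and CACHE_LINE_BYTES are hypothetical names introduced here, not part of any library. The second argument must be a compile-time constant such as _MM_HINT_T0, _MM_HINT_T1, _MM_HINT_T2 or _MM_HINT_NTA.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* declares _mm_prefetch and the _MM_HINT_* constants */

enum { CACHE_LINE_BYTES = 64 };  /* assumption: typical x86 line size */

/* Hypothetical helper: hint that buf[0..len) will be needed soon. */
static void prefetch_range(const void *buf, size_t len)
{
    const char *base = (const char *)buf;

    for (size_t off = 0; off < len; off += CACHE_LINE_BYTES)
        _mm_prefetch(base + off, _MM_HINT_T0);  /* T0: into all cache levels */

    /* A buffer of about one line that is not line-aligned can straddle
       two lines, so also request the line holding the last byte. */
    if (len != 0)
        _mm_prefetch(base + len - 1, _MM_HINT_T0);
}
```

A prefetch is only a hint: it does not modify the data, and the processor is free to ignore it, so any benefit has to be confirmed by profiling.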
