
More TLB misses when process memory size larger?

I have a program written in C++. On Linux, the process is allocated a certain amount of memory: part is the stack, part the heap, part the text segment, and part the BSS.

Is the following true:

The larger the amount of memory allocated to the heap component of my process, the greater the chance of Translation Lookaside Buffer (TLB) misses?

And, generally speaking, the more memory my application process consumes, the greater the chance of TLB misses?

I think there is no direct relationship between the amount of memory allocated and the TLB miss rate. As far as I know, as long as your program has good locality, the TLB miss rate will remain low.

There are several reasons that can lead to a high TLB miss rate:

1. Not enough memory and too many running processes;
2. Low locality in your program;
3. Inefficient patterns for visiting array elements in loops in your code.
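To illustrate point 3, here is a minimal sketch (all names and sizes are my own, not from the question): summing the same matrix row-by-row versus column-by-column. The row-major loop walks memory sequentially, so each 4 KiB page is fully used before moving on; the column-major loop strides `N * sizeof(int)` bytes per step, so for a large `N` nearly every access can touch a different page and churn through TLB entries.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t N = 4096;  // 4096 x 4096 ints = 64 MiB, far beyond typical TLB reach

// Sequential traversal: consecutive accesses stay on the same page,
// so one TLB entry serves roughly a thousand int loads.
long long sum_row_major(const std::vector<int>& m) {
    long long s = 0;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            s += m[i * N + j];
    return s;
}

// Strided traversal: each access jumps N * sizeof(int) = 16 KiB,
// landing on a new page every time and evicting TLB entries quickly.
long long sum_col_major(const std::vector<int>& m) {
    long long s = 0;
    for (std::size_t j = 0; j < N; ++j)
        for (std::size_t i = 0; i < N; ++i)
            s += m[i * N + j];
    return s;
}
```

Both functions compute the same sum; only the access order, and therefore the TLB behaviour, differs.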

Programs are usually divided into phases that exhibit completely different memory and execution characteristics - your code may allocate a huge chunk of memory at some point, then be off doing some other unrelated computations. In that case, your TLBs (that are basically just caches for address translation) would age away the unused pages and eventually drop them. While you're not using these pages, you shouldn't care about that.

The real question is - when you get to some performance-critical phase, are you going to work with more pages than your TLBs can sustain simultaneously? On one hand, modern CPUs have large TLBs, often with two levels of caching - the L2 TLB of a modern Intel CPU should have (IIRC) 512 entries - that's 2M worth of data if you're using 4k pages (with large pages that would have been more, but TLBs usually don't like to work with them due to potential conflicts with smaller pages..).
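The 2M figure above is just TLB "reach": entry count times page size. A tiny compile-time check of that arithmetic (the 512-entry L2 TLB and 4 KiB pages are the answer's assumed figures, not guaranteed for any specific CPU):

```cpp
#include <cstddef>

// TLB reach = number of entries * page size.
// Figures assumed from the answer: 512-entry L2 TLB, 4 KiB pages.
constexpr std::size_t kEntries  = 512;
constexpr std::size_t kPageSize = 4 * 1024;               // 4 KiB
constexpr std::size_t kReach    = kEntries * kPageSize;   // bytes covered without a miss

static_assert(kReach == 2 * 1024 * 1024, "512 entries x 4 KiB pages = 2 MiB of reach");
```

With 2 MiB huge pages the same 512 entries would cover 1 GiB, which is why huge pages help TLB-bound workloads when the hardware supports them well.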

It's quite possible for an application to work with more than 2M of data, but you should avoid doing so all at the same time if possible - either by doing cache tiling or by changing the algorithm. That's not always possible (e.g. when streaming from memory or from IO), but then the TLB misses are probably not your main bottleneck. When working with the same set of data and accessing the same elements multiple times, you should always attempt to keep them cached as close as possible.
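The tiling idea above can be sketched with a blocked matrix transpose (sizes and names are illustrative assumptions). A naive transpose streams across whole rows of the destination, touching a new destination page every few elements; processing the matrix in small `TILE x TILE` blocks keeps both the source and destination pages of the current block resident in the TLB for the whole time they are being reused.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t N    = 1024;  // matrix is N x N
constexpr std::size_t TILE = 32;    // tile size: a tuning assumption; N % TILE == 0 here

// Blocked (tiled) transpose: the two outer loops pick a TILE x TILE block,
// the two inner loops transpose it. Each block touches only a handful of
// pages in src and dst, so those pages stay in the TLB until the block is done.
void transpose_tiled(const std::vector<int>& src, std::vector<int>& dst) {
    for (std::size_t ii = 0; ii < N; ii += TILE)
        for (std::size_t jj = 0; jj < N; jj += TILE)
            for (std::size_t i = ii; i < ii + TILE; ++i)
                for (std::size_t j = jj; j < jj + TILE; ++j)
                    dst[j * N + i] = src[i * N + j];
}
```

The result is identical to a naive transpose; only the order in which pages are visited changes, which is exactly the point of tiling.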

It's also possible to use software prefetches to make the CPU perform the TLB misses (and following page walks) earlier in time, preventing them from blocking your progress. On some CPUs hardware prefetches are already doing this for you.
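On GCC and Clang, one way to issue such software prefetches is the `__builtin_prefetch` intrinsic. A minimal sketch (the prefetch distance of 64 elements is an arbitrary tuning assumption, not a recommendation): requesting an address a fixed distance ahead lets any TLB miss and page walk overlap with the work on the current element instead of stalling the loop when that element is finally loaded.

```cpp
#include <cstddef>
#include <vector>

// Sum a vector while prefetching a fixed distance ahead.
// __builtin_prefetch(addr, rw, locality): rw = 0 means read,
// locality = 1 hints low temporal reuse after the prefetch.
long long sum_with_prefetch(const std::vector<long long>& v) {
    constexpr std::size_t kAhead = 64;  // prefetch distance: a tuning assumption
    long long s = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i + kAhead < v.size())
            __builtin_prefetch(&v[i + kAhead], 0, 1);
        s += v[i];
    }
    return s;
}
```

A prefetch is only a hint: the result is unchanged whether or not the hardware honours it, so the transformation is always safe, just not always profitable.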
