
Why do AMD CPUs have such a silly PAUSE timing?

I've developed a monitor object like Java's for C++, with some improvements. The major improvement is that there is a spin loop not only for locking and unlocking but also for waiting on an event. In that case you don't have to lock the mutex yourself; instead you supply a predicate to a wait_poll function, and the code repeatedly tries to lock the mutex by polling. When it manages to lock the mutex it calls the predicate, which returns (or moves) a pair of a bool and the result type.
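To make the idea concrete, here is a minimal sketch of such a polling wait; the name wait_poll, the use of std::atomic_flag as the lock, and the signatures are all my illustrative assumptions, not the author's actual code:

```cpp
#include <atomic>
#include <utility>

// Hypothetical sketch of the wait_poll idea described above: repeatedly try
// to take the lock without blocking; on success call the predicate, which
// returns a pair of (done, result). A real implementation would PAUSE between
// attempts and eventually fall back to a kernel wait.
template<typename Result, typename Predicate>
Result wait_poll( std::atomic_flag &lock, Predicate pred )
{
    for( ; ; )
        if( !lock.test_and_set( std::memory_order_acquire ) )  // try-lock
        {
            std::pair<bool, Result> p = pred();                // poll under the lock
            lock.clear( std::memory_order_release );           // unlock
            if( p.first )
                return std::move( p.second );                  // predicate satisfied
        }
}
```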

Waiting for a semaphore or an event object (Win32) in the kernel can easily take 1,000 to 10,000 clock cycles, even when the call returns immediately because the semaphore or event was already set. So there has to be a spin count with a reasonable relationship to this waiting interval, e.g. spinning for one tenth of the minimum interval spent in the kernel.

With my monitor object I've taken the spin-count recalculation algorithm from glibc, and I'm also using the PAUSE instruction. But I found that on my CPU (TR 3900X) the PAUSE instruction is too fast: about 0.78 ns on average. On Intel CPUs it's much more reasonable, at about 30 ns.

This is the code:

#include <iostream>
#include <chrono>
#include <cstdint>
#include <immintrin.h>

using namespace std;
using namespace chrono;

int main()
{
    // execute PAUSE a billion times and report the average latency in nanoseconds
    static uint64_t const PAUSE_ROUNDS = 1'000'000'000;
    auto start = high_resolution_clock::now();
    for( uint64_t i = PAUSE_ROUNDS; i; --i )
        _mm_pause();
    double ns = (double)duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / (double)PAUSE_ROUNDS;
    cout << ns << endl;
}

Why has AMD chosen such a silly PAUSE timing? PAUSE is meant for spin-wait loops and should closely match the time it takes for a cache line's contents to flip to a different core and back.

"But I found that on my CPU (TR 3900X) the PAUSE instruction is too fast: about 0.78 ns on average. On Intel CPUs it's much more reasonable, at about 30 ns."

The pause instruction has never had anything to do with time and is not intended to be used as a time delay.

What pause is for is to prevent the CPU from wasting its resources by (speculatively) executing many iterations of a loop in parallel. That is especially useful in hyper-threading situations, where the other logical processor in the core can use those resources, but it also improves the time it takes to exit the loop when the condition changes (because you don't have "N iterations" of instructions queued up from before the condition changed).
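The loop shape this describes might look like the following minimal spin-wait (the function name and the atomic flag are illustrative assumptions):

```cpp
#include <atomic>
#include <immintrin.h>

// Spin until `flag` becomes true. The _mm_pause() hint keeps the CPU from
// speculatively queueing many iterations ahead, so the loop exits promptly
// once the flag changes, and it frees execution resources for a
// hyper-threaded sibling. It is NOT a fixed-length delay.
inline void spin_until_set( std::atomic<bool> const &flag )
{
    while( !flag.load( std::memory_order_acquire ) )
        _mm_pause();
}
```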

Given this, for an extremely complex CPU that might have 200 instructions in flight at the same time, pause itself might execute instantly but leave a "200 cycle long" pipeline bubble in its wake; and for an extremely simple CPU ("in order", with no speculative execution) pause may/should do literally nothing (be treated as a nop).

"PAUSE is meant for spin-wait loops and should closely match the time it takes for a cache line's contents to flip to a different core and back."

No. Assume the cache line is in the "modified" state in a different CPU's cache and the instruction after the pause is something like cmp [lock],0, which causes the CPU to try to put the cache line into the "shared" state. How long should the CPU waste time doing nothing, for no reason, after the pause but before trying to put the cache line into the "shared" state?
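In C++ this is the classic test-and-test-and-set spin lock; the relaxed load in the inner loop is what compiles to a cmp [lock],0-style read that pulls the cache line into the shared state, with pause between retries (function names are my own, shown only to illustrate the pattern):

```cpp
#include <atomic>
#include <immintrin.h>

// Test-and-test-and-set spin lock. The inner read-only loop spins on a plain
// load (the cmp [lock],0 of the discussion above) so the cache line can stay
// shared; _mm_pause() between reads throttles speculation but adds no fixed
// delay before the next attempt to observe the line.
inline void spin_lock( std::atomic<bool> &locked )
{
    while( locked.exchange( true, std::memory_order_acquire ) )  // test-and-set
        while( locked.load( std::memory_order_relaxed ) )        // read-only test
            _mm_pause();
}

inline void spin_unlock( std::atomic<bool> &locked )
{
    locked.store( false, std::memory_order_release );
}
```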

Note: If you actually need a tiny time delay, you'd want to look at the umwait instruction. You don't need a time delay, though - you want a time-out (e.g. spin with pause until rdtsc says a certain amount of time has passed). For this I'd be tempted to break it into an inner loop that does "pause and check the condition" N times, then an outer loop that retries the inner loop if the time limit hasn't expired yet.
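The suggested inner/outer structure could be sketched like this; the function name, the choice of N, and the way the deadline is passed in are illustrative assumptions:

```cpp
#include <atomic>
#include <cstdint>
#include <immintrin.h>
#include <x86intrin.h>  // __rdtsc (GCC/Clang; on MSVC it comes from <intrin.h>)

// Spin until `flag` becomes true or the TSC passes `tscDeadline`.
// Inner loop: pause and check the condition N times.
// Outer loop: read the (comparatively expensive) TSC only once per N spins.
inline bool spin_with_timeout( std::atomic<bool> const &flag, uint64_t tscDeadline )
{
    static unsigned const N = 64;  // assumption: iterations between clock reads
    do
        for( unsigned i = 0; i != N; ++i )
        {
            if( flag.load( std::memory_order_acquire ) )
                return true;   // condition met
            _mm_pause();
        }
    while( __rdtsc() < tscDeadline );
    return false;  // timed out; caller would now fall back to a kernel wait
}
```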

