
Cache locking pattern with C++

I am wondering if there's a better locking scheme for a cache than a simple lock:

std::mutex lock;

Data get(const Key& key) {
  std::lock_guard<std::mutex> guard(lock);  // named, so it holds the lock for the whole call

  if (cache.has(key)) {
    return cache[key];
  } else {
    Data data = remoteclient.getslow();
    cache[key] = data;
    return data;
  }
}

Assuming you get a lot of requests for the same keys, you're serializing all access to get(). Can something smarter be done with readers/writer locks?

i.e. what if you do something like:

std::shared_mutex lock;

Data get(const Key& key) {
  {
    std::shared_lock<std::shared_mutex> readGuard(lock);
    if (cache.has(key)) {
      return cache[key];
    }
  }
  std::unique_lock<std::shared_mutex> writeGuard(lock);
  Data data = remoteclient.getslow();
  cache[key] = data;
  return data;
}

Now this will allow multiple users to call get() at the same time in the case of a cache hit. However, if two users reach the read-locked section around the same time and both miss, it's possible that both will go on to the second part of the code and fetch the data. Does that seem like a good idea?

Any other ideas for optimizing this sort of code?

One thing that I don't like about the posted code is that in both snippets, the call

remoteclient.getslow();

is made while the cache is locked. If remoteclient.getslow() is in fact likely to take a long time to return (as the name indicates), then any other thread trying to access the cache will end up blocked for a long time (i.e. until getslow() returns and the thread that called it releases the lock) ... even if it is interested only in unrelated data that is already present in the cache!

To avoid that I would call remoteclient.getslow() outside the LockGuard's scope instead (ie while the cache is unlocked). Then, after remoteclient.getslow() returns the result, I would re-lock the cache and update the cache with the retrieved value. That way the cache is never locked for extended periods.

(Of course doing it that way does open up the possibility of multiple threads calling remoteclient.getslow() for the same data item, if they all decide they need the same data at around the same time... but that may be an acceptable side effect. Or if not, you could design a mechanism to indicate that a particular cache value is in the process of being retrieved and have the other threads block until the retrieval has completed... if that is worth the extra complexity for you. That would probably require condition variables and the like to do properly)

Your pseudocode has the right idea, but it has a race condition.

As soon as ReadLockGuard goes out of scope, you lose the lock, which means the data structure can be modified by another thread before the WriteLockGuard has time to grab the lock.

If your readers/writer lock implementation supports upgradable locks, then you should use that. Otherwise, after grabbing the lock for writing, you need to double-check the cache in case it got populated between the "reader" release and the "writer" acquisition.

It's possible that two threads will enter the 'writing' part of get(), but probably very unlikely. If you're concerned about the penalty of an extra getslow() call, you can check again inside of the writer lock.

std::shared_mutex lock;

Data get(const Key& key) {
  {
    std::shared_lock<std::shared_mutex> readGuard(lock);
    if (cache.has(key)) {
      return cache[key];
    }
  }
  std::unique_lock<std::shared_mutex> writeGuard(lock);
  if (!cache.has(key)) {
    cache[key] = remoteclient.getslow();
  }
  return cache[key];  // returns whether we fetched it or another thread beat us to it
}
