
Reverse read write lock

Usually we use ReadWriteLocks with read locks while reading and write locks while writing. But I ran into a case where I think using them in reverse can help. Hopefully you can tell me a better way.

Here is what I want: there will be a lot of writes but very few reads. An example is a metric that tracks the average latency of requests.

Treat this almost as pseudocode.

metric.addValue(latency); // Called a lot.

metric.getAverage(); // Called sparingly.

We can do the following:

addValue(value) {
  atomicCount.increment();
  atomicSum.increment(value);
}

getAverage() {
  return atomicCount.get() != 0 ? atomicSum.get() / atomicCount.get() : 0.0;
}
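A minimal compilable version of the sketch above (class and field names are my own):

```java
import java.util.concurrent.atomic.AtomicLong;

public class AverageMetric {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong sum = new AtomicLong();

    public void addValue(long value) {
        count.incrementAndGet();
        sum.addAndGet(value);
    }

    public double getAverage() {
        long c = count.get();
        // Note: count and sum are read at different times, so a concurrent
        // addValue() can make them momentarily inconsistent.
        return c != 0 ? (double) sum.get() / c : 0.0;
    }
}
```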

The problem is that getAverage() may see an extra count: the count can be read after a concurrent increment while the matching value has not yet been added to the sum. Most of the time the result is correct, but sometimes it is off by one count. I just want it to be more precise.

Here is the trick:

ReadWriteLock rw = new ReentrantReadWriteLock(true); // fair mode; a write-preference lock would also work
Lock read = rw.readLock();
Lock write = rw.writeLock();

addValue(value) {
  read.lock(); // Using read lock when mutating. 
  try { 
    atomicCount.increment();
    atomicSum.increment(value);
  } finally {
    read.unlock();
  }
}

getAverage() {
  write.lock(); // Using write lock when reading.
  try {
    return atomicCount.get() != 0 ? atomicSum.get() / atomicCount.get() : 0.0;
  } finally {
    write.unlock();
  }
}

My question is, can I do better?

Side note: I know about the (cast) issues, and that calling count.get() multiple times etc. can be avoided for better performance, but I didn't want to clutter the code too much.

There's really no point in concurrent atomic increments here; the two increments can't actually proceed independently anyway.

The simplest solution, a plain lock with ordinary count/sum variables, will perform much better:

lock
    count++;
    sum += value;
unlock
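In Java that simple-lock idea is just a synchronized class (my sketch, names mine):

```java
public class SynchronizedMetric {
    private long count;
    private long sum;

    public synchronized void addValue(long value) {
        count++;
        sum += value;
    }

    public synchronized double getAverage() {
        // count and sum are always read together under the same lock,
        // so they can never be out of step.
        return count != 0 ? (double) sum / count : 0.0;
    }
}
```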

To be more parallel, we need "sharding": each thread maintains its own stats, and the reader queries them all for the whole picture. (The per-thread stats need to be volatile; the reader uses Michael Burr's method to retrieve a stable snapshot of each thread's stats.)
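JDK 8's java.util.concurrent.atomic.LongAdder implements exactly this striping idea, so the sharding can be sketched without hand-rolled per-thread state. Note that the count/sum pair read in getAverage() is still not atomic, so this sketch trades the same slight imprecision for write throughput:

```java
import java.util.concurrent.atomic.LongAdder;

public class ShardedMetric {
    // LongAdder internally stripes updates across per-thread cells
    // to avoid contention on a single memory location.
    private final LongAdder count = new LongAdder();
    private final LongAdder sum = new LongAdder();

    public void addValue(long value) {
        count.increment();
        sum.add(value);
    }

    public double getAverage() {
        long c = count.sum(); // sums all cells; not atomic w.r.t. concurrent adds
        return c != 0 ? (double) sum.sum() / c : 0.0;
    }
}
```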

You might want to see if a technique like the following performs better. Basically it ensures that the count and sum are 'stable' by adding a second counter that tracks the first but is only incremented after all the other values have finished updating, so no locks are involved:

addValue(value) {
    while (atomicFlag.get() != 0) {
        // spin until no reader is taking a snapshot
    }
    atomicCount.increment();
    atomicSum.increment(value);
    atomicCount2.increment(); // incremented last: count == count2 means "stable"
}

getAverage() {
    int count;
    int sum;
    int count2;

    atomicFlag.increment(); // ask writers to pause
    do {
        count = atomicCount.get();
        sum = atomicSum.get();
        count2 = atomicCount2.get();
    } while (count != count2); // retry until no update was in flight
    atomicFlag.decrement();

    return count != 0 ? (sum * 1.0) / count : 0.0;
}
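A compilable Java rendering of that trick (names mine). Writers that have already passed the flag check can still slip in after the reader raises the flag, which is why the reader retries until count == count2:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class DoubleCounterMetric {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong sum = new AtomicLong();
    private final AtomicLong count2 = new AtomicLong();
    private final AtomicInteger flag = new AtomicInteger();

    public void addValue(long value) {
        while (flag.get() != 0) {
            // spin: a reader is taking a snapshot
        }
        count.incrementAndGet();
        sum.addAndGet(value);
        count2.incrementAndGet(); // last, so count == count2 marks a stable state
    }

    public double getAverage() {
        long c, s, c2;
        flag.incrementAndGet(); // pause new writers
        do {
            c = count.get();
            s = sum.get();
            c2 = count2.get();
        } while (c != c2); // retry while an update is in flight
        flag.decrementAndGet();
        return c != 0 ? (double) s / c : 0.0;
    }
}
```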

(copying discussion from G+ here).

One optimization idea is to use a single AtomicLong that stores both the sum and the count in different bit ranges of the long, which solves the problem of keeping the count and sum matched while computing the average.
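The packing works like this: the upper 32 bits hold the running sum and the lower 32 bits hold the count, so one atomic add updates both together. (Caveats: the count silently overflows into the sum after 2^32 updates, and the sum itself is limited to 32 bits.) A small demo of the encode/decode arithmetic:

```java
public class PackingDemo {
    // Pack one update: the value goes into the high 32 bits,
    // and a count delta of 1 into the low 32 bits.
    static long pack(long value) {
        return (value << 32) + 1;
    }

    public static void main(String[] args) {
        long packed = pack(5) + pack(7) + pack(9); // three updates in one long
        int count = (int) packed;  // low 32 bits
        long sum = packed >> 32;   // high 32 bits
        System.out.println(count + " " + sum); // prints: 3 21
    }
}
```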

Another (bigger) optimization is to use thread-specific metrics (as irreputable suggested earlier). This has the following advantages:

  • It avoids contention during writes, so the CAS on each write is fast because no other thread is writing to the same entry.
  • Reads do not require any locks.
  • And most importantly, it makes better use of the L1 cache.

Explanation for the last point:

When multiple threads do lots of writes and reads on a single shared memory location on a multi-core CPU, a thread running on one core keeps invalidating the other cores' L1 cache lines. Because of this, the latest value has to be fetched from the other core via the cache-coherence protocol. All of this slows things down drastically. Thread-specific metrics avoid the issue.

Reference: http://www.cs.washington.edu/education/courses/cse378/07au/lectures/L25-Atomic-Operations.pdf

With that in mind, code like this should perform well.

// Guava's com.google.common.util.concurrent.AtomicLongMap
private final AtomicLongMap<Long> metric = AtomicLongMap.create();

public void addValue(long value) {
    long threadId = Thread.currentThread().getId();
    metric.addAndGet(threadId, (value << 32) + 1);
}

public synchronized double getAverage() {
    long value = metric.sum();
    int count = (int)value;
    return (count == 0) ? 0 : ((double)(value >> 32))/count;
}

And indeed, the tests show that it performs best, better than the no-lock solution above, by an order of magnitude in some cases.

No thread safety: 3435ms, Average: 1.3532233016178474
(irreputable) Just synchronized {}  4665ms, Average: 4.0
(atuls) reverse read-write lock:    19703ms, Average: 4.0
(michael burr)  17150ms, Average: 4.0
(therealsachin) 1106ms, Average: 4.0

In terms of correctness, I think your scheme is quite a cunning plan. You've set things up so that multiple updating threads increment counts and totals independently and hence can safely be allowed past the read lock.

Your average calculation takes place under a write lock and hence guarantees that no updating "readers" can be active and put the count and total temporarily out of step.

The big question for me is whether your scheme really gives better performance than the simple synchronized behaviour. Although you've removed the superficial contention point between readers by avoiding a synchronized section in your code, under the covers the reader/writer lock is probably doing some clever stuff in synchronized blocks of its own; see the ReadWriteLock documentation, which also warns that, depending on implementation details, your writer might suffer from starvation.

Only careful measurement can tell us the answer to that.

I ran a benchmark for each of the solutions including my own.

Only addValue is called, from 100 threads running 100 tasks each, with each task performing 10000 updates using values 0 to 9999. The results are:

(irreputable) Just synchronized {}: 7756 ms  Average: 4999.5
(atuls) My reverse read-write lock: 16523 ms Average: 4999.5
(michael burr) Double counter trick: 10698 Average: 4999.5
No thread safety: 4115 ms Average: 4685.0
(atuls) Not thread safe v1. 11189 ms Average: 4999.5

Looks like irreputable is correct :)
