
Reverse read write lock

Usually we use a ReadWriteLock by taking the read lock while reading and the write lock while writing. But I ran into a fancy case where I think using it in reverse can help. Hopefully you can tell me a better way.

Here is what I want: there will be a lot of writes, but comparatively few reads. An example is a metric that computes the average latency of requests.

Treat the following almost as pseudocode.

metric.addValue(latency); // Called a lot.

metric.getAverage(); // Called sparingly.

We can do the following:

addValue(value) {
  atomicCount.increment();
  atomicSum.increment(value);
}

getAverage() {
  return atomicCount.get() != 0 ? atomicSum.get() / atomicCount.get() : 0.0;
}
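As a concrete Java rendering of the pseudocode above (the class and method names are mine), the counters map directly onto `AtomicLong`:

```java
import java.util.concurrent.atomic.AtomicLong;

// Lock-free mean: each update is atomic on its own, but getAverage()
// can observe a count that is one ahead of the sum (or vice versa).
class AtomicMeanMetric {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong sum = new AtomicLong();

    void addValue(long value) {
        count.incrementAndGet();
        sum.addAndGet(value);
    }

    double getAverage() {
        long c = count.get(); // read once, so the zero check and the divide agree
        return c != 0 ? (double) sum.get() / c : 0.0;
    }
}
```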

The problem is that getAverage() "may" see a few extra counts: most of the time the values line up, but sometimes the count is one update ahead of the sum. I just want it to be more precise.

Here is the trick:

ReadWriteLock rw = /* write preference, or a fair lock. */;
Lock read = rw.readLock();
Lock write = rw.writeLock();

addValue(value) {
  read.lock(); // Using read lock when mutating. 
  try { 
    atomicCount.increment();
    atomicSum.increment(value);
  } finally {
    read.unlock();
  }
}

getAverage() {
  write.lock(); // Using write lock when reading.
  try {
    return atomicCount.get() != 0 ? atomicSum.get() / atomicCount.get() : 0.0;
  } finally {
    write.unlock();
  }
}
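A runnable Java sketch of this inversion, using `ReentrantReadWriteLock` (I picked fair mode here, one of the two options the comment above suggests, so the rare reader is not starved; the names are mine):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// "Reverse" usage: many writers share the READ lock, because their atomic
// updates don't conflict with each other. The rare reader takes the WRITE
// lock, so it sees count and sum with no update in flight between them.
class ReverseLockMeanMetric {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock(true); // fair
    private final Lock shared = rw.readLock();      // taken by updaters
    private final Lock exclusive = rw.writeLock();  // taken by the reader
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong sum = new AtomicLong();

    void addValue(long value) {
        shared.lock();
        try {
            count.incrementAndGet();
            sum.addAndGet(value);
        } finally {
            shared.unlock();
        }
    }

    double getAverage() {
        exclusive.lock();
        try {
            long c = count.get();
            return c != 0 ? (double) sum.get() / c : 0.0;
        } finally {
            exclusive.unlock();
        }
    }
}
```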

My question is: can I do better?

Salt: I know about (cast) issues, and that calling count.get() multiple times etc. can be avoided for better performance, but I didn't want to clutter the code too much.

There's really no point in concurrent atomic increments; they can't be concurrent anyway.

The simplest solution, a plain lock around ordinary count/sum variables, will perform much better:

lock
    count++;
    sum += value;
unlock
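In Java that is just a pair of `synchronized` methods over plain fields:

```java
// Plain long fields guarded by the object monitor; no atomics needed,
// since the lock already serializes every access to count and sum.
class SynchronizedMeanMetric {
    private long count;
    private long sum;

    synchronized void addValue(long value) {
        count++;
        sum += value;
    }

    synchronized double getAverage() {
        return count != 0 ? (double) sum / count : 0.0;
    }
}
```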

To be more parallel, we need "sharding": each thread maintains its own stats, and the reader queries them all for the whole picture. (The per-thread stats need to be volatile; the reader uses Michael Burr's method to retrieve a stable version of each thread's stats.)
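One way to sketch the sharding idea in Java (this is my illustration, not the answerer's code): give each writer thread its own shard in a concurrent map, and let the reader walk all shards. For brevity the shards here use atomic counters rather than the per-shard stabilization mentioned above, so a shard's count and sum can still be momentarily one update apart.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Each writer thread updates only its own shard, so writes never contend
// with each other. The reader sums every shard for the whole picture.
class ShardedMeanMetric {
    private static final class Shard {
        final AtomicLong count = new AtomicLong();
        final AtomicLong sum = new AtomicLong();
    }

    private final Map<Long, Shard> shards = new ConcurrentHashMap<>();

    void addValue(long value) {
        Shard s = shards.computeIfAbsent(
                Thread.currentThread().getId(), id -> new Shard());
        s.count.incrementAndGet();
        s.sum.addAndGet(value);
    }

    double getAverage() {
        long c = 0, total = 0;
        for (Shard s : shards.values()) {
            c += s.count.get();
            total += s.sum.get();
        }
        return c != 0 ? (double) total / c : 0.0;
    }
}
```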

You might want to see if a technique like the following performs better. Basically it ensures that the count and sum are 'stable' by adding another counter that tracks the first but is only updated after all the other values have finished updating, so no locks are involved:

addValue(value) {

  while (atomicFlag.get() != 0) {
      // spin
  }
  atomicCount.increment();
  atomicSum.increment(value);
  atomicCount2.increment();
}

getAverage() {
    int count;
    int sum;
    int count2;

    atomicFlag.increment();
    do {
        count = atomicCount.get();
        sum = atomicSum.get();
        count2 = atomicCount2.get();
    } while (count != count2);
    atomicFlag.decrement();

    return count != 0 ? (sum * 1.0) / count : 0.0;
}
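Translated into Java (my rendering of the pseudocode above): `count2` trails `count` and is bumped only after `sum` has been updated, so a snapshot is stable once the two counts agree; the flag parks incoming writers so the reader's retry loop terminates.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Double-counter trick: the reader retries until count == count2, which
// means no update was in flight between reading count and reading sum.
class DoubleCounterMeanMetric {
    private final AtomicInteger flag = new AtomicInteger();
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong sum = new AtomicLong();
    private final AtomicLong count2 = new AtomicLong();

    void addValue(long value) {
        while (flag.get() != 0) {
            Thread.onSpinWait(); // a reader is snapshotting; spin until done
        }
        count.incrementAndGet();
        sum.addAndGet(value);
        count2.incrementAndGet(); // published last: marks the update complete
    }

    double getAverage() {
        long c, s, c2;
        flag.incrementAndGet(); // hold off new writers
        try {
            do {
                c = count.get();
                s = sum.get();
                c2 = count2.get();
            } while (c != c2); // retry until the snapshot is stable
        } finally {
            flag.decrementAndGet();
        }
        return c != 0 ? (double) s / c : 0.0;
    }
}
```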

(Copying discussion from G+ here.)

One optimization idea is to use a single AtomicLong that stores both the value and the count in different bit ranges of the long, which solves the problem of keeping the count and the sum matched while computing the average.

Another (bigger) optimization is to use thread-specific metrics (as irreputable suggested earlier). It has the following advantages.

  • It avoids any contention on writes, so a CAS on write is fast because no other thread is writing to the same metric.
  • Reads do not require any locks.
  • Most importantly, it makes better use of the L1 cache.

Explanation of the last point:

When multiple threads do lots of writes and reads on a single shared memory location on a multi-core CPU, threads running on different cores keep invalidating each other's L1 cache lines, so the latest value has to be fetched from another core via the cache-coherence protocol. All this slows things down drastically. Thread-specific metrics avoid this issue.

Reference: http://www.cs.washington.edu/education/courses/cse378/07au/lectures/L25-Atomic-Operations.pdf

With that in mind, code like this performs well.

private final AtomicLongMap<Long> metric = AtomicLongMap.create();

public void addValue(long value) {
    long threadId = Thread.currentThread().getId();
    metric.addAndGet(threadId, (value << 32) + 1);
}

public synchronized double getAverage() {
    long value = metric.sum();
    int count = (int)value;
    return (count == 0) ? 0 : ((double)(value >> 32))/count;
}

And indeed, the tests show that it performs best, better than the lock-free solution above, and by orders of magnitude too.

No thread safety: 3435ms, Average: 1.3532233016178474
(irreputable) Just synchronized {}  4665ms, Average: 4.0
(atuls) reverse read-write lock:    19703ms, Average: 4.0
(michael burr)  17150ms, Average: 4.0
(therealsachin) 1106ms, Average: 4.0

In terms of correctness I think your scheme is quite a cunning plan. You've set things up so that multiple updating threads increment the counts and totals independently, and hence can safely be allowed past the read lock.

Your average calculation takes place under the write lock, and hence it is guaranteed that no updating "readers" are active that could put the count and total temporarily out of step.

The big question for me is whether your scheme really gives better performance than the simple synchronized behaviour. Although you've removed the superficial contention point between "readers" by avoiding a synchronized section in your code, under the covers the reader/writer lock implementation is probably doing some clever stuff in synchronized blocks of its own; see the ReadWriteLock documentation, which also warns that, depending on the implementation, your writer might suffer from starvation.

Only careful measurement can tell us the answer to that.

I ran a benchmark for each of the solutions, including my own.

Only addValue, from 100 threads, each looping over 100 tasks, with 10000 updates per task using values 0 to 9999. The results are:

(irreputable) Just synchronized {}: 7756 ms  Average: 4999.5
(atuls) My reverse read-write lock: 16523 ms Average: 4999.5
(michael burr) Double counter trick: 10698 Average: 4999.5
No thread safety: 4115 ms Average: 4685.0
(atuls) Not thread safe v1. 11189 ms Average: 4999.5

Looks like irreputable is correct :)
