
HashMap vs ConcurrentHashMap: transfer between threads

I have a question about using maps in a multithreaded application. Suppose we have the following scenario:

  1. A thread receives JSON data as List<Map<String, Object>>, deserialized by Jackson.
  2. This thread modifies the received maps.
  3. It then puts the list into a blocking queue to be consumed by another thread.

As you can see, the map is modified only by a single thread, but then it "becomes" read-only (nothing changes; it's just not modified anymore) and is passed to another thread. Next, when I looked into the implementations of HashMap (and also TreeMap) and ConcurrentHashMap, I noticed that the latter has volatile fields while the first two don't. So, which implementation of Map should I use in this case? Is ConcurrentHashMap an overkill choice, or must it be used because of the inter-thread transfer?

My simple tests show that I can use HashMap/TreeMap when they are modified under synchronization and it works, but my conclusion or my test code may be wrong:

import java.util.concurrent.CountDownLatch
import java.util.concurrent.ThreadLocalRandom

def map = new TreeMap() // or HashMap; neither is thread-safe by itself
def start = new CountDownLatch(1)
def threads = (1..5)
println("Threads: " + threads)
def created = new CountDownLatch(threads.size())
def completed = new CountDownLatch(threads.size())
threads.each { i ->
    new Thread({
        def from = i * 10
        def to = from + 10
        def local = (from..to)
        println(Thread.currentThread().name + " " + local)
        created.countDown()
        start.await() // wait until the main thread releases all workers
        println('Mutating by ' + local)
        local.each { number ->
            synchronized (map) {
                map.put(number, ThreadLocalRandom.current().nextInt())
                // Read keySet() under the same lock: iterating a TreeMap/HashMap
                // while another thread writes to it is a data race.
                println(Thread.currentThread().name + ' added ' + number + ': ' + map.keySet())
            }
        }
        println 'Done: ' + Thread.currentThread().name
        completed.countDown()
    }).start()
}

created.await()     // wait until every worker has started
start.countDown()   // release all workers at once
completed.await()   // wait until every worker has finished
println('Completed:')
map.each { e ->
    println('' + e.key + ': ' + e.value)
}

The main thread spawns 5 child threads that update a common map under synchronization; when they complete, the main thread successfully sees all updates made by the child threads.

The java.util.concurrent classes have special guarantees regarding sequencing:

Memory consistency effects: As with other concurrent collections, actions in a thread prior to placing an object into a BlockingQueue happen-before actions subsequent to the access or removal of that element from the BlockingQueue in another thread.

This means that you are free to use any kind of mutable object and manipulate it as you wish, then put it into the queue. When it's retrieved, all of the manipulations you applied before putting it in will be visible.
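To make this concrete, here is a minimal Java sketch (the class names QueueHandoff and Order are made up for illustration): a producer thread fills in an ordinary mutable object with no volatile fields and no locks, puts it into an ArrayBlockingQueue, and the calling thread takes it out. The queue's memory consistency guarantee is the only thing making the producer's writes visible.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class QueueHandoff {
    // Hypothetical mutable class for illustration: no volatile, no locks.
    static final class Order {
        String customer;
        int quantity;
    }

    // The producer mutates an Order and then put()s it into the queue; the
    // calling thread take()s it. BlockingQueue's memory consistency effects
    // make every write the producer made before put() visible after take().
    static Order handOff() {
        BlockingQueue<Order> queue = new ArrayBlockingQueue<>(1);
        Thread producer = new Thread(() -> {
            Order o = new Order();
            o.customer = "alice";
            o.quantity = 3;
            try {
                queue.put(o); // no further writes to o after this point
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        Order o = handOff();
        System.out.println(o.customer + " x" + o.quantity); // prints "alice x3"
    }
}
```

The producer must not touch the object after put(); writes made after the handoff are not covered by the guarantee.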

(Note more generally that the kind of test you demonstrated can only prove lack of safety; in most real-world cases, unsynchronized code works fine 99% of the time. It's that last 1% that bites you.)

This question has a broad scope.

Your original scenario

You say:

[A] map is modified only by single thread, but then it "becomes" read-only

The tricky part is the word "then". When you, the programmer, say "then", you refer to "clock time", e.g. I've done this, now do that. But for an incredibly wide variety of reasons (compiler and CPU reordering, per-core caches, and so on), the computer does not "think" (execute code) this way. What happened before and what happens after need to be "synchronized manually" for the computer to see the world the way we see it.

That's the way the Java Memory Model expresses it: if you want your objects to behave predictably in a concurrent environment, you have to make sure that you establish "happens-before" boundaries.

A few things establish happens-before relationships in Java code. Simplifying a bit, and just to name a few:

  • The order of execution within a single thread (if statements 1 and 2 are executed by the same thread in that order, whatever 1 did is always visible to statement 2).
  • When thread t1 start()s t2, everything that t1 did before starting t2 is visible to t2. Reciprocally with join(): everything t2 did is visible to t1 once t1's call to t2.join() returns.
  • Same goes with synchronized, i.e. object monitors: everything a thread did before leaving a synchronized block is visible to another thread that subsequently synchronizes on the same instance.
  • Same goes with the specialized methods of java.util.concurrent classes: locks and semaphores, of course, but also collections. If you put an element into a concurrent collection, everything the putting thread did before inserting it happens-before whatever the thread that pulls it out does afterwards.
  • Transitivity: if T1's action happens-before T2's, and T2's happens-before T3's, then T1's action also happens-before T3's.
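The start()/join() rules can be seen in a small Java sketch (the class StartJoinVisibility is hypothetical): a plain HashMap is written by the parent before start(), read and extended by the worker, and read again by the parent after join(), with no synchronized block anywhere. It is safe only because the two threads never touch the map at the same time.

```java
import java.util.HashMap;
import java.util.Map;

final class StartJoinVisibility {
    // Plain HashMap, no synchronization: start() and join() supply the
    // happens-before edges, and the map is never used by two threads at once.
    static Map<String, Integer> run() {
        Map<String, Integer> map = new HashMap<>();
        map.put("before-start", 1); // happens-before the worker's first action

        Thread worker = new Thread(() -> {
            // Safe read: the parent's write happened before start().
            int seen = map.get("before-start");
            map.put("from-worker", seen + 1);
        });
        worker.start();
        try {
            worker.join(); // the worker's writes happen-before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map; // safe to read here: contains both entries
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```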

So, back to your phrase:

then it "becomes" read-only

It does become read-only. But for the computer to see it, you have to give a meaning to "then", which is: you have to put a happens-before relationship in your code.

Later on you state:

And then puts list into blocking queue

A java.util.concurrent queue? How neat is that! It just so happens that a thread pulling an object out of a concurrent queue has a "happens-before" relationship with the thread that put that object into the queue: whatever the producer did before the put is visible to the consumer after the take.

You have established the relationship. All mutations made (before the put) by the thread that put the object into the queue are safely visible to the one that pulls it out. You do not need a ConcurrentHashMap in this case (as long as no other thread mutates the same data, of course).
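The scenario from the question can be sketched in Java as follows. This is an illustration under assumptions, not a drop-in implementation: instead of calling Jackson, the hypothetical fakeDeserialize() builds the List<Map<String, Object>> by hand, and the producer and consumer are bare Threads rather than whatever executor the real application uses. Every map is a plain HashMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class JsonBatchPipeline {
    // Hypothetical stand-in for Jackson: builds the rows by hand.
    static List<Map<String, Object>> fakeDeserialize() {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Map<String, Object> row = new HashMap<>(); // plain HashMap is enough
            row.put("id", i);
            rows.add(row);
        }
        return rows;
    }

    static List<String> run() {
        BlockingQueue<List<Map<String, Object>>> queue = new LinkedBlockingQueue<>();
        List<String> seenByConsumer = new ArrayList<>(); // written only by the consumer

        Thread producer = new Thread(() -> {
            List<Map<String, Object>> rows = fakeDeserialize(); // step 1
            for (Map<String, Object> row : rows) {
                row.put("status", "processed"); // step 2: modify the received maps
            }
            try {
                queue.put(rows); // step 3: hand off; no more writes to rows after this
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                // Read-only use: everything written before put() is visible here.
                for (Map<String, Object> row : queue.take()) {
                    seenByConsumer.add(row.get("id") + "=" + row.get("status"));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        consumer.start();
        producer.start();
        try {
            producer.join();
            consumer.join(); // the consumer's writes happen-before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seenByConsumer;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [0=processed, 1=processed, 2=processed]
    }
}
```

If the producer kept modifying the maps after put(), the queue's guarantee would no longer cover those later writes.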

Your sample code

Your sample code does not use a queue. And it has multiple threads mutating a single map (rather than one thread mutating maps that are then read by another, as in your scenario). So it's just... not the same. But either way, your code's fine.

Threads accessing the map do it like so:

synchronized (map) {
    map.put(number, ThreadLocalRandom.current().nextInt())
}

The synchronized block provides 1) mutual exclusion between the threads and 2) a happens-before relationship. So each thread that enters the block sees everything that "happened before" in any other thread that previously synchronized on the same map (which is all of them).

So no problem here.
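For comparison, here is roughly the same experiment as the Groovy test, written as a Java sketch (SynchronizedMapWriters and its parameters are illustrative names): each worker writes a disjoint range of keys into one shared HashMap, always inside synchronized (map), and the main thread joins all workers before reading the result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SynchronizedMapWriters {
    // Several threads write to one plain HashMap, always under the same monitor.
    // synchronized (map) gives (1) mutual exclusion, so two put() calls never
    // interleave, and (2) happens-before between successive holders of the lock.
    static Map<Integer, Integer> run(int threads, int keysPerThread) {
        Map<Integer, Integer> map = new HashMap<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int base = t * keysPerThread; // disjoint key range per thread
            Thread w = new Thread(() -> {
                for (int k = base; k < base + keysPerThread; k++) {
                    synchronized (map) {
                        map.put(k, k * k);
                    }
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // makes each worker's writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(run(5, 10).size()); // prints 50
    }
}
```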

And then your main thread does:

completed.await()
println('Completed:')
map.each { e ->
   println('' + e.key + ': ' + e.value)
}

The thing that saves you here is completed.await(). Everything each worker did before calling countDown() happens-before the return of await(), and every worker calls it. So your main thread sees everything that was done by the worker threads. All is fine.
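A minimal Java sketch of the CountDownLatch guarantee (LatchVisibility is a hypothetical name): each worker writes only its own slot of a plain, non-volatile array and then calls countDown(); once await() returns, the main thread reads every slot without any additional locking. Giving each worker its own slot avoids needing mutual exclusion between workers, which the latch does not provide.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

final class LatchVisibility {
    // Each worker writes only its own slot of a plain (non-volatile) array,
    // then calls countDown(). Actions before countDown() happen-before the
    // return of await() in another thread, so the main thread can read
    // every slot afterwards without further locking.
    static int[] run(int workers) {
        int[] results = new int[workers];
        CountDownLatch completed = new CountDownLatch(workers);
        for (int i = 0; i < workers; i++) {
            final int slot = i;
            new Thread(() -> {
                results[slot] = (slot + 1) * 10; // plain write, no synchronization
                completed.countDown();
            }).start();
        }
        try {
            completed.await(); // returns only after every worker's countDown()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(5))); // prints [10, 20, 30, 40, 50]
    }
}
```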

Except... We often forget to check the bootstrap of the threads. The first time a worker synchronizes on the map instance, nobody has done so before. How can we be sure that it sees a fully initialized map instance, ready to use?

Well, for two reasons:

  1. You initialize the map instance BEFORE calling thread.start(), which establishes a happens-before. This alone would be enough.
  2. Inside your worker threads, you also wait on latches before starting the work, which establishes a relationship once again.

You're doubly safe.

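Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.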

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM