Java: How to implement a lock on ConcurrentHashMap reads

TL;DR: in Java I have N threads, each using a shared collection. ConcurrentHashMap allows me to lock on writes, but not on reads. What I need is to lock a specific item of the collection, read its previous data, do some computation, and update the values. If two threads receive two messages from the same sender, the second thread has to wait for the first one to finish before doing its own work.


Long version:

These threads are receiving chronologically ordered messages, and they have to update the collection based on a messageSenderID.

My code, simplified, is as follows:

public class Parent {
    private Map<String, MyObject> myObjects;

    ExecutorService executor;
    List<Future<?>> runnables = new ArrayList<Future<?>>();

    public Parent(){
        myObjects= new ConcurrentHashMap<String, MyObject>();

        executor = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 10; i++) {
            WorkerThread worker = new WorkerThread("worker_" + i);
            Future<?> future = executor.submit(worker);
            runnables.add(future);
        }
    }

    private synchronized JSONObject getMessageFromSender(){
        // Get a message from the common source
    }

    private synchronized MyObject getMyObject(String id){
        MyObject myObject = myObjects.get(id);
        if (myObject == null) {
            myObject = new MyObject(id);
            myObjects.put(id, myObject);
        }
        return myObject;
    }

    private class WorkerThread implements Runnable {
        private String name;

        public WorkerThread(String name) {
            this.name = name;
        }

        @Override
        public void run() {
            while(!isStopped()) {
                JSONObject message = getMessageFromSender();
                String id = message.getString("id");
                MyObject myObject = getMyObject(id);
                synchronized (myObject) {
                    doLotOfStuff(myObject);
                }
            }
        }
    }
}

So basically I have one producer and N consumers to speed up processing, but the N consumers have to work on a common base of data, and the chronological order has to be respected.

I am currently using a ConcurrentHashMap, but I'm willing to change it if needed.

The code seems to work if messages with the same ID arrive far enough apart (> 1 second), but if I get two messages with the same ID within microseconds of each other, I end up with two threads dealing with the same item in the collection.

I GUESS that my desired behavior is:

Thread 1                        Thread 2
--------------------------------------------------------------
read message 1
find ID
lock that ID in collection
do computation and update
                                read message 2
                                find ID
                                lock that ID in collection
                                do computation and update

While I THINK that this is what happens:

Thread 1                        Thread 2
--------------------------------------------------------------
read message 1
                                read message 2
                                find ID
                                lock that ID in collection
                                do computation and update
find ID
lock that ID in collection
do computation and update

I thought about doing something like

JSONObject message = getMessageFromSender();
synchronized(message){
    String id = message.getString("id");
    MyObject myObject = getMyObject(id);
    synchronized (myObject) {
        doLotOfStuff(myObject);
    } // well maybe this inner synchronized is superfluous, at this point
}

But I think that would defeat the whole purpose of having a multithreaded structure, since I would read only one message at a time while the workers sit idle; it would be as if I were using a synchronized HashMap instead of a ConcurrentHashMap.


For the record, here is the solution I eventually implemented. I'm not sure it is optimal and I still have to test its performance, but at least the input is handled properly.

public class Parent implements Runnable {

    private final static int NUM_WORKERS = 10;
    ExecutorService executor;
    List<Future<?>> futures = new ArrayList<Future<?>>();
    List<WorkerThread> workers = new ArrayList<WorkerThread>();

    @Override
    public void run() {
        executor = Executors.newFixedThreadPool(NUM_WORKERS);
        for (int i = 0; i < NUM_WORKERS; i++) {
            WorkerThread worker = new WorkerThread("worker_" + i);
            Future<?> future = executor.submit(worker);
            futures.add(future);
            workers.add(worker);
        }

        while (!isStopped()) {
            byte[] message = getMessageFromSender();
            byte[] id = getId(message);
            // Route on the last byte of the id. Math.floorMod keeps the index in
            // [0, NUM_WORKERS) even for negative byte values, so no message is dropped.
            int n = Math.floorMod(id[id.length - 1], NUM_WORKERS);
            workers.get(n).addToQueue(message);
        }
    }

    private class WorkerThread implements Runnable {
        private final String name;
        // Each worker owns its own map and queue, so no synchronization is needed on them.
        private final Map<String, MyObject> myObjects = new HashMap<String, MyObject>();
        private final LinkedBlockingQueue<byte[]> queue = new LinkedBlockingQueue<byte[]>();

        public WorkerThread(String name) {
            this.name = name;
        }

        public void addToQueue(byte[] message) {
            queue.add(message);
        }

        private MyObject getMyObject(String id) {
            MyObject myObject = myObjects.get(id);
            if (myObject == null) {
                myObject = new MyObject(id);
                myObjects.put(id, myObject);
            }
            return myObject;
        }

        @Override
        public void run() {
            while (!isStopped()) {
                byte[] message = queue.poll();
                if (message != null) {
                    String id = new String(getId(message));
                    MyObject myObject = getMyObject(id);
                    doLotOfStuff(myObject);
                }
            }
        }
    }
}

Conceptually this is a kind of routing problem. What you need to do is:

Have your main thread (a single thread) read messages off the source queue and push the data onto a FIFO queue per id. Then have a single thread consume the messages from each queue.

Locking examples will (probably) not work, because from the second message onwards the ordering is not guaranteed, even with fair=true.

From the Javadoc: "Even when this lock has been set to use a fair ordering policy, a call to tryLock() will immediately acquire the lock if it is available, whether or not other threads are currently waiting for the lock."

One thing for you to decide is whether you want to create a thread per queue (which will exit once the queue is empty) or keep the fixed-size thread pool and manage the extra bookkeeping of assigning threads to queues.

So you get a single thread reading from the original queue and writing to the per-id queues, and you also get one thread per id reading from its individual queue. This ensures task serialization.
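
A minimal sketch of that shape, assuming a single dispatcher thread calls route() and reusing the isStopped(), getMyObject() and doLotOfStuff() helpers from the question; the route(), queues and consumers names are my own, and in real code the message itself would also feed into the computation:

// needs java.util.concurrent.* imports; JSONObject is the same type used in the question
private final ConcurrentHashMap<String, BlockingQueue<JSONObject>> queues = new ConcurrentHashMap<>();
private final ExecutorService consumers = Executors.newCachedThreadPool();

// Called only from the single dispatcher thread, so the consumer for each id is started exactly once.
private void route(JSONObject message) {
    String id = message.getString("id");
    BlockingQueue<JSONObject> queue = queues.computeIfAbsent(id, key -> {
        BlockingQueue<JSONObject> q = new LinkedBlockingQueue<>();
        consumers.execute(() -> {                                    // one dedicated consumer per id
            try {
                while (!isStopped()) {
                    JSONObject m = q.poll(1, TimeUnit.SECONDS);      // wake up periodically to re-check the stop flag
                    if (m != null) {
                        doLotOfStuff(getMyObject(key));              // strictly FIFO for this id
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return q;
    });
    queue.add(message);
}

Different ids then run in parallel on the cached pool, while messages for the same id are handled strictly in arrival order.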

In terms of performance, you should see a significant speed-up as long as the incoming messages have a reasonably even distribution across ids. If you get mostly same-id messages, the tasks will be effectively serialized and you also pay the overhead of control-object creation and synchronization.

You could use a separate Map for your locks. There's also WeakHashMap, which will automatically discard entries once the key is no longer referenced elsewhere.

static final Map<String, Lock> locks = Collections.synchronizedMap(new WeakHashMap<>());

public void lock(String id) throws InterruptedException {
    // Grab a Lock out of the map.
    Lock l = locks.computeIfAbsent(id, k -> new ReentrantLock());
    // Lock it.
    l.lockInterruptibly();
}

public void unlock(String id) throws InterruptedException {
    // Is it locked?
    Lock l = locks.get(id);
    if ( l != null ) {
        l.unlock();
    }
}
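
For completeness, a caller would typically pair these helpers with try/finally so the lock is always released; a minimal sketch reusing the getMessageFromSender(), getMyObject() and doLotOfStuff() helpers from the question:

// lock(id) throws InterruptedException, so the enclosing method must declare or handle it
JSONObject message = getMessageFromSender();
String id = message.getString("id");
lock(id);                      // blocks until no other worker holds the lock for this id
try {
    doLotOfStuff(getMyObject(id));
} finally {
    unlock(id);                // always release, even if doLotOfStuff() throws
}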

I think you have the right idea with your synchronized blocks, except that you mis-analyze it a bit and, in any case, go too far. The outer synchronized block wouldn't force you to process only one message at a time; it would just keep multiple threads from accessing the same message at once. But you don't need it. You really only need the inner synchronized block, on the MyObject instance. That ensures that only one thread at a time can access any given MyObject instance, while other threads remain free to access the messages, the Map and the other MyObject instances as much as they want.

JSONObject message = getMessageFromSender();
String id = message.getString("id");
MyObject myObject = getMyObject(id);
synchronized (myObject) {
    doLotOfStuff(myObject);
}

If you don't like that, and the updates to the MyObject instances all involve single-method invocations, then you could just synchronize all of those methods. You still retain concurrency in the Map, but you're protecting the MyObject itself from concurrent updates.

class MyObject {
  public synchronized void updateFoo() {
    // ...
  }

  public synchronized void updateBar() {
    // ...
  }
}

When any thread calls any of the updateX() methods, it automatically locks out every other thread from calling that or any other synchronized method on the same instance. That would be simplest, if your updates match that pattern.

If not, then you'll need to make all of your worker threads cooperate by using some sort of locking protocol. The ReentrantLock that OldCurmudgeon suggests is a good choice, but I would put it on MyObject itself. To keep things properly ordered, you should use the fairness parameter (see http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantLock.html#ReentrantLock-boolean-): "When set true, under contention, locks favor granting access to the longest-waiting thread."

class MyObject {
  private final ReentrantLock lock = new ReentrantLock(true); // fair: the longest-waiting thread acquires first

  public void lock() {
    lock.lock();
  }

  public void unlock() {
    lock.unlock();
  }

  public void updateFoo() {
    // ...
  }

  public void updateBar() {
    // ...
  }
}

Then you could do updates like this:

JSONObject message = getMessageFromSender();
String id = message.getString("id");
MyObject myObject = getMyObject(id);
myObject.lock();
try {
    doLotOfStuff(myObject);
}
finally {
    myObject.unlock();
}

The important takeaway is that you don't need to control access to the messages, nor to the Map. All you need to do is ensure that any given MyObject is being updated by at most one thread at a time.
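
As a side note, since the map in the question is already a ConcurrentHashMap, the synchronized getMyObject() helper could be replaced by computeIfAbsent (Java 8+), which does the check-then-insert atomically; a minimal sketch using the same field and helper names as the question:

private final Map<String, MyObject> myObjects = new ConcurrentHashMap<>();

private MyObject getMyObject(String id) {
    // Atomically creates and registers the MyObject on first access; no external lock needed.
    return myObjects.computeIfAbsent(id, MyObject::new);
}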

You could get some speed-up if you split the JSON parsing out from doLotsOfStuff(). One thread listens for messages, parses them, then puts the parsed messages on a Queue to maintain chronological order. A second thread reads from that Queue and doesLotsOfStuff, with no need for locking.

However, since you apparently need more than a 2x speed-up, this is probably insufficient.

Added

Another possibility is multiple HashMaps. For example, if all the IDs are ints, make 10 HashMaps, for IDs ending in 0, 1, 2 and so on. Incoming messages get directed to one of 10 threads, each of which parses the JSON and updates its own Map. Order is maintained within each Map, and there are no locking or contention issues. Assuming the message IDs are randomly distributed, this yields up to a 10x speed-up, though there is one extra layer of overhead to get at your Map. For example (a concrete Java sketch follows the diagram):

Thread JSON                     Threads 0-9
--------------------------------------------------------------
while (notInterrupted) {
   read / parse next JSON message
   mapToUse = ID % 10
   pass JSON to that Thread's queue
}
                                while (notInterrupted) {
                                   take JSON off queue
                                   // I'm the only one with writing to Map#N
                                   do computation and update ID
                                }
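
One possible Java sketch of that layout, using a single-thread executor per shard as the per-shard queue; the SHARDS constant and the routing loop are my own assumptions, while isStopped(), getMessageFromSender(), getMyObject() and doLotOfStuff() are the helpers from the question (shutdown handling is omitted):

final int SHARDS = 10;
ExecutorService[] shards = new ExecutorService[SHARDS];
for (int i = 0; i < SHARDS; i++) {
    shards[i] = Executors.newSingleThreadExecutor();  // one thread per shard => FIFO within the shard
}

while (!isStopped()) {
    JSONObject message = getMessageFromSender();      // single reader preserves global arrival order
    String id = message.getString("id");
    int shard = Math.floorMod(id.hashCode(), SHARDS); // non-negative shard index derived from the id
    shards[shard].submit(() -> doLotOfStuff(getMyObject(id)));
}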

Actually, here is a design idea: when a consumer takes a request to work on one of your objects, it should remove the object with that ID from your collection, and then re-insert it once the processing is done. Any other consumer getting a request to work on the object with the same id should then block, waiting for the object with that ID to re-appear in the collection. You will need to add some bookkeeping to keep track of all existing objects, so that you can distinguish between an object that already exists but is not currently in the collection (i.e. it is being processed by some other consumer) and an object that does not exist yet.
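
A minimal sketch of that "ownership by removal" idea, assuming every id has been registered in the map before its first message arrives (as noted above, a separate registry would otherwise be needed to tell "in use" apart from "does not exist yet"); checkOut and checkIn are hypothetical helper names:

private final ConcurrentHashMap<String, MyObject> myObjects = new ConcurrentHashMap<>();

private MyObject checkOut(String id) throws InterruptedException {
    while (true) {
        MyObject owned = myObjects.remove(id);   // atomic: only one consumer gets a non-null result
        if (owned != null) {
            return owned;                        // this thread now exclusively owns the object
        }
        Thread.sleep(1);                         // crude back-off; a real version might park/signal instead
    }
}

private void checkIn(String id, MyObject myObject) {
    myObjects.put(id, myObject);                 // make the object visible to waiting consumers again
}

A consumer would then wrap its work as checkOut(id), then try { doLotOfStuff(obj); } finally { checkIn(id, obj); }, so the object is always returned to the map.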
