
How to optimize concurrent operations in Java?

I'm still quite shaky on multi-threading in Java. What I describe here is at the very heart of my application and I need to get this right. The solution needs to work fast and it needs to be practically safe. Will this work? Any suggestions, criticism, or alternative solutions are welcome.


Objects used within my application are somewhat expensive to generate but change rarely, so I am caching them in *.temp files. It is possible for one thread to try to retrieve a given object from the cache while another is trying to update it there. The retrieve and store cache operations are encapsulated within a CacheService implementation.

Consider this scenario:

Thread 1: retrieve cache for objectId "page_1".
Thread 2: update cache for objectId "page_1".
Thread 3: retrieve cache for objectId "page_2".
Thread 4: retrieve cache for objectId "page_3".
Thread 5: retrieve cache for objectId "page_4".

Note: thread 1 appears to retrieve an obsolete object, because thread 2 has a newer copy of it. This is perfectly OK, so I do not need any logic that gives thread 2 priority.

If I synchronize the retrieve/store methods on my service, then I'm unnecessarily slowing things down for threads 3, 4 and 5. Multiple retrieve operations will be in progress at any given time, but the update operation will be called rarely. This is why I want to avoid method synchronization.

I gather I need to synchronize on an object that is shared only by threads 1 and 2, which implies a lock object registry. Here, an obvious choice would be a Hashtable but, again, operations on a Hashtable are synchronized, so I'm trying a HashMap. The map stores a string object to be used as the lock object for synchronization, and the key/value would be the id of the object being cached. So for object "page_1" the key would be "page_1" and the lock object would be a string with a value of "page_1".

If I've got the registry right, then additionally I want to protect it from being flooded with too many entries. Let's not get into the details of why. Let's just assume that if the registry has grown past a defined limit, it needs to be reinitialized with 0 elements. This is a bit of a risk with an unsynchronized HashMap, but this flooding would be something outside of normal application operation. It should be a very rare occurrence and hopefully never takes place. But since it is possible, I want to protect myself from it.

import java.util.concurrent.ConcurrentHashMap;

@Service
public class CacheServiceImpl implements CacheService {
    private static ConcurrentHashMap<String, String> objectLockRegistry = new ConcurrentHashMap<>();

    public Object getObject(String objectId) {
        String objectLock = getObjectLock(objectId);
        if (objectLock != null) {
            synchronized (objectLock) {
                // read object from objectInputStream
            }
        }
        return null; // placeholder: the elided read above would supply the cached object
    }

    public boolean storeObject(String objectId, Object object) {
        String objectLock = getObjectLock(objectId);

        synchronized (objectLock) {
            // write object to objectOutputStream
        }
        return true; // placeholder: the elided write above would decide the real result
    }

    private String getObjectLock(String objectId) {
        int objectLockRegistryMaxSize = 100_000;

        // reinitialize registry if necessary
        if (objectLockRegistry.size() > objectLockRegistryMaxSize) {
            // hoping to never reach this point but it is not impossible to get here
            synchronized (objectLockRegistry) {
                if (objectLockRegistry.size() > objectLockRegistryMaxSize) {
                    objectLockRegistry.clear();
                }
            }
        }

        // add lock to registry if necessary
        objectLockRegistry.putIfAbsent(objectId, new String(objectId));

        // may be null if another thread cleared the registry in between
        String objectLock = objectLockRegistry.get(objectId);
        return objectLock;
    }
}

If you are reading from disk, lock contention is not going to be your performance issue.

You can have both threads grab the lock for the entire cache and do a read. If the value is missing, release the lock, read from disk, and acquire the lock again. Then, if the value is still missing, write it; otherwise return the value that is now there.
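The steps in the previous paragraph can be sketched roughly as follows. This is only an illustration under assumptions: the class name CheckLoadRecheckCache and the loader function (a stand-in for the disk read) are hypothetical and do not come from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch: check under the lock, load outside the lock, re-check before storing. */
class CheckLoadRecheckCache<K, V> {
    private final Map<K, V> values = new HashMap<>(); // guarded by "this"
    private final Function<K, V> loader;              // hypothetical stand-in for the disk read

    CheckLoadRecheckCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        synchronized (this) {
            V cached = values.get(key);
            if (cached != null) {
                return cached;
            }
        }
        // The slow load runs without the lock, so requests for other keys are not blocked.
        V loaded = loader.apply(key);
        synchronized (this) {
            // Another thread may have stored a value in the meantime; if so, keep that one.
            V existing = values.get(key);
            if (existing != null) {
                return existing;
            }
            values.put(key, loaded);
            return loaded;
        }
    }
}
```

Two threads that miss at the same moment may both run the loader, which is the duplicated disk read this answer goes on to discuss; only one of the two results ends up in the map.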

The only issue you will have with that is concurrent reads thrashing the disk... but the OS caches will be hot, so the disk shouldn't be thrashed excessively.

If that is an issue, then switch your cache to holding a Future<V> in place of a V.

The get method will become something like:

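public V get(K key) throws InterruptedException, ExecutionException {
    Future<V> future;
    synchronized (this) {
        // look up the pending or finished load, or start one, under the global lock
        future = backingCache.get(key);
        if (future == null) {
            future = executorService.submit(new LoadFromDisk(key));
            backingCache.put(key, future);
        }
    }
    // wait for the result outside the lock so other keys are not held up
    return future.get();
}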

Yes, that is a global lock... but you're reading from disk, and you shouldn't optimize until you have a proven performance bottleneck...

Oh. First optimization: replace the map with a ConcurrentHashMap and use putIfAbsent, and you'll have no lock at all! (BUT only do that when you know this is an issue.)
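A minimal sketch of that lock-free variant, assuming a hypothetical loader function in place of the disk read. The class name FutureCache is made up for illustration. The load here runs on the calling thread through a FutureTask instead of an executor service, which is a simplification and not necessarily what the answer had in mind.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

/** Sketch: a cache of futures where putIfAbsent decides which thread performs the load. */
class FutureCache<K, V> {
    private final ConcurrentMap<K, Future<V>> backingCache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical stand-in for the disk read

    FutureCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) throws InterruptedException, ExecutionException {
        Future<V> future = backingCache.get(key);
        if (future == null) {
            Callable<V> load = () -> loader.apply(key);
            FutureTask<V> task = new FutureTask<>(load);
            // putIfAbsent returns null only for the thread whose task was stored.
            future = backingCache.putIfAbsent(key, task);
            if (future == null) {
                future = task;
                task.run(); // the winning thread performs the load itself
            }
        }
        // Every other caller waits here for the winner's result.
        return future.get();
    }
}
```

With this shape, concurrent requests for the same key trigger one load, and requests for different keys never wait on each other.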

The complexity of your scheme has already been discussed. It leads to hard-to-find bugs. For example, not only do you lock on non-final variables, but you even change them in the middle of synchronized blocks that use them as a lock. Multi-threading is very hard to reason about, and this kind of code makes it almost impossible:

    synchronized(objectLockRegistry) {
        if(objectLockRegistry.size() > objectLockRegistryMaxSize) {
            objectLockRegistry = new HashMap<>(); //brrrrrr...
        }
    }

In particular, two simultaneous calls to get a lock on a specific string might actually return two different instances of the same string, each stored in a different instance of your hashmap (unless they are interned), and you won't be locking on the same monitor.
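To make the point about monitors concrete, here is a small sketch. The class LockIdentityDemo and its methods are hypothetical. It imitates a registry lookup that creates new String(objectId) on a miss, then compares the lock handed out by an old registry instance with the one handed out by its replacement.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: equal strings from two registry instances are still two different monitors. */
class LockIdentityDemo {
    /** Returns the lock stored for objectId, creating a fresh String instance on a miss. */
    static String lockFrom(Map<String, String> registry, String objectId) {
        String lock = registry.get(objectId);
        if (lock == null) {
            lock = new String(objectId); // a new object, even if an equal string exists elsewhere
            registry.put(objectId, lock);
        }
        return lock;
    }

    /** synchronized uses object identity, so only the same reference is the same monitor. */
    static boolean sameMonitor(Object a, Object b) {
        return a == b;
    }
}
```

Calling lockFrom with the same id on two different maps yields strings for which equals is true but sameMonitor is false, so two threads synchronizing on them would not exclude each other.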

You should either use an existing library or keep it a lot simpler.

If your question includes the keywords "optimize" and "concurrent", and your solution includes a complicated locking scheme... you're doing it wrong. It is possible to succeed at this sort of venture, but the odds are stacked against you. Prepare to diagnose bizarre concurrency bugs, including but not limited to deadlock, livelock, cache incoherency... I can spot multiple unsafe practices in your example code.

Pretty much the only way to create a safe and effective concurrent algorithm without being a concurrency god is to take one of the pre-baked concurrent classes and adapt it to your needs. It's just too hard to do otherwise unless you have an exceptionally convincing reason.

You might take a look at ConcurrentMap. You might also like CacheBuilder.
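As one possible illustration of leaning on a pre-baked class, the sketch below builds a tiny cache on ConcurrentHashMap.computeIfAbsent (available since Java 8). The class name ComputeIfAbsentCache and the loader function are assumptions for the example, not part of the question's CacheService.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Sketch: the map itself provides the atomic "load if missing" step. */
class ComputeIfAbsentCache<K, V> {
    private final ConcurrentHashMap<K, V> values = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical stand-in for generating or reading the object

    ComputeIfAbsentCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Loads the value at most once per missing key; later calls return the stored value. */
    V retrieve(K key) {
        return values.computeIfAbsent(key, loader);
    }

    /** Replaces the cached value; readers see either the old or the new value. */
    void store(K key, V value) {
        values.put(key, value);
    }
}
```

One caveat worth knowing: ConcurrentHashMap may block other updates to nearby entries while the loader runs, so a slow loader can still cause some waiting.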

Using threads and synchronized directly is covered at the beginning of most tutorials about multithreading and concurrency. However, many real-world examples require more sophisticated locking and concurrency schemes, which are cumbersome and error-prone if you implement them yourself. To avoid reinventing the wheel over and over again, the Java concurrency library was created. There, you can find many classes that will be of great help to you. Try googling for tutorials about Java concurrency and locks.

As an example of a lock that might help you, see http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReadWriteLock.html .
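A minimal sketch of how a ReadWriteLock could fit the "many retrieves, rare updates" pattern from the question. The class ReadWriteLockedCache is hypothetical and keeps values in memory purely for illustration; in the question the guarded sections would be the file reads and writes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch: many readers may hold the read lock together; a writer gets exclusive access. */
class ReadWriteLockedCache<K, V> {
    private final Map<K, V> values = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    V retrieve(K key) {
        lock.readLock().lock();
        try {
            return values.get(key); // concurrent retrieves do not block each other
        } finally {
            lock.readLock().unlock();
        }
    }

    void store(K key, V value) {
        lock.writeLock().lock();
        try {
            values.put(key, value); // waits for active readers, then runs alone
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Note that this is a single lock for the whole cache, so an update to "page_1" briefly pauses retrieves of every page; since updates are rare in the question's scenario, that may be acceptable.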

Rather than roll your own cache, I would take a look at Google's MapMaker. Something like this will give you a lock cache that automatically expires unused entries as they are garbage collected:

ConcurrentMap<String, String> objectLockRegistry = new MapMaker()
    .softValues()
    .makeComputingMap(new Function<String, String>() {
      // called by the map to create the lock the first time an id is requested
      public String apply(String s) {
        return new String(s);
      }
    });

With this, the whole getObjectLock implementation is simply return objectLockRegistry.get(objectId). The map takes care of all the "create if not already present" work for you in a safe way.

I would do it similarly to you: just create a map of Object (new Object()).
Unlike you, though, I would use a TreeMap<String, Object> or a HashMap. Call it the lockMap. It holds one entry per file to lock. The lockMap is publicly available to all participating threads.
Each read and write to a specific file gets the lock from the map and uses synchronized(lock) on that lock object.
If the lockMap is not fixed and its content can change, then reading from and writing to the map must be synchronized too (synchronized (this.lockMap) {....}).
But your getObjectLock() is not safe; synchronize all of it with your lock. (Double-checked locking is not thread safe in Java!) A recommended book: Doug Lea, Concurrent Programming in Java.
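A rough sketch of the lockMap idea from this answer, with every access to the map synchronized on the map itself. The class name LockMap and the withLock helper are hypothetical additions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Sketch: one plain Object per file id, with all map access synchronized on the map. */
class LockMap {
    private final Map<String, Object> lockMap = new HashMap<>();

    /** Returns the lock for fileId, creating it on first use. */
    Object lockFor(String fileId) {
        synchronized (lockMap) {
            Object lock = lockMap.get(fileId);
            if (lock == null) {
                lock = new Object();
                lockMap.put(fileId, lock);
            }
            return lock;
        }
    }

    /** Runs the action while holding the lock for fileId only. */
    <T> T withLock(String fileId, Supplier<T> action) {
        synchronized (lockFor(fileId)) {
            return action.get();
        }
    }
}
```

The map lock is held only for the short lookup, so reads and writes for different files still run in parallel, while operations on the same file are serialized.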
