
Is get() a thread-safe operation in Guava's cache?

I found out that the put and get (with CacheLoader) operations use a ReentrantLock under the hood, but why is this not implemented for the getIfPresent operation?

get, which is used by getIfPresent:

        @Nullable
        V get(Object key, int hash) {
            try {
                if (this.count != 0) {
                    long now = this.map.ticker.read();
                    // No lock is taken here: the entry is looked up directly.
                    ReferenceEntry<K, V> e = this.getLiveEntry(key, hash, now);
                    if (e == null) {
                        return null;
                    }

                    V value = e.getValueReference().get();
                    if (value != null) {
                        this.recordRead(e, now);
                        return this.scheduleRefresh(e, e.getKey(), hash, value, now, this.map.defaultLoader);
                    }

                    // The value was garbage collected; clean up stale references.
                    this.tryDrainReferenceQueues();
                }

                return null;
            } finally {
                this.postReadCleanup();
            }
        }

put:

        @Nullable
        V put(K key, int hash, V value, boolean onlyIfAbsent) {
            this.lock();
            .....

Is the only way to achieve thread safety for basic get/put operations to use synchronization on the client side?

It seems that Guava's cache implements the ConcurrentMap API:

class LocalCache<K, V> extends AbstractMap<K, V> implements ConcurrentMap<K, V> 

so the basic get and put operations should be thread-safe by nature.

Even if getIfPresent did use locks, that wouldn't help. It's more fundamental than that.

Let me put that differently: Define 'threadsafe'.

Here's an example of what can happen in a non-threadsafe implementation:

  • You invoke .put on a plain jane java.util.HashMap, not holding any locks.
  • Simultaneously, a different thread also does that.
  • The map is now in a broken state. If you iterate through the elements, the first put doesn't show up at all, the second put does show up in your iteration, and a completely unrelated key has disappeared. But calling .get(k) on that map with the second thread's key doesn't find it, even though it is returned in the .entrySet(). This makes no sense and breaks all the rules of java.util.HashMap. The spec of HashMap does not explain any of this, other than saying 'I am not threadsafe' and leaving it at that.

That's an example of NOT thread safe.
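For contrast, here is a minimal sketch of the same usage pattern (two threads calling put with no locks held by the caller) against a concurrency-capable map from the JDK. The class and method names are made up for illustration; only java.util.concurrent.ConcurrentHashMap is a real API here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {

    // Two threads insert disjoint key ranges; the caller holds no locks.
    static int fillConcurrently(int perThread) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread a = new Thread(() -> {
            for (int i = 0; i < perThread; i++) map.put(i, i);
        });
        Thread b = new Thread(() -> {
            for (int i = perThread; i < 2 * perThread; i++) map.put(i, i);
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // After both joins, every put is visible and none has been lost.
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(10_000));
    }
}
```

Swapping in a plain HashMap here is exactly the broken scenario described above: the result is then unspecified, and may differ from run to run.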

Here is an example of behavior that is perfectly fine:

  • 2 threads begin.
  • Some external event (e.g. a log) shows that thread 1 is very, very slightly ahead of thread 2, but if the notion of 'ahead' is relevant, your code is broken. That's just not how multicore works.
  • Thread 1 adds a thing to a concurrency-capable map, and logs that it has done so.
  • Thread 2 logs that it starts an operation (from the few things you have observed, it seems to be running slightly 'later', so I guess we're 'after' the point where T1 added the thing), now queries for the thing, and does not get a result. [1]

That's fine. That's still thread safe. Thread safe doesn't mean every interaction with an instance of that data type can be understood in terms of 'first this thing happened, then that thing happened'. Wanting that is very problematic, because the only way the computer can really give you that kind of guarantee is to disable all but a single core and run everything very, very slowly. The point of a cache is to speed things up, not slow things down!

The problem with the lack of guarantees here is that if you run multiple separate operations on the same object, you run into trouble. Here's some pseudocode for a bank ATM that will go epically wrong in the long run:

  • Ask the user how much money they want (say, €50,-).
  • Retrieve the account balance from a 'threadsafe' Map<Account, Integer> (maps account ID to cents in account).
  • Check if the balance is at least €50,-. If no, show an error. If yes...
  • Spit out €50,-, and update the threadsafe map with .put(acct, balance - 5000).

Everything is perfectly threadsafe. And yet this is going to go very, very wrong - if the user uses their card at the same time as they are in the bank withdrawing money via the teller, either the bank or the user is going to get very lucky here. I'd hope it's obvious to see how and why.
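In case it isn't obvious, the failure can be made reproducible. The sketch below is an assumption-laden illustration, not real bank code: the account id "acct", the starting balance, and the class name are invented, and a CountDownLatch is used only to force the unlucky interleaving where both threads read the balance before either one writes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class LostUpdateDemo {

    /** Returns {final balance, total cash dispensed}, both in cents. */
    static int[] run() {
        Map<String, Integer> balances = new ConcurrentHashMap<>();
        balances.put("acct", 10_000); // hypothetical account holding EUR 100
        AtomicInteger dispensed = new AtomicInteger();
        CountDownLatch bothHaveRead = new CountDownLatch(2);

        Runnable withdraw50 = () -> {
            int balance = balances.get("acct"); // read: thread-safe on its own
            bothHaveRead.countDown();
            try {
                bothHaveRead.await(); // force both reads to happen before any write
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (balance >= 5_000) { // check, based on a now-stale value
                dispensed.addAndGet(5_000); // spit out cash
                balances.put("acct", balance - 5_000); // write: thread-safe on its own
            }
        };

        Thread atm = new Thread(withdraw50);
        Thread teller = new Thread(withdraw50);
        atm.start();
        teller.start();
        try {
            atm.join();
            teller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { balances.get("acct"), dispensed.get() };
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("balance=" + result[0] + " dispensed=" + result[1]);
    }
}
```

Both threads see a balance of 10000, both hand out 5000, and both write back 5000, so €100 leaves the bank while the account is only debited €50. Every individual map call behaved correctly; the sequence did not.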

The upshot is: If you have dependencies between operations, there is nothing you can do with 'threadsafe' concepts that can possibly fix it; the only way is to actually write code that explicitly marks off these dependencies.

The only way to write that bank code is to use some form of locking. Basic locking or optimistic locking, either way is fine, but it has to be locking of some sort. It has to look like this [2]:

start some sort of transaction;
fetch account balance;
deal with insufficient funds;
spit out cash;
update account balance;
end transaction;
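One way to sketch that 'transaction' in plain Java, assuming the balances live in a ConcurrentHashMap, is to push the fetch, the check, and the update into a single atomic compute call. The class, the method names, and the account id below are hypothetical; the actual cash dispensing is deliberately left out, since it can't be part of an abortable operation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicWithdrawal {
    private final ConcurrentHashMap<String, Integer> balances = new ConcurrentHashMap<>();

    void deposit(String acct, int cents) {
        balances.merge(acct, cents, Integer::sum);
    }

    /** Fetch, check and update as one atomic step; true if the withdrawal was applied. */
    boolean withdraw(String acct, int cents) {
        boolean[] approved = { false };
        balances.compute(acct, (id, balance) -> {
            if (balance != null && balance >= cents) {
                approved[0] = true;
                return balance - cents;
            }
            return balance; // insufficient funds or unknown account: leave unchanged
        });
        return approved[0];
    }

    int balance(String acct) {
        return balances.getOrDefault(acct, 0);
    }

    /** Two threads race to withdraw the full balance; returns {final balance, approvals}. */
    static int[] raceTwoWithdrawals() {
        AtomicWithdrawal bank = new AtomicWithdrawal();
        bank.deposit("acct", 5_000);
        AtomicInteger approvals = new AtomicInteger();
        Runnable withdraw50 = () -> {
            if (bank.withdraw("acct", 5_000)) approvals.incrementAndGet();
        };
        Thread atm = new Thread(withdraw50);
        Thread teller = new Thread(withdraw50);
        atm.start();
        teller.start();
        try {
            atm.join();
            teller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { bank.balance("acct"), approvals.get() };
    }

    public static void main(String[] args) {
        int[] result = raceTwoWithdrawals();
        System.out.println("balance=" + result[0] + " approvals=" + result[1]);
    }
}
```

Whichever thread gets there, exactly one withdrawal is approved and the balance ends at 0. The design choice is to keep the function passed to compute short and free of side effects; an explicit lock or a database transaction around the same steps would express the same dependency.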

Now Guava's code makes perfect sense:

  • There is no such thing as 'earlier' and 'later'. You need to stop thinking about multicore in that way, unless you explicitly use primitives that establish these things. The cache interface does have these. Use the right operation! getIfPresent will get you the cached value if it is possible for your current thread to get at that data. If it is not, it returns null; that's what that call does.

  • If instead you want this common operation: "Get me the cached value. However, if it is not available, then run this code to calculate the value, cache the result, and return it to me. In addition, ensure that if 2 threads simultaneously end up running this exact operation, only one thread runs the calculation, and the other will wait for that one (don't say 'first' one, that's not how you should think about threads) to finish, and use that result instead", then use the right call for that: cache.get(key, () -> calculateValueForKey(key)). As the docs explicitly call out, this will wait for another thread that is also 'loading' the value (that's what Guava's cache calls the calculation process).

  • No matter what you invoke from the Cache API, you can't 'break it' in the sense that I broke that HashMap. The cache API achieves this partly by using locks (such as ReentrantLock for mutating operations), and partly by using a ConcurrentHashMap-style segmented hash table under the hood.
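The 'only one thread runs the calculation' behavior can be sketched without Guava on the classpath. Note the swap: this uses the JDK's ConcurrentHashMap.computeIfAbsent as a stand-in for Guava's cache.get with a loader, not Guava itself, and the key "k" and the upper-casing 'calculation' are placeholders.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class LoadOnceDemo {

    /** Many threads ask for the same missing key; returns how often the loader ran. */
    static int loadFromManyThreads(int threadCount) {
        ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        AtomicInteger loads = new AtomicInteger();

        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> cache.computeIfAbsent("k", key -> {
                loads.incrementAndGet(); // stands in for an expensive calculation
                return key.toUpperCase();
            }));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return loads.get();
    }

    public static void main(String[] args) {
        System.out.println(loadFromManyThreads(8));
    }
}
```

The loader runs once no matter how the threads are scheduled; the other callers block until the value is there and then return it. A plain cache.get("k") on this map, like getIfPresent, would simply return null for any caller that cannot see a value yet.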

[1] Often log frameworks end up injecting an actual explicit lock into the proceedings, and thus you do often get guarantees in this case, but only 'by accident' because of the log framework. This isn't a guarantee (maybe you're logging to separate log files, for example), and often what you 'witness' may be a lie. For example, maybe you have 2 log statements that log to separate files (and don't lock each other out at all), and they log the timestamp as part of the log. The fact that one log line says '12:00:05' and the other says '12:00:06' means nothing - the log thread fetches the current time, creates a string describing the message, and tells the OS to write it to the file. You obviously get absolutely no guarantee that the 2 log threads run at identical speed. Maybe one thread fetches the time (12:00:05), creates the string, and wants to write to the disk, but the OS switches to the other thread before the write goes through. The other thread is the other logger: it reads the time (12:00:06), makes the string, writes it out, and finishes up. Then the first logger continues and writes its line. Tada: 2 threads where you 'observe' that one thread is 'earlier', but that is incorrect. Perhaps this example will further highlight why thinking about threads in terms of which one is 'first' steers you wrong.

[2] This code has the additional complication that you're interacting with systems that cannot be transactional. The point of a transaction is that you can abort it; you cannot abort the user grabbing a bill from the ATM. You solve that by logging that you're about to spit out the money, then spitting out the money, then logging that you have spit out the money, and finally writing to this log that it has been processed in the user's account balance. Other code needs to check this log and act accordingly. For example, on startup the bank's DB machine needs to flag 'dangling' ATM transactions and will have to get a human to check the video feed. This solves the problem where someone trips over the power cable of the bank's DB machine just as the user is about to grab the banknote from the machine.
