Does a cache need to be synchronized?

Posted 2009-01-30 20:14:47. Tags: c#, java, multithreading, caching

This may be a naive question, but I got into a discussion with a co-worker in which I argued that there is no real need for a cache to be thread-safe/synchronized. My assumption is that it does not matter who puts a value in, since the value for a given key should be "constant" (it ultimately comes from the same source). If the values can change readily, then the cache itself does not seem all that useful (if you care that the value is "currently correct", you should go to the original source).

The main reason I see to make at least the GET synchronized is when a cache miss is very expensive and you don't want multiple threads each going out to fetch a value to put back in the cache. Even then, you'd need something that actually blocks all consumers during a read-fetch-put cycle.

Anyhow, my working assumption is that a hash is by its very nature thread-safe, because for any {key, value} combination the value is either null or something for which it doesn't matter who got there "first" to write it.

Question is: Is this a reasonable assumption?

Update: The real scope of my question is very simple id->value style caches (or {parameters}->{calculated value}), where no matter who writes to the cache the value will be the same, and we are just trying to avoid "re-calculating"/going back to the database. The actual graph of the object isn't relevant, and the cache is generally long-lived.

5 answers

For most implementations of a hash, you'd need to synchronize. What if the hash table needs to be expanded/rehashed? What if two threads are trying to add something to the hash table where the keys are different, but the hashes collide? They could both be modifying the same slot in the hash table in different ways at the same time. Assuming you're using a hash table to implement your cache (which you imply in your question), I suggest reading a little about how hash tables are implemented if you're not already familiar with this.
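To make the point about resizing and colliding slots concrete, here is a minimal sketch in Java of the simplest fix: one lock around a plain HashMap. The class name SynchronizedCache and the loader function are assumptions made for illustration, not something from the answer above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache class for illustration only.
class SynchronizedCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    // Assumed: the expensive lookup (database call, calculation, ...).
    private final Function<K, V> loader;

    SynchronizedCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // One lock guards both the lookup and the insert, so a resize or a
    // bucket collision caused by one thread is never seen half-finished
    // by another thread.
    synchronized V get(K key) {
        V value = map.get(key);
        if (value == null) {
            value = loader.apply(key);
            map.put(key, value);
        }
        return value;
    }

    synchronized int size() {
        return map.size();
    }
}
```

Holding the lock while the loader runs also means each missing key is loaded only once, at the price of blocking every other caller for the duration of the load.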

Writes aren't always atomic. You must either use atomic data types or provide some synchronization (RCU, locks, etc.). No shared data is thread-safe per se. Alternatively, make the problem go away by sticking to lock-free algorithms (where possible and feasible).

As long as the cost of acquiring and releasing a lock is less than the cost of recreating the object (from a file, a database, or whatever), all accesses to a cache should indeed be synchronized. If it's not, you don't really need a cache at all. :)

If you want to avoid data corruption, you must synchronize. This is especially true when the cache contains multiple tables that must be updated atomically. Imagine you have a database for a DMV (department of motor vehicles). When you add a new person to the database, that person will have records for auto registrations, records for tickets received, and records for home address and perhaps other contact information. If you don't update these tables atomically -- in the database and in the cache -- then any client pulling data out of the cache may get inconsistent data.

Yes, any one piece of data may be constant, but databases very commonly hold data that -- if not updated together and atomically -- can cause database clients to get incorrect, incomplete, or inconsistent results.
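One way to get the "updated together and atomically" property in a cache is to publish an immutable snapshot of all related tables in a single step. The sketch below is an illustration under assumptions: the names DmvCache, Snapshot and addPerson are hypothetical, and the DMV tables are reduced to two string maps.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical names throughout; two maps stand in for the DMV tables.
class DmvCache {
    // An immutable view of both tables taken at the same moment.
    static final class Snapshot {
        final Map<String, String> registrations;
        final Map<String, String> addresses;

        Snapshot(Map<String, String> registrations, Map<String, String> addresses) {
            this.registrations = Collections.unmodifiableMap(new HashMap<>(registrations));
            this.addresses = Collections.unmodifiableMap(new HashMap<>(addresses));
        }
    }

    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(new HashMap<>(), new HashMap<>()));

    // Readers take both tables from one snapshot, so they never see a
    // person who has a registration but no address yet.
    Snapshot read() {
        return current.get();
    }

    // Writers build a complete new snapshot and publish it in one step.
    // The update function has no side effects, so a retry under
    // contention is harmless.
    void addPerson(String personId, String registration, String address) {
        current.updateAndGet(old -> {
            Map<String, String> regs = new HashMap<>(old.registrations);
            Map<String, String> addrs = new HashMap<>(old.addresses);
            regs.put(personId, registration);
            addrs.put(personId, address);
            return new Snapshot(regs, addrs);
        });
    }
}
```

Copying both maps on every write is only reasonable for small or rarely updated caches; with frequent writes, a lock shared by readers and writers would likely be the better trade-off.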

If you are using Java 5 or above you can use a ConcurrentHashMap. This supports multiple readers and writers in a thread-safe manner.
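As a rough sketch of what that looks like for the id->value cache in the question: the example below uses computeIfAbsent, which was only added in Java 8 (so it goes beyond the Java 5 baseline mentioned above). The class name ComputeOnceCache, the loader function and the load counter are assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache class for illustration only.
class ComputeOnceCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    // Assumed: the expensive lookup (database call, calculation, ...).
    private final Function<K, V> loader;
    // Counts loader calls, only to make the behaviour observable.
    private final AtomicInteger loads = new AtomicInteger();

    ComputeOnceCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // computeIfAbsent runs atomically per key: threads asking for the
    // same missing key wait for a single load instead of each going
    // back to the source. The loader must not modify this map.
    V get(K key) {
        return map.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads.get();
    }
}
```

Unlike a single global lock, this blocks only the callers that hit a bin being updated, which covers the "read-fetch-put cycle" concern raised in the question without serializing every GET.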