
In-process cache vs distributed cache on consistency with mutable/immutable objects

Original post: 2015-10-25 16:50:04. Tags: java, caching, architecture, memcached, distributed-caching

I heard my colleague say that an in-process cache is the better option when caching immutable objects, where consistency is not a big issue (eventual consistency), whereas an external distributed cache is more suitable for mutable objects when you always want your reads to be consistent (strong consistency).

Is this always true? I don't really see how mutability is related to consistency. Can someone help me understand this?

2 answers

When you use a distributed cache that replicates its data, each object is copied to multiple independent machines (multiple cache nodes).

If your objects are immutable, replication is not an issue: since the objects never change, any cache instance will deliver exactly the same objects.

As soon as the objects become mutable, the consistency issue arises: when you ask a cache instance for an object, how can you be sure that the object delivered to you is up to date? What if, while one cache instance was serving you, the object was being modified by another user on another cache instance? In that case, you would not receive the latest version; you would receive a stale version.
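To make the stale-read scenario concrete, here is a minimal sketch. It assumes two replicas that are updated independently, and every name in it (`ReplicaNode`, `StaleReadSketch`, the key `user:42`) is hypothetical; no real cache product is being modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache node: each node holds its own private copy of the data.
final class ReplicaNode {
    private final Map<String, String> store = new HashMap<>();

    void put(String key, String value) { store.put(key, value); }

    String get(String key) { return store.get(key); }
}

final class StaleReadSketch {
    public static void main(String[] args) {
        ReplicaNode nodeA = new ReplicaNode();
        ReplicaNode nodeB = new ReplicaNode();

        // The initial value has already been replicated to both nodes.
        nodeA.put("user:42", "v1");
        nodeB.put("user:42", "v1");

        // Another user updates the object on node B; replication to A has not run yet.
        nodeB.put("user:42", "v2");

        System.out.println("node A returns " + nodeA.get("user:42")); // stale copy
        System.out.println("node B returns " + nodeB.get("user:42")); // latest copy

        // Only after the update is copied over do both nodes agree again.
        nodeA.put("user:42", nodeB.get("user:42"));
        System.out.println("node A after replication returns " + nodeA.get("user:42"));
    }
}
```

With an immutable object the second `put` on node B never happens, so the two copies cannot diverge.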

To deal with this issue, a choice has to be made. One option is to accept some degree of staleness, which allows better performance. Another option is to use some synchronization protocol so that you never receive stale data, but there is obviously a performance penalty to pay for synchronizing data between distant cache nodes.
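The first option (accepting bounded staleness) is often implemented with a time-to-live. The sketch below is only an illustration under that assumption: `TtlCacheSketch`, its injectable clock, and the `loader` function are hypothetical names, and the class is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical TTL cache: a read may be stale, but never by more than ttlMillis.
final class TtlCacheSketch {
    private static final class Entry {
        final String value;
        final long loadedAtMillis;

        Entry(String value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;              // injectable so the sketch is testable
    private final Function<String, String> loader; // reads the source of truth

    TtlCacheSketch(long ttlMillis, LongSupplier clock, Function<String, String> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
    }

    String get(String key) {
        long now = clock.getAsLong();
        Entry entry = entries.get(key);
        // Reload when the entry is missing or older than the TTL.
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            entry = new Entry(loader.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("price", "10");
        long[] now = {0L}; // fake clock for the demonstration
        TtlCacheSketch cache = new TtlCacheSketch(1000L, () -> now[0], source::get);

        System.out.println(cache.get("price")); // loads the value from the source
        source.put("price", "12");
        now[0] = 500L;
        System.out.println(cache.get("price")); // still the cached value, within the TTL
        now[0] = 1000L;
        System.out.println(cache.get("price")); // TTL elapsed, value is reloaded
    }
}
```

The trade-off is the one described above: reads stay cheap and local, at the price of serving data that may be up to one TTL old.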

Conversely, imagine that you upload some modifications of an object to a cache node. What if, at the same time, another user uploads some modifications of the same object to another cache node? Should this be allowed, or should it be forbidden by some locking mechanism?
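One way to forbid conflicting writes without holding a lock is an optimistic version check, similar in spirit to memcached's `cas` command. The sketch below swaps that technique in for the "locking mechanism" mentioned above; `Versioned` and `VersionedCacheSketch` are hypothetical names, and a single in-memory map stands in for the cache.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical value wrapper carrying a version number.
final class Versioned {
    final long version;
    final String value;

    Versioned(long version, String value) {
        this.version = version;
        this.value = value;
    }
}

final class VersionedCacheSketch {
    private final ConcurrentMap<String, Versioned> store = new ConcurrentHashMap<>();

    Versioned get(String key) { return store.get(key); }

    void putInitial(String key, String value) { store.put(key, new Versioned(1, value)); }

    // Succeeds only if the entry is still the exact object the caller read.
    // Versioned does not override equals, so replace() compares by identity.
    boolean compareAndSet(String key, Versioned expected, String newValue) {
        Versioned next = new Versioned(expected.version + 1, newValue);
        return store.replace(key, expected, next);
    }

    public static void main(String[] args) {
        VersionedCacheSketch cache = new VersionedCacheSketch();
        cache.putInitial("doc", "draft");

        // Two users read the same version before either of them writes.
        Versioned seenByAlice = cache.get("doc");
        Versioned seenByBob = cache.get("doc");

        boolean aliceOk = cache.compareAndSet("doc", seenByAlice, "alice's edit");
        boolean bobOk = cache.compareAndSet("doc", seenByBob, "bob's edit");

        // Alice wins; Bob's write is rejected because his copy is out of date.
        System.out.println(aliceOk + " " + bobOk + " " + cache.get("doc").value);
    }
}
```

The rejected writer has to re-read the entry and retry, which is the cost of not locking up front.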

In addition, should object modifications on your cache node become immediately visible to the users of this cache node? Or should they become visible only after they have been replicated to the other nodes?

At the end of the day, mutable objects do make things more complicated when sharing a distributed cache among multiple users. Still, it doesn't mean that these caches should not be used: it just means that it takes more time and more caution to study all available options and choose the appropriate cache for each application.

Although Daniel has given a good explanation, for some reason it wasn't 100% clear to me. So I googled around, and this article cleared the mist for me.

Excerpts from the article:

While using an in-process cache, your cache elements are local to a single instance of your application. Many medium-to-large applications, however, will not have a single application instance as they will most likely be load-balanced. In such a setting, you will end up with as many caches as your application instances, each having a different state resulting in inconsistency.
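A small sketch of that situation, assuming two load-balanced instances that share one backing store but each keep a private in-process cache. `AppInstance` and `LocalCacheDivergenceSketch` are hypothetical names, and a plain map stands in for the database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical application instance with its own in-process cache.
final class AppInstance {
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final Map<String, String> database; // shared backing store (stand-in)

    AppInstance(Map<String, String> database) {
        this.database = database;
    }

    String read(String key) {
        // Served from this instance's heap once the key has been loaded.
        return localCache.computeIfAbsent(key, database::get);
    }

    void write(String key, String value) {
        database.put(key, value);
        localCache.put(key, value); // only this instance's cache is refreshed
    }
}

final class LocalCacheDivergenceSketch {
    public static void main(String[] args) {
        Map<String, String> database = new ConcurrentHashMap<>();
        database.put("price", "10");

        AppInstance one = new AppInstance(database);
        AppInstance two = new AppInstance(database);

        // Both instances load and cache the same value.
        one.read("price");
        two.read("price");

        // A request routed to instance one changes the value.
        one.write("price", "12");

        System.out.println("instance one sees " + one.read("price")); // updated value
        System.out.println("instance two sees " + two.read("price")); // old cached value
    }
}
```

Which answer a client gets now depends on which instance the load balancer picks.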

Distributed caches, although deployed on a cluster of multiple nodes, offer a single logical view (and state) of the cache. In most cases, an object stored in a distributed cache cluster will reside on a single node in a distributed cache cluster. By means of a hashing algorithm, the cache engine can always determine on which node a particular key-value resides. Since there is always a single state of the cache cluster, it is never inconsistent.
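The excerpt does not say which hashing algorithm is used. As a rough illustration, the sketch below assumes the simplest possible scheme, hash modulo the node count; real clients often use consistent hashing instead, so that fewer keys move when nodes are added or removed. `NodeSelectorSketch` and the node addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side node selection: every client that knows the same
// node list maps a given key to the same node.
final class NodeSelectorSketch {
    private final List<String> nodes;

    NodeSelectorSketch(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    String nodeFor(String key) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(key.hashCode(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("cache-1:11211", "cache-2:11211", "cache-3:11211");
        NodeSelectorSketch clientA = new NodeSelectorSketch(nodes);
        NodeSelectorSketch clientB = new NodeSelectorSketch(nodes);

        // Two independent clients agree on where "user:42" lives.
        System.out.println(clientA.nodeFor("user:42"));
        System.out.println(clientB.nodeFor("user:42"));
    }
}
```

Because each key has exactly one home node, all reads and writes for that key meet in one place, which is what gives the cluster its single logical state.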

If you are caching immutable objects, consistency ceases to be an issue. In such a case, an in-process cache is a better choice as many overheads typically associated with external distributed caches are simply not there. If your application is deployed on multiple nodes, you cache mutable objects and you want your reads to always be consistent rather than eventually consistent, a distributed cache is the way to go.