
Understanding Akka cluster sharding

Original 2020-07-12 13:45:20 1 3 java/ scala/ akka/ sharding/ akka-cluster

I'm learning the Akka sharding module, and there is something I don't understand about sharding. Let's imagine you want to shard an actor: you have many entities of the same actor distributed across many nodes. Each entity can have its own state, which may differ from that of another entity.

A client makes a request (sends a message) to your sharded actor to get back its status value. This message is going to be processed by one entity, which returns its value as the result. But if it were handled by another entity, the result would be different. Shouldn't it be the same, since all entities derive from the same actor?

3 answers

It seems you misunderstand the concept of Akka cluster sharding. Let me explain with an example.

Let's say your service is responsible for responding to requests with user profiles. To get very low latency, you decide to use Akka actors to cache the user profiles in memory rather than querying the DB on every request.

If your website only has 10 users and each user profile is just a few KB, you can hold all 10 user profiles in a single actor without issue, and you certainly won't need cluster sharding. However, if you have 10 million users, the 10 million user profiles probably won't fit into a single actor's memory. It would also be expensive if that actor went down, because you would need a large DB query to load all of that data back from persistence.

In this scenario, cluster sharding is a good fit. You will have 10 million Akka actors distributed across your cluster, and each actor stores only one user profile. So GetUserProfile(userProfileId = 123) won't give you different responses: it will always be routed to THE actor that holds the profile for user 123, so the response will always be the same.

How does the routing work? Check extractShardId and extractEntityId in the docs.
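As a rough illustration of what those two functions are for, here is a plain-Java sketch with no Akka dependency. The GetUserProfile class, the NUMBER_OF_SHARDS value, and the modulo rule are assumptions made up for this example; they only mimic the roles of extractEntityId and extractShardId and are not the Akka API itself.

```java
// Plain-Java illustration only; none of these types come from Akka.
class ShardRoutingSketch {
    // Hypothetical message type carrying the id of the requested profile.
    static final class GetUserProfile {
        final long userProfileId;
        GetUserProfile(long userProfileId) { this.userProfileId = userProfileId; }
    }

    // Assumed shard count, chosen arbitrarily for the example.
    static final long NUMBER_OF_SHARDS = 100;

    // Which entity is this message for? One entity per user profile.
    static String extractEntityId(GetUserProfile msg) {
        return Long.toString(msg.userProfileId);
    }

    // Which shard holds that entity? A deterministic function of the id.
    static String extractShardId(GetUserProfile msg) {
        return Long.toString(Math.floorMod(msg.userProfileId, NUMBER_OF_SHARDS));
    }

    public static void main(String[] args) {
        GetUserProfile first = new GetUserProfile(123);
        GetUserProfile second = new GetUserProfile(123);
        // The same id always maps to the same entity and the same shard.
        System.out.println(extractEntityId(first).equals(extractEntityId(second)));
        System.out.println(extractShardId(first).equals(extractShardId(second)));
    }
}
```

Because both functions depend only on the id inside the message, two requests for user 123 cannot end up at different entities, whichever node they arrive at.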

But [the message response] should be the same because all entities derive from the same actor, shouldn't it?

No, every actor has its own state and represents something different. If you had a Customer class, you wouldn't expect each Customer object to have the same data. Every Customer object would have its own name, id, etc.

The same is true for actors. Actors have their own state and represent some kind of domain entity. If you send a GetCustomerName message to several actors, you would expect each actor to give you a different name.

This is especially true for Cluster Sharding. The point of Cluster Sharding is to let you scale past a single node, whether for scalability, elasticity, or resilience. But the entities are still actors, each with its own state. Sending GetCustomerName will (and should) give you a different response from each actor. Sharding just gives you the ability to distribute those actors across multiple machines while keeping the location of each actor transparent to the sender.
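The idea can be sketched in plain Java, without Akka. The Region class below is a hypothetical stand-in for a shard region, and the customer names are invented for the example; the point is only that the sender supplies an entity id and each entity answers from its own state.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration only; it mimics the idea, not the Akka API.
class CustomerShardingSketch {
    // One instance per customer, each holding its own state.
    static final class CustomerEntity {
        private final String name;
        CustomerEntity(String name) { this.name = name; }
        String getCustomerName() { return name; }
    }

    // Hypothetical stand-in for the shard region: callers only know entity ids.
    static final class Region {
        private final Map<String, CustomerEntity> entities = new HashMap<>();

        void register(String entityId, String name) {
            entities.put(entityId, new CustomerEntity(name));
        }

        // Routes the request to the single entity that owns this id.
        String getCustomerName(String entityId) {
            CustomerEntity entity = entities.get(entityId);
            if (entity == null) {
                throw new IllegalArgumentException("unknown entity " + entityId);
            }
            return entity.getCustomerName();
        }
    }

    public static void main(String[] args) {
        Region region = new Region();
        region.register("1", "Alice"); // made-up sample data
        region.register("2", "Bob");
        System.out.println(region.getCustomerName("1"));
        System.out.println(region.getCustomerName("2"));
        System.out.println(region.getCustomerName("1"));
    }
}
```

Different ids give different answers, while repeating the same id gives the same answer every time.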

In Akka cluster sharding, each actor should have a unique name (usually the entity id) and represent a unique entity. When an actor starts or restarts, the entity is loaded (usually from a database) into the actor's state.

If an actor receives a message to update the entity, it should update both the database and its own state. If it receives a message to read the entity, it should read from the actor state only; this is guaranteed to match the database because all updates for that entity are handled by that one actor.
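A minimal plain-Java sketch of that load-once, write-through pattern follows. The Database and ProfileEntity classes are hypothetical stand-ins (an in-memory map instead of a real database, an ordinary object instead of an actor), so treat it as an outline of the idea, not an Akka implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration only; no Akka types are used.
class WriteThroughEntitySketch {
    // Hypothetical database stand-in that counts how often it is read.
    static final class Database {
        private final Map<String, String> rows = new HashMap<>();
        int reads = 0;

        String load(String id) {
            reads++;
            return rows.get(id);
        }

        void save(String id, String value) {
            rows.put(id, value);
        }
    }

    // Stand-in for the single actor that owns one entity.
    static final class ProfileEntity {
        private final String entityId;
        private final Database database;
        private String state;

        // On start or restart, load the entity from the database once.
        ProfileEntity(String entityId, Database database) {
            this.entityId = entityId;
            this.database = database;
            this.state = database.load(entityId);
        }

        // Update: write to the database, then refresh the in-memory state.
        void update(String newValue) {
            database.save(entityId, newValue);
            state = newValue;
        }

        // Read: served from memory only, no database access.
        String read() {
            return state;
        }
    }

    public static void main(String[] args) {
        Database db = new Database();
        db.save("123", "profile v1"); // made-up sample data
        ProfileEntity entity = new ProfileEntity("123", db);
        entity.update("profile v2");
        System.out.println(entity.read());
        System.out.println(db.load("123"));
    }
}
```

Since every update for an id goes through the same object, the in-memory value and the stored value cannot drift apart in this sketch.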

If a node fails, or when the cluster is scaled, the actor for the requested entity can be recreated in the shard region on another node.
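The recovery step can be sketched in plain Java as well. Everything here is hypothetical: the Node and Cluster classes, the shared database map, and in particular the hash-modulo placement rule, which is only a simple stand-in and not how Akka decides where shards live.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; placement here is an assumed toy rule.
class FailoverSketch {
    // Hypothetical node holding the in-memory state of the entities it hosts.
    static final class Node {
        final String name;
        final Map<String, String> entityState = new HashMap<>();
        Node(String name) { this.name = name; }
    }

    static final class Cluster {
        final List<Node> nodes = new ArrayList<>();
        // Durable store stand-in, shared by all nodes.
        final Map<String, String> database = new HashMap<>();

        // Assumed placement rule: hash of the entity id over the live nodes.
        Node nodeFor(String entityId) {
            return nodes.get(Math.floorMod(entityId.hashCode(), nodes.size()));
        }

        // Reads from the hosting node, starting the entity from the database
        // if it is not in that node's memory yet.
        String read(String entityId) {
            Node node = nodeFor(entityId);
            return node.entityState.computeIfAbsent(entityId, database::get);
        }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        cluster.nodes.add(new Node("node-a"));
        cluster.nodes.add(new Node("node-b"));
        cluster.database.put("123", "profile of user 123"); // made-up data

        String before = cluster.read("123");
        cluster.nodes.remove(cluster.nodeFor("123")); // simulate node failure
        String after = cluster.read("123");           // recreated elsewhere
        System.out.println(before.equals(after));
    }
}
```

The entity's in-memory copy is lost with the failed node, but because its state is reloaded from the database on the surviving node, the caller still gets the same answer.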