Spring Data GemFire custom partitioning and performance

Question

We are using Spring Data GemFire server, client and locator. All of our GemFire PARTITION Regions have complex keys.

For example:

class Key { 
  String id1;
  String id2;
  Date date;
}

We would like to create a custom partition based on this entire key. In the getObject() method we are planning to return a |-delimited string of these 3 fields.

Is this a best practice, or is there another way to return the object?

We are also planning to create key indexes. In this case we will have to create indexes individually on Key.id1, Key.id2 and Key.date, as our searches will be based on the key date and on key id1 and id2.

Is this the right way to create the key index to improve performance?

Based on the GemFire documentation, we are planning to use Functions to improve performance. Regarding the Filter argument, which makes the search happen in a specific partition:

Do we just need to pass the complex key object in the filter set, or whatever partition logic we have added in getObject()?

Answer 1

First of all, this problem is independent of whether you started your GemFire (data) servers using Spring Data GemFire (SDG) or by some other means, such as Gfsh. Having said that, there are significant advantages to using Spring, and specifically SDG, to bootstrap and configure your servers, Locators, and clients. I simply wanted to make this distinction for other interested readers.

By getObject() method, I assume you are actually referring to PartitionResolver.getRoutingObject()? See the Javadoc.

In general, I'd say it is nearly always preferable to use simple, scalar types as keys in your Regions, such as Long, Integer, String, etc. Most searching should be based on the value, or properties of the value (i.e. the Object), rather than on individual components (e.g. id1) of the key.

Additionally, I disagree with the PartitionResolver Javadoc, bullet #1, where it states, "The key class can implement the PartitionResolver interface to enable custom partitioning". I think this is a naive approach for many reasons, not the least of which is that it couples your key class to GemFire. You should always prefer #2 when a PartitionResolver is needed.

But is a PartitionResolver actually needed in your case?

Since your "entire" key defines the "route" (i.e. all properties [id1, id2, date] of the Key class), you don't need to involve a custom PartitionResolver at all.

All you need to do is provide a proper implementation of the Object equals(:Object) and hashCode() methods in your Key class.
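As a minimal sketch of what such an implementation could look like: the constructor, accessors and the Serializable marker below are assumptions added for illustration, since the question only shows the three fields. The hash is built from String and Date hash codes, which are computed from the field values, so equal keys produce the same hash in every JVM.

```java
import java.io.Serializable;
import java.util.Date;
import java.util.Objects;

// Hypothetical version of the question's Key class with value-based equality.
final class Key implements Serializable {

  private final String id1;
  private final String id2;
  private final Date date;

  Key(String id1, String id2, Date date) {
    this.id1 = id1;
    this.id2 = id2;
    // Defensive copy: java.util.Date is mutable and a key must not change once stored.
    this.date = new Date(date.getTime());
  }

  String getId1() { return id1; }

  String getId2() { return id2; }

  Date getDate() { return new Date(date.getTime()); }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof Key)) {
      return false;
    }
    Key that = (Key) obj;
    return Objects.equals(id1, that.id1)
        && Objects.equals(id2, that.id2)
        && Objects.equals(date, that.date);
  }

  @Override
  public int hashCode() {
    // Combines all three fields, so the whole key determines the hash.
    return Objects.hash(id1, id2, date);
  }
}
```

Two Key instances built from the same id1, id2 and date are then interchangeable for lookups, which is what a Map-like structure needs.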

TIP: Keep in mind that GemFire Regions, at a fundamental level, are simply a java.util.Map, a key-value data structure. Yes, they are distributed (in most cases), and PARTITION Regions are partitioned as well, but a Region is fundamentally based on a Map and the "hash" of your key. If your entire key defines the partition (or route), then no custom PartitionResolver is necessary.

TIP: Furthermore, a PARTITION Region is a logical Region that is divided into 113 buckets (by default, ignoring primaries and secondaries for a moment), and those buckets are distributed across the (data-hosting) servers in your cluster. This makes the Region physically dispersed, assuming, of course, that your servers are individual processes on separate machines. This is what constitutes a "logical" Region, because to your application it is simply one holistic data structure.
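To make the relationship between a key's hash and a bucket concrete, here is a conceptual sketch. It is not GemFire's internal code; the class and method names are hypothetical, and the only figure taken from the text above is the default of 113 buckets.

```java
// Conceptual sketch only: shows how a hash code can be reduced to one of N buckets.
final class BucketSketch {

  // Default bucket count mentioned above.
  static final int DEFAULT_TOTAL_BUCKETS = 113;

  // Maps any routing object (the key itself, or whatever a PartitionResolver returns)
  // to a bucket id in the range [0, totalBuckets).
  static int bucketFor(Object routingObject, int totalBuckets) {
    // floorMod keeps the result non-negative even for negative hash codes.
    return Math.floorMod(routingObject.hashCode(), totalBuckets);
  }
}
```

Equal routing objects always yield the same bucket id, which is why a correct hashCode() is all that is needed when the whole key defines the route.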

You would implement a custom PartitionResolver if only a portion of the key were used to determine the partition (or route) of the key/value pairing. This is useful if you want to group certain key/value pairings together at the same physical location (i.e. server/process and machine in the cluster).

For example, suppose you want to group similar key/value pairings based on the date of your key. Then...

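// Packages assume Apache Geode / GemFire 9+; older GemFire releases use com.gemstone.gemfire.cache.
import org.apache.geode.cache.EntryOperation;
import org.apache.geode.cache.PartitionResolver;

class KeyDatePartitionResolver implements PartitionResolver<Key, Object> {

  @Override
  public String getName() {
    return getClass().getName();
  }

  // Route by the date component only (assumes Key exposes a getDate() accessor).
  @Override
  public Object getRoutingObject(EntryOperation<Key, Object> entryOp) {
    Key key = entryOp.getKey();
    return key.getDate();
  }

  @Override
  public void close() { }
}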

Now all entries (key/values) that share the same date/time are routed to the same partition (or bucket) in the logical PARTITION Region. Of course, you could further truncate the date to group, or route, the key/value pairings by year/month/day or simply year/month, however you choose. Again, all that matters is that the Object returned from the getRoutingObject(..) method in your custom PartitionResolver implements the equals(:Object) and hashCode() methods. Java's java.util.Date class (Javadoc) does.
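As one possible way to group by year/month, the date could be truncated before it is returned as the routing object. The helper below is a hypothetical sketch, and using UTC as the time zone is an assumption. java.time.YearMonth implements equals(..) and hashCode() and is Serializable, so it satisfies the requirement described above.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.Date;

// Hypothetical helper for deriving a coarser routing object from a key's date.
final class RoutingDates {

  // Truncates a timestamp to its year and month, interpreted in UTC (an assumption).
  static YearMonth toYearMonth(Date date) {
    return YearMonth.from(Instant.ofEpochMilli(date.getTime()).atZone(ZoneOffset.UTC));
  }
}
```

Inside a resolver such as KeyDatePartitionResolver, getRoutingObject(..) would then return RoutingDates.toYearMonth(key.getDate()) instead of the raw Date, so every entry from the same month maps to the same routing object.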

Regarding...

"Is this a right way to create the key index for improving the performance?"

Well, it depends on your application's search cases. Do your searches use the components of the key (i.e. [id1, id2, date]) collectively or individually?

For example, if you search by the combinations [id1, date] as well as [id2, date], then you would create 2 (KEY) Indexes with these fields from the Key class. If you searched by all 3 fields [id1, id2, date], then your (KEY) Index would include all 3 fields. If you searched by all 3 combinations, then you would (generally) need all 3 KEY Indexes for optimal performance.

Essentially, a field or combination of fields used in a query predicate expression should be indexed for potentially better performance.

There is no guarantee, though. Remember, when values change (are added, updated, removed, etc.), Indexes need to be updated to some degree. Therefore, there are "maintenance costs" associated with Indexes, and the more Indexes you have, the more they can potentially cost.

You also have to weigh the number of key/value pairings against whether an Index is warranted at all. If the data is mostly referential in nature, with a relatively small data set (e.g. < 1000 entries, perhaps), then a full scan can sometimes still perform better than using an Index. A full scan is equivalent to a full table scan in an RDBMS. Just remember, Indexes are not free. They take up space (memory) and time (CPU) to maintain.

I'd also say it is generally better to (again) use simple keys and maintain "searchable" state in the values associated with the keys. This boils down to design preference, though. Use (simple) keys for partitioning/routing.

For additional (and relevant) information, see: here, here, here, and here.

Lastly, regarding Functions, the filter is a set of "keys" (Javadoc). The keys are used to find, or route to, the partition (or bucket) in the logical PARTITION Region.

If you also configured a custom PartitionResolver on the PARTITION Region, I believe the resolver is also applied to the filter (the set of keys) passed to the Function when the Function is executed.

But you are simply passing the entire key, which in your case is an instance of your Key class. You can pass multiple instances (hence, the "Set"), depending on which keys you want to filter by.
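A minimal sketch of building such a filter follows. The compact Key class and the FilterSketch helper are hypothetical and exist only to keep the example self-contained; the call shown in the trailing comment assumes a Region reference and a registered Function id are available.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Compact, hypothetical stand-in for the question's Key class.
final class Key {

  final String id1;
  final String id2;
  final Date date;

  Key(String id1, String id2, Date date) {
    this.id1 = id1;
    this.id2 = id2;
    this.date = date;
  }

  @Override
  public boolean equals(Object obj) {
    if (!(obj instanceof Key)) {
      return false;
    }
    Key that = (Key) obj;
    return Objects.equals(id1, that.id1)
        && Objects.equals(id2, that.id2)
        && Objects.equals(date, that.date);
  }

  @Override
  public int hashCode() {
    return Objects.hash(id1, id2, date);
  }
}

final class FilterSketch {

  // Collects whole Key instances into the Set used as the Function filter.
  // Duplicate keys collapse because Key implements equals(..) and hashCode().
  static Set<Key> filterFor(Key... keys) {
    return new HashSet<>(Arrays.asList(keys));
  }
}

// With a Region in hand, the set would be passed to the Function execution, e.g.:
//   FunctionService.onRegion(region).withFilter(filter).execute(functionId);
```

The filter carries complete keys, not a delimited string or a routing object; any routing is derived from those keys on the GemFire side.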

Anyway, I hope this all makes sense.

As always, when these sorts of questions are asked, the answer varies significantly based on your UC (use case, or data access patterns), requirements, and data set. The proper thing to do here is to try things and test.

Good luck!
