
What is the best way to cache large data objects into Hazelcast

We have around 20k merchants' data, around 3 MB in size. If we cache this much data together as one entry, Hazelcast performance is not good. Please note that if we cache all 20k merchants individually, then a get-all-merchants call slows down, as reading each merchant from the cache costs high network time.

How should we partition these data? What will be the partition key? What will be the max size per partition?

The Merchant entity has the following attributes: merchant id, parent merchant id, name, address, contacts, status, type.

Merchant id is the unique attribute.

Please suggest.

Adding to what Mike said, it's not unusual to see Hazelcast maps with millions of entries, so I wouldn't be concerned with the number of entries.

You should structure your map(s) to fit your application's design needs. Doing a 'getAll' on a single map seems inefficient to me. It may make more sense to create multiple maps, or to use a complex key that allows you to be more selective with the entries returned.
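As an illustration of the complex-key idea, a composite key can group merchants under their parent id; the class and field names below are assumptions, not something from the question:

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical composite map key. Keys used with Hazelcast must have
// consistent equals()/hashCode() and be serializable.
class MerchantKey implements Serializable {
    private final String parentId;
    private final String merchantId;

    MerchantKey(String parentId, String merchantId) {
        this.parentId = parentId;
        this.merchantId = merchantId;
    }

    String getParentId() { return parentId; }
    String getMerchantId() { return merchantId; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MerchantKey)) return false;
        MerchantKey k = (MerchantKey) o;
        return parentId.equals(k.parentId) && merchantId.equals(k.merchantId);
    }

    @Override
    public int hashCode() { return Objects.hash(parentId, merchantId); }
}
```

With a key like this, a predicate on the key's `parentId` attribute (Hazelcast's query API addresses key attributes with the `__key` prefix) can return just one parent's merchants instead of a full `getAll`.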

Also, you may want to look at indexes. You can index the key and/or value, which can really help with performance. Predicates you construct for selections will automatically use any defined indexes.
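For example, an index can be declared per map in `hazelcast.xml` (Hazelcast 4.x syntax; the map name `merchants` and the `status` attribute are assumptions based on the question):

```xml
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
  <map name="merchants">
    <indexes>
      <!-- sorted index supports both equality and range predicates on status -->
      <index type="SORTED">
        <attributes>
          <attribute>status</attribute>
        </attributes>
      </index>
    </indexes>
  </map>
</hazelcast>
```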

I wouldn't worry about changing the partition key unless you have reason to believe the default partitioning scheme is not giving you a good distribution of keys.

With 20K merchants and 3MB of data per merchant, your total data is around 60GB. How many nodes are you using for your cache, and what is the memory size per node? Distributing the cache across a larger number of nodes should give you more effective bandwidth.

Make sure you're using an efficient serialization mechanism; the default Java serialization is very inefficient (both in object size and in serialization/deserialization speed). Using something like IdentifiedDataSerializable (if Java) or Portable (if using non-Java clients) could help a lot.
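A minimal IdentifiedDataSerializable sketch for the merchant entity might look like the following; the factory/class ids and the field subset are assumptions, and it requires the Hazelcast library on the classpath (`writeString`/`readString` are the Hazelcast 4.2+ method names — earlier versions use `writeUTF`/`readUTF`):

```java
import com.hazelcast.nio.ObjectDataInput;
import com.hazelcast.nio.ObjectDataOutput;
import com.hazelcast.nio.serialization.IdentifiedDataSerializable;

import java.io.IOException;

// Sketch only: explicit field-by-field serialization avoids the class
// metadata and reflection costs of default Java serialization.
class Merchant implements IdentifiedDataSerializable {
    static final int FACTORY_ID = 1; // assumed ids, must be unique per cluster
    static final int CLASS_ID = 1;

    String merchantId;
    String parentMerchantId;
    String name;
    String status;

    @Override
    public int getFactoryId() { return FACTORY_ID; }

    @Override
    public int getClassId() { return CLASS_ID; }

    @Override
    public void writeData(ObjectDataOutput out) throws IOException {
        out.writeString(merchantId);
        out.writeString(parentMerchantId);
        out.writeString(name);
        out.writeString(status);
    }

    @Override
    public void readData(ObjectDataInput in) throws IOException {
        merchantId = in.readString();
        parentMerchantId = in.readString();
        name = in.readString();
        status = in.readString();
    }
}
```

A DataSerializableFactory that maps `CLASS_ID` to `new Merchant()` instances must also be registered in the member/client configuration for this to work.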

I would strongly recommend that you break your object down from 3MB to a few tens of KBs, otherwise you will run into problems that are not particularly related to Hazelcast. For example, fat packets blocking other packets resulting in heavy latency in read/write operations, heavy serialization/deserialization overhead, a choked network, etc. You have already identified high network time, and it is not going to go away without flattening the value object. If yours is a read-heavy use case, then I also suggest looking into Near Cache for ultra-low-latency read operations.
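As a sketch, a member-side Near Cache can be enabled per map in `hazelcast.xml` (the map name is assumed; clients would configure this in their client configuration instead):

```xml
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
  <map name="merchants">
    <near-cache>
      <!-- OBJECT format skips deserialization on local reads -->
      <in-memory-format>OBJECT</in-memory-format>
      <invalidate-on-change>true</invalidate-on-change>
    </near-cache>
  </map>
</hazelcast>
```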

As for partition size, keep it under 100MB; I'd say between 50-100MB per partition. Simple maths will help you:

3 MB/object x 20k objects = 60 GB
Default partition count = 271
Each partition size = 60,000 MB / 271 ≈ 221 MB
So increasing the partition count to, let's say, 751 will mean:
60,000 MB / 751 ≈ 80 MB
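The arithmetic above can be double-checked with a short stand-alone calculation (figures taken directly from this answer):

```java
// Rough partition-size check: 20k objects x 3 MB = 60,000 MB total.
class PartitionMath {
    static long partitionSizeMb(long totalMb, int partitionCount) {
        return Math.round((double) totalMb / partitionCount);
    }

    public static void main(String[] args) {
        long totalMb = 20_000L * 3;                        // 60,000 MB
        System.out.println(partitionSizeMb(totalMb, 271)); // default count: ~221 MB
        System.out.println(partitionSizeMb(totalMb, 751)); // ~80 MB
    }
}
```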

So you can go with the partition count set to 751. To cater for a possible increase in future traffic, I'd set the partition count to an even higher number - 881.
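The partition count is set via the `hazelcast.partition.count` system property, which must be identical on every member, e.g. in `hazelcast.xml`:

```xml
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
  <properties>
    <!-- must be the same value on all cluster members -->
    <property name="hazelcast.partition.count">751</property>
  </properties>
</hazelcast>
```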

Note: Always use a prime number for the partition count.

FYI - in one of the future releases, the default partition count will be changed from 271 to 1999.

