
(Java) Store a huge collection of objects with indexed attributes

Original question: 2010-07-25 16:05:30 | Tags: java, optimization, collections

I need to store about 100 thousand objects representing users. Those users have a username, age, gender, city and country.

The users should be searchable by a range of age and by any of the other attributes, but also by a combination of attributes (e.g. women between 30 and 35 from Brussels). The results should be found quickly, as this is one of the Server's services for many connected Clients. Users may only be deleted or added, not updated.

I've thought of a fast database with indexed attributes (like H2, which seems to be pretty fast, and I've seen it has an in-memory mode).

I was wondering whether any other option is possible before going for the DB.

Thank you for any ideas!

4 answers

How much memory does your server have? How much memory would these objects take up? Is it feasible to keep them all in memory, or not? Do you really need the speedup of keeping everything in memory, vs shoving it in a database? Keeping it in memory does make things more complex, and it does increase hardware requirements... are you sure you need it?

Because all of what you describe could be run on a very simple server, put in a very simple database, and give you the results you want on the order of 100 ms per request. Do you need faster than 100 ms response time? Why?

I would use an RDBMS. There are plenty of good ORMs available, such as Hibernate, which allow you to transparently stuff the POJOs into a db. Once you've got the data access abstracted, you then have the freedom to decide how best to persist the data.

For this size of project, I would use the H2 database. It has both embedded and client/server modes, and can operate from disk or entirely in memory.
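To illustrate the point about abstracting the data access, here is a minimal sketch, not taken from the answer itself. The names `Account`, `UserRepository` and `InMemoryUserRepository` are hypothetical. Callers depend only on the interface, so the naive list-backed implementation shown here could later be swapped for one backed by Hibernate or H2 without touching the calling code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical POJO with the fields listed in the question. */
final class Account {
    final String username, gender, city, country;
    final int age;

    Account(String username, int age, String gender, String city, String country) {
        this.username = username;
        this.age = age;
        this.gender = gender;
        this.city = city;
        this.country = country;
    }
}

/** Hypothetical data-access abstraction: add, delete and search only (no update). */
interface UserRepository {
    void add(Account user);

    boolean remove(String username);

    List<Account> search(String gender, String city, int minAge, int maxAge);
}

/** Simplest possible implementation: a linear scan over a list. Not thread-safe. */
final class InMemoryUserRepository implements UserRepository {
    private final List<Account> users = new ArrayList<>();

    @Override
    public void add(Account user) {
        users.add(user);
    }

    @Override
    public boolean remove(String username) {
        // Returns true if at least one user was removed.
        return users.removeIf(u -> u.username.equals(username));
    }

    @Override
    public List<Account> search(String gender, String city, int minAge, int maxAge) {
        List<Account> result = new ArrayList<>();
        for (Account u : users) {
            if (u.gender.equals(gender) && u.city.equals(city)
                    && u.age >= minAge && u.age <= maxAge) {
                result.add(u);
            }
        }
        return result;
    }
}
```

A database-backed class implementing the same `UserRepository` interface would translate `search` into an indexed query instead of a scan.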

Most definitely a relational database. With that size you'll want a client-server system, not something embedded like SQLite. Pick one system depending on further requirements. Indexing is a basic feature; most systems support it. Personally I'd try something that's popular and free, such as MySQL or PostgreSQL, so you can more easily google your way out of problems. If you make your SQL queries generic enough (no vendor-specific constructs), you can switch systems without much pain. I agree with bwawok: try whether a standard setup is good enough and think of optimizations later.

Have you thought about using a cache system like EHCache or Memcached? Also, if you have enough memory, you can use a sorted collection such as TreeMap as an index map, or a HashMap to look up users by name (a separate Map per field). It will take more memory but can be effective. You can also work out, from the queries users actually run, which frequently used query has the best selectivity, and create a comparator based on that query only. In that case the resulting subset of elements will not be big and can be filtered quickly without any additional optimization.
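The "separate Map per field" idea can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: the class name `UserIndex`, its methods, and the choice of the city index as the most selective one are all hypothetical. A `HashMap` serves exact lookups, a `TreeMap` serves the age range, and a combined query starts from one index and filters the small candidate set on the remaining attributes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** In-memory user store with one index map per searchable field. Not thread-safe. */
class UserIndex {

    /** Immutable user: the question says users are only added or deleted. */
    static final class User {
        final String username, gender, city, country;
        final int age;

        User(String username, int age, String gender, String city, String country) {
            this.username = username;
            this.age = age;
            this.gender = gender;
            this.city = city;
            this.country = country;
        }
    }

    private final Map<String, User> byUsername = new HashMap<>();
    private final Map<String, Set<User>> byCity = new HashMap<>();
    private final NavigableMap<Integer, Set<User>> byAge = new TreeMap<>();

    void add(User u) {
        remove(u.username); // drop any existing user with the same name from all indexes
        byUsername.put(u.username, u);
        byCity.computeIfAbsent(u.city, k -> new HashSet<>()).add(u);
        byAge.computeIfAbsent(u.age, k -> new HashSet<>()).add(u);
    }

    void remove(String username) {
        User u = byUsername.remove(username);
        if (u == null) {
            return;
        }
        // Empty buckets are left in place to keep the sketch short.
        byCity.get(u.city).remove(u);
        byAge.get(u.age).remove(u);
    }

    User findByUsername(String username) {
        return byUsername.get(username);
    }

    /** Range query served directly by the sorted age index (bounds inclusive). */
    List<User> findByAgeRange(int minAge, int maxAge) {
        List<User> result = new ArrayList<>();
        if (minAge > maxAge) {
            return result; // subMap would throw on an inverted range
        }
        for (Set<User> bucket : byAge.subMap(minAge, true, maxAge, true).values()) {
            result.addAll(bucket);
        }
        return result;
    }

    /** Combined query: start from the city index (assumed most selective), then filter. */
    List<User> find(String city, String gender, int minAge, int maxAge) {
        List<User> result = new ArrayList<>();
        for (User u : byCity.getOrDefault(city, Collections.<User>emptySet())) {
            if (u.age >= minAge && u.age <= maxAge && u.gender.equals(gender)) {
                result.add(u);
            }
        }
        return result;
    }
}
```

Indexes for gender and country would follow the same pattern as `byCity`. With many connected clients, the maps would also need synchronization or concurrent collections, which this sketch leaves out.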