When is BIG, big enough for a database?

Original post: 2011-01-11 11:52:14 · Tags: java, database, sqlite, hashmap

I'm developing a Java application that has performance at its core. I have a list of some 40,000 "final" objects, i.e., an initialization input of 40,000 vectors. This data is unchanged throughout the program's run.

I am always performing lookups against a single ID property to retrieve the proper vectors. Currently I am using a HashMap over a sub-sample of 1,000 vectors, but I'm not sure it will scale to production.

When is BIG actually big enough to justify using a DB? One more thing: an SQLite DB is a viable option since no concurrency is involved, so I guess the "threshold" for DB use is perhaps lower.

6 answers

I think you're asking whether a HashMap with 40,000 entries in it will be okay. The answer is yes - unless you really don't have enough memory, that should be absolutely fine. If you're writing a performance-sensitive app, then putting a large amount of fast memory in the machine running the app is likely to be an efficient way of boosting performance anyway.

There won't be very much overhead for each HashMap entry, so if you've got enough space to store the objects themselves in memory, it's unlikely that the overhead of the map would cause a problem.

Is there any reason why you can't just test this with a reasonable amount of data?
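
One way to run such a test is sketched below. It is only a rough illustration, not a rigorous benchmark: the class name `LookupTiming`, the integer IDs, and the 16-element `double[]` vectors are assumptions standing in for the real data, and the measured time includes the cost of generating random keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

final class LookupTiming {
    // Builds a map of `count` entries keyed by ID 0..count-1. Each value is a
    // made-up vector of `dims` random doubles standing in for the real data.
    static Map<Integer, double[]> buildMap(int count, int dims) {
        Random rnd = new Random(42);
        // Pre-size the table so it is not resized while loading.
        Map<Integer, double[]> map = new HashMap<>(count * 2);
        for (int id = 0; id < count; id++) {
            double[] v = new double[dims];
            for (int d = 0; d < dims; d++) {
                v[d] = rnd.nextDouble();
            }
            map.put(id, v);
        }
        return map;
    }

    // Runs `lookups` random get() calls and returns the average wall-clock
    // time per iteration in nanoseconds (key generation included).
    static double averageLookupNanos(Map<Integer, double[]> map, int lookups) {
        Random rnd = new Random(7);
        int size = map.size();
        double sink = 0; // accumulate a result so the loop cannot be optimized away
        long start = System.nanoTime();
        for (int i = 0; i < lookups; i++) {
            sink += map.get(rnd.nextInt(size))[0];
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // never true; only keeps `sink` live
        }
        return (double) elapsed / lookups;
    }

    public static void main(String[] args) {
        Map<Integer, double[]> map = buildMap(40_000, 16);
        averageLookupNanos(map, 1_000_000); // warm-up pass for the JIT
        System.out.printf("entries=%d, avg lookup=%.1f ns%n",
                map.size(), averageLookupNanos(map, 1_000_000));
    }
}
```

Running it with the production entry count and a realistic vector size, while watching heap usage, should show quickly whether memory or lookup time is a concern. For trustworthy numbers a proper harness such as JMH would be the better tool.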

If you really have no more requirements than:

Read data at start-up
Put data in a map by a single ID (no need for joins, queries against different fields, substring matches etc)
Fetch data from map

... then using a full-blown database would be a huge amount of overkill, IMO.
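
Those three steps can be sketched roughly as follows. The `VectorStore` class, the string IDs, and the `id,x1,x2,...` line format are all hypothetical, chosen only for illustration; the question does not say how the input is stored.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class VectorStore {
    private final Map<String, double[]> byId;

    private VectorStore(Map<String, double[]> byId) {
        // The data never changes after start-up, so expose a read-only view.
        // Note that the double[] values themselves remain mutable.
        this.byId = Collections.unmodifiableMap(byId);
    }

    // Steps 1 and 2: parse each "id,x1,x2,..." line once and index it by ID.
    // The line format is an assumption made for this sketch.
    static VectorStore fromLines(List<String> lines) {
        Map<String, double[]> map = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split(",");
            double[] vector = new double[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                vector[i - 1] = Double.parseDouble(parts[i].trim());
            }
            map.put(parts[0].trim(), vector);
        }
        return new VectorStore(map);
    }

    // Step 3: fetch by ID; returns null when the ID is unknown.
    double[] get(String id) {
        return byId.get(id);
    }

    int size() {
        return byId.size();
    }

    public static void main(String[] args) {
        VectorStore store = VectorStore.fromLines(
                Arrays.asList("a17,1.0,2.5", "b42,0.0,-3.0"));
        System.out.println(Arrays.toString(store.get("b42"))); // prints [0.0, -3.0]
    }
}
```

In a real application the lines would come from whatever file or resource holds the initialization data, for example via `Files.readAllLines`.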

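As long as you load the data set into memory at the start of the program, keep it in memory, and don't run any complex queries, some form of serialization/deserialization seems more viable than a full-blown database.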
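
A minimal sketch of that serialization approach, using Java's built-in object serialization. The `SnapshotIO` class and its method names are made up for this example, and the `HashMap<Integer, double[]>` layout is an assumption about how the vectors are held in memory.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;

final class SnapshotIO {
    // Writes the whole map as one serialized object and closes the stream.
    static void save(HashMap<Integer, double[]> data, OutputStream out) {
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the map back; intended to be called once at program start.
    @SuppressWarnings("unchecked")
    static HashMap<Integer, double[]> load(InputStream in) {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            return (HashMap<Integer, double[]>) ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("snapshot refers to an unknown class", e);
        }
    }

    public static void main(String[] args) throws IOException {
        HashMap<Integer, double[]> data = new HashMap<>();
        data.put(1, new double[] {0.5, 1.5});

        Path file = Files.createTempFile("vectors", ".ser");
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(file))) {
            save(data, out);
        }
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            System.out.println(load(in).get(1)[1]); // prints 1.5
        }
        Files.delete(file);
    }
}
```

Built-in serialization should only be used on files the application itself produced, since deserializing untrusted data is unsafe. A plain text or binary format of your own would work just as well for data this simple.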

You could start a DB with as few as 100 entries (or fewer). There is no general rule for when the amount of data is large enough to store in a database. It's more a question of whether you believe storing this data in a database will give you some benefit (performance boost, easier programming, more flexible options for your users).

When the benefits are greater than the cost of implementation, put it in a database.

There is no set size for a Collection vs a Database. It depends heavily on what you want to do with the data. Size is less important.

You can have a Map with a billion entries.

There's no such thing as 'big enough for a database'. The question is whether there are enough advantages in using a database to overcome the costs.

Having said that, 40,000 isn't 'big' ;-) Unless the objects are huge or you have complex query requirements, I would start with an in-memory implementation. But if you expect to scale this number up over time, it might be better to use the database from the beginning.

One option that you might want to consider is the Oracle Berkeley DB Java Edition library. It's a simple JAR file that can read/write data to persistent storage. Because of its small footprint and ease of use, it's used for applications running on small to very large data sets. It's designed to be linked into the application, so that it's embedded and doesn't require complex client/server installation or protocol stacks.

What's even better is that it's extremely scalable (which works well if you end up with larger data sets than you expect), is very fast, and supports both a Java Collections API and a Direct Persistence Layer API (POJO-like). So you can use it seamlessly with Java Collections.

Berkeley DB Java Edition was designed specifically with Java application developers in mind. It's designed to be simple to use and lightweight in terms of resources required, yet very fast, scalable and reliable.

You can find more information about Oracle Berkeley DB Java Edition here

Regards,

Dave