How can I create an index for fast access on a Clojure hash table?

Original post: 2010-12-30 11:23:51 1 2 clojure

I wish to store many records in a Clojure hash table. If I want fast access to certain records by a particular field or by a range query, what options do I have without resorting to storing the data in a database (which is where the data came from in the first place)?

I guess I'm also wondering whether the STM is the right place for a large indexed data set.

2 answers

Depending on how far you want to push this, you're asking to build an in-memory database. I assume you don't actually want to do that, or to use one of the many in-memory Java databases that already exist (Derby, H2, etc.).

If you want indexed or range access to multiple attributes of your data, then you need to create all of those indexes yourself in Clojure data structures. Clojure maps give you O(log32 n) access time (worse than constant, but still tightly bounded). If you need better than that, you can use Java maps like HashMap or ConcurrentHashMap directly, with the caveat that you're then outside the Clojure data model. For range access, you'll want some sort of sorted tree data structure. Java has ConcurrentSkipListMap, which is pretty great for what it does. If that's not good enough, you might need your own B-tree implementation.
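As a rough illustration of the two kinds of index described above, here is a minimal sketch. The sample records, the field names, and the helper `ages-between` are all made up for this example; only `group-by` and `ConcurrentSkipListMap` with its `subMap` method are existing APIs.

```clojure
(import 'java.util.concurrent.ConcurrentSkipListMap)

;; Hypothetical sample records; the field names are assumptions.
(def records
  [{:id 1 :name "alice" :age 31}
   {:id 2 :name "bob"   :age 25}
   {:id 3 :name "carol" :age 31}
   {:id 4 :name "dave"  :age 47}])

;; Exact-match index: field value -> vector of records, held in a Clojure map.
(def by-age (group-by :age records))

;; Range index: a mutable Java skip list keyed by the same field.
(def ^ConcurrentSkipListMap age-index
  (let [m (ConcurrentSkipListMap.)]
    (doseq [[age recs] by-age]
      (.put m age recs))
    m))

(defn ages-between
  "Records whose :age lies in [lo, hi], in ascending key order."
  [lo hi]
  (mapcat val (.subMap age-index lo true hi true)))
```

Calling `(by-age 31)` looks up records by exact value, while `(ages-between 25 31)` walks only the keys inside the range. Note that the skip list is mutable, so it sits outside Clojure's persistent data model, as the answer warns.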

If you're not changing this data, then Clojure's STM is immaterial. Is this data treated as a cache of a subset of the database? If so, you might consider using a cache library like Ehcache instead (they've recently added support for very large off-heap caches and search capabilities).

Balancing data between an in-memory cache and a persistent store is a tricky business and one of the most important things to get right in data-heavy apps.

You'll probably want to create a separate index for each field using a sorted-map so that you can do range queries. Under the hood this uses something like a persistent version of a Java TreeMap.
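A minimal sketch of that idea, assuming made-up records and the hypothetical helpers `index-by` and `range-query`; `sorted-map` and `subseq` are the core functions doing the work.

```clojure
;; Hypothetical sample records; the field names are assumptions.
(def records
  [{:id 1 :name "alice" :age 31}
   {:id 2 :name "bob"   :age 25}
   {:id 3 :name "carol" :age 31}
   {:id 4 :name "dave"  :age 47}])

(defn index-by
  "Builds a sorted-map from each value of `field` to the vector of
  records that have that value."
  [field recs]
  (reduce (fn [idx r] (update idx (field r) (fnil conj []) r))
          (sorted-map)
          recs))

;; One sorted index per field we want to query on.
(def indexes
  {:age  (index-by :age records)
   :name (index-by :name records)})

(defn range-query
  "Records whose `field` value lies in [lo, hi], in ascending key order."
  [field lo hi]
  (mapcat val (subseq (indexes field) >= lo <= hi)))
```

For example, `(range-query :age 30 50)` returns the records for alice, carol, and dave. Because the indexes are ordinary persistent maps, an updated index is a new value and existing readers keep seeing the old one.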

STM shouldn't be an issue if you are mostly interested in read access. In fact, it might even prove better than mutable tables since: