
Memory Friendly Fast Key-Value Access Solution for Android

I have an Android application that iterates through an array of thousands of integers, using them as keys to access pairs of integers (let us call them IDs) in order to perform calculations with them. It needs to do this as fast as possible, and in the end it returns a result that is crucial to the application.

I tried loading a HashMap into memory for fast access to those numbers, but it resulted in an OutOfMemoryError. I also tried writing the IDs to a RandomAccessFile and storing their offsets within the file in another HashMap, but that was far too slow. Moreover, the new HashMap that stores only the offsets still occupies a large amount of memory.

Now I am considering SQLite, but I am not sure whether it will be any faster. Are there any structures or libraries that could help me with this?

EDIT: There are more than 20 million keys, whereas I only need to access thousands of them. I do not know beforehand which ones I will access, because that changes with user input.

You could use Trove's TIntLongHashMap to map primitive ints to primitive longs (each long storing the two ints of your value pair). This saves you the object overhead of a plain vanilla Map, which forces you to use wrapper types.
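A minimal sketch of that idea, assuming the Trove 3 library is on the classpath; the pack/unpack helpers and the class name are illustrative, not part of Trove's API:

```java
import gnu.trove.map.hash.TIntLongHashMap;

public class PairStore {
    // int -> long map with primitive keys and values: no Integer/Long
    // boxing, so roughly 12 bytes of payload per entry plus table overhead.
    private final TIntLongHashMap map = new TIntLongHashMap();

    // Pack the two ints of a pair into one long (high/low 32 bits).
    static long pack(int first, int second) {
        return ((long) first << 32) | (second & 0xFFFFFFFFL);
    }

    static int first(long packed)  { return (int) (packed >>> 32); }
    static int second(long packed) { return (int) packed; }

    void put(int key, int first, int second) {
        map.put(key, pack(first, second));
    }

    // Returns the pair as {first, second}, or null if the key is absent.
    int[] get(int key) {
        if (!map.containsKey(key)) return null;
        long packed = map.get(key);
        return new int[] { first(packed), second(packed) };
    }
}
```

Packing both values into a single long keeps everything in one primitive-to-primitive table instead of needing two maps or an object per pair.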

EDIT

Since your update states that you have more than 20 million mappings, there are likely structures more space-efficient than a hash map. Partitioning your keys into buckets, combined with some sub-key compression, will likely save you half the memory over even the most efficient hash map implementation.
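One way such sub-key compression could look, as a hedged sketch and not the only possible layout: let the high 16 bits of a key select a bucket, so each entry only needs to store its low 16 bits as a short. The class below assumes the buckets are built once, sorted, up front:

```java
import java.util.Arrays;

// Two-level lookup: the high 16 bits of a key pick a bucket, so each
// entry stores only its low 16 bits (2 bytes instead of 4 per key),
// and sorted arrays carry no empty-slot overhead like a hash table.
public class CompressedIntMap {
    static final class Bucket {
        short[] lowBits; // low 16 bits of each key, kept sorted
        long[] values;   // packed int pairs, parallel to lowBits
    }

    private final Bucket[] buckets = new Bucket[1 << 16];

    // Returns the packed pair for key, or -1L here as a stand-in
    // for "absent" (a real API would signal absence differently).
    long get(int key) {
        Bucket b = buckets[key >>> 16];
        if (b == null) return -1L;
        int i = Arrays.binarySearch(b.lowBits, (short) key);
        return i >= 0 ? b.values[i] : -1L;
    }
}
```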

SQLite is an embedded relational database that uses indexes. I would bet it is much faster than using a RandomAccessFile. You can give it a try.
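A minimal sketch of this route using Android's stock SQLiteOpenHelper; the database, table, and column names are made up for illustration. Declaring the key as INTEGER PRIMARY KEY makes it the table's rowid, so single-key lookups are B-tree searches rather than scans:

```java
import android.content.Context;
import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;
import android.database.sqlite.SQLiteOpenHelper;

public class PairDbHelper extends SQLiteOpenHelper {
    public PairDbHelper(Context ctx) {
        super(ctx, "pairs.db", null, 1);
    }

    @Override
    public void onCreate(SQLiteDatabase db) {
        // k doubles as the rowid, so it is indexed automatically.
        db.execSQL("CREATE TABLE pairs (k INTEGER PRIMARY KEY, a INTEGER, b INTEGER)");
    }

    @Override
    public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
    }

    // Looks up one key; returns {a, b} or null if the key is absent.
    public int[] lookup(int key) {
        SQLiteDatabase db = getReadableDatabase();
        Cursor c = db.rawQuery("SELECT a, b FROM pairs WHERE k = ?",
                new String[] { String.valueOf(key) });
        try {
            return c.moveToFirst()
                    ? new int[] { c.getInt(0), c.getInt(1) }
                    : null;
        } finally {
            c.close();
        }
    }
}
```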

My suggestion is to rearrange the keys into buckets. What I mean is: identify (more or less) the distribution of your keys, then create files that each correspond to a range of keys (the point being that every file must contain only as many integers as can fit in memory, and no more). Then, when you search for a key, you simply read the whole corresponding file into memory and look for it there.

For example, assuming the distribution of the keys is uniform, store the 500k values corresponding to key values 0-500k in one file, the 500k values corresponding to keys 500k-1M in the next, and so on...
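A rough sketch of such a lookup, assuming each bucket file holds fixed-width 12-byte records (key plus the two ints) sorted by key; the file-naming scheme is invented for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BucketLookup {
    static final int RANGE = 500_000; // keys covered by each bucket file

    // Returns {a, b} for the key, or null if it is absent.
    static int[] lookup(int key) throws IOException {
        String file = "bucket_" + (key / RANGE) + ".bin"; // invented naming
        byte[] bytes = Files.readAllBytes(Paths.get(file)); // whole bucket in memory
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        for (int i = 0; i < bytes.length / 12; i++) {
            int k = in.readInt(); // each record: key, a, b (big-endian)
            int a = in.readInt();
            int b = in.readInt();
            if (k == key) return new int[] { a, b };
            if (k > key) break; // records are sorted, so stop early
        }
        return null; // absent from its range bucket means absent entirely
    }
}
```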

EDIT: If you did try this approach and it still went slow, I still have some tricks up my sleeve:

  1. First, make sure that your division is actually close to equal across all the buckets.
  2. Try to make the buckets smaller by making more of them.
  3. The idea behind correctly dividing into buckets by ranges is that when you search for a key, you go to the corresponding range bucket; the key is either in it or not in the whole collection, so there is no point in concurrently reading another bucket.
  4. I have never done this, because I am not sure concurrency helps with I/O, but it may help to read the whole file with two threads, one from top to bottom and the other from bottom to top, until they meet (or something like that).
  5. While you read the whole bucket into memory, split it into 3-4 ArrayLists and run 3-4 worker threads to search for your key in each of them; the search should finish much faster that way (see the sketch after this list).
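A hedged sketch of trick 5, using a flat int[] of {key, a, b} triples instead of ArrayLists to stay primitive; the record layout and thread count are assumptions:

```java
import java.util.concurrent.atomic.AtomicReference;

public class ParallelBucketSearch {
    // Scans a bucket laid out as consecutive {key, a, b} triples with
    // several threads, each searching its own slice of the array.
    static int[] search(int[] triples, int key, int threads)
            throws InterruptedException {
        int records = triples.length / 3;
        AtomicReference<int[]> found = new AtomicReference<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int from = t * records / threads;     // slice start
            final int to = (t + 1) * records / threads; // slice end
            workers[t] = new Thread(() -> {
                // Stop early if another thread already found the key.
                for (int r = from; r < to && found.get() == null; r++) {
                    if (triples[r * 3] == key) {
                        found.set(new int[] { triples[r * 3 + 1],
                                              triples[r * 3 + 2] });
                        return;
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return found.get(); // null if the key is not in this bucket
    }
}
```

Whether this actually wins depends on the device; for a bucket that already fits in memory, a single binary search over sorted records may well beat a multi-threaded linear scan.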
