
Using a concurrent hashmap to reduce memory usage with threadpool?

I'm working with a program that runs lengthy SQL queries and stores the processed results in a HashMap. Currently, to get around the slow execution time of each of the 20-200 queries, I am using a fixed thread pool and a custom Callable to do the searching. As a result, each Callable creates a local copy of the data, which it then returns to the main program to be included in the report.

I've noticed that reports of 100 queries, which used to run without issue, now cause me to run out of memory. My speculation is that because these Callables each create their own copy of the data, I'm doubling memory usage when I join them into another large HashMap. I realize I could try to coax the garbage collector to run by reducing the scope of the Callable's table, but that level of restructuring is not really what I want to do if it can be avoided.

Could I improve memory usage by replacing the Callables with Runnables that, instead of storing the data, write it to a concurrent HashMap? Or does it sound like I have some other problem here?
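
For context, here is a minimal sketch of the pattern described above (the query strings and String result values are placeholders for the real data):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class PerTaskCopies {
        // Each Callable fills and returns its own HashMap, which the main
        // thread then merges into one large map for the report.
        static Callable<Map<String, String>> queryTask(String sql) {
            return () -> {
                Map<String, String> local = new HashMap<>();
                local.put(sql, "processed result rows"); // stand-in for the real query work
                return local;
            };
        }

        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(8);
            List<Future<Map<String, String>>> futures = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                futures.add(pool.submit(queryTask("SELECT ... /* query " + i + " */")));
            }
            Map<String, String> merged = new HashMap<>();
            for (Future<Map<String, String>> f : futures) {
                // The per-task map and the merged map coexist here, so the
                // result data is held twice until the futures become unreachable.
                merged.putAll(f.get());
            }
            pool.shutdown();
        }
    }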

Don't create copies of the data; just pass references around, ensuring thread safety where needed. If you still run out of memory without the data copying, consider increasing the maximum heap available to the application.
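
For example, the maximum heap can be raised at launch with the JVM's -Xmx flag (the 4g value and the ReportApp class name here are arbitrary):

    java -Xmx4g ReportApp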

The drawback of this no-copy approach, though, is that thread safety becomes harder to achieve.
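
To make the question's proposed variant concrete, here is a minimal sketch in which plain Runnables write their results straight into one shared ConcurrentHashMap, so the data exists only once (the key scheme and String values are placeholders):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class SharedMapExample {
        public static void main(String[] args) throws InterruptedException {
            // One shared, thread-safe map instead of one HashMap per Callable.
            Map<String, String> results = new ConcurrentHashMap<>();
            ExecutorService pool = Executors.newFixedThreadPool(8);

            for (int i = 0; i < 100; i++) {
                final int queryId = i;
                pool.execute(() -> {
                    // Run the SQL query for queryId, then write each processed
                    // result directly into the shared map; no per-task copy is kept.
                    results.put("query-" + queryId, "processed result rows");
                });
            }

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            // 'results' now holds everything needed for the report, stored once.
        }
    }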

Do you really need all 100-200 reports at the same time?

Maybe it's worth limiting the first level of caching to just 50 reports and introducing a second level based on WeakHashMap? When the first level exceeds its size, the least recently used entry is pushed down to the second level, whose retention depends on the amount of available memory (through the use of WeakHashMap).

Then, to look up a report, you first query the first level; if the value is not there, you query the second level; and if it is not there either, the report was reclaimed by the GC when memory ran low, and you have to query the DB again for that report.
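
A rough sketch of that two-level scheme, using a LinkedHashMap in access order as the LRU first level (the class name and sizes are arbitrary). One caveat: WeakHashMap weakly references its keys rather than its values, so second-level entries survive only while their keys are referenced elsewhere; SoftReference values are the more common way to tie retention to available memory.

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.WeakHashMap;

    public class TwoLevelCache<K, V> {
        private static final int L1_SIZE = 50;

        // Level 1: bounded LRU cache. A LinkedHashMap in access order evicts
        // its least recently used entry once L1_SIZE is exceeded.
        private final Map<K, V> level1 = new LinkedHashMap<K, V>(L1_SIZE, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > L1_SIZE) {
                    level2.put(eldest.getKey(), eldest.getValue()); // demote to level 2
                    return true;
                }
                return false;
            }
        };

        // Level 2: WeakHashMap, as suggested above; entries may be reclaimed by the GC.
        private final Map<K, V> level2 = new WeakHashMap<>();

        public synchronized V get(K key) {
            V value = level1.get(key);
            if (value != null) return value;
            value = level2.remove(key);        // promote back to level 1 on a hit
            if (value != null) level1.put(key, value);
            return value;                      // null => reclaimed, re-query the DB
        }

        public synchronized void put(K key, V value) {
            level1.put(key, value);
        }
    }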

Do the results of the queries depend on other query results? If not, then whenever you produce results in another thread, just use a ConcurrentHashMap as you are implying. Do you really need to ask whether creating several unnecessary copies of data is causing your program to run out of memory? That should be almost obvious.
