
Multithread - OutOfMemory

I am using a ThreadPoolExecutor with 5 active threads, and the number of tasks is huge: 20,000.
The queue is filled up with instances of a Runnable task ( pool.execute(new WorkingThreadTask()) ) almost immediately.

Each WorkingThreadTask has a HashMap:

Map<Integer, HashMap<Integer, String>> themap;

Each map can have up to 2000 items, and each sub-map has 5 items. There is also a shared BlockingQueue.
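
In other words, each task looks roughly like this (simplified; the run() body is omitted):

import java.util.HashMap;
import java.util.Map;

class WorkingThreadTask implements Runnable {
    // up to 2000 entries; each value is a sub-map with 5 items
    Map<Integer, HashMap<Integer, String>> themap =
            new HashMap<Integer, HashMap<Integer, String>>();

    @Override
    public void run() {
        // ... fill and process themap ...
    }
}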

While the process is running I am running out of memory. I'm running with a 32-bit JVM and -Xms1024m -Xmx1024m.

How can I handle this problem? I don't think I have a leak in the hashmap... when the thread is finished, the hashmap is cleaned up, right?

Update:

After running a profiler and checking the memory, the biggest hit is:

byte[] 2,516,024 hits, 918 MB  

I don't know where it's coming from or what is using it.

Name                                                   Instance count   Size (bytes)
byte[]                                                 2,519,560        918,117,496
oracle.jdbc.ttc7.TTCItem                               2,515,402        120,739,296
char[]                                                 357,882          15,549,280
java.lang.String                                       9,677            232,248
int[]                                                  2,128            110,976
short[]                                                2,097            150,024
java.lang.Class                                        1,537            635,704
java.util.concurrent.locks.ReentrantLock$NonfairSync   1,489            35,736
java.util.Hashtable$Entry                              1,417            34,008
java.util.concurrent.ConcurrentHashMap$HashEntry[]     1,376            22,312
java.util.concurrent.ConcurrentHashMap$Segment         1,376            44,032
java.lang.Object[]                                     1,279            60,216
java.util.TreeMap$Entry                                828              26,496
oracle.jdbc.dbaccess.DBItem[]                          802              10,419,712
oracle.jdbc.ttc7.v8TTIoac                              732              52,704

I'm not sure about the inner map, but I suspect the problem is that you are creating a large number of tasks that are filling memory. You should be using a bounded task queue and limiting the job producer.

Take a look at my answer here: Process Large File for HTTP Calls in Java

To summarize: you should create your own bounded queue and then use a RejectedExecutionHandler to block the producer until there is space in the queue. Something like:

final BlockingQueue<Runnable> queue =
    new ArrayBlockingQueue<Runnable>(100);
ThreadPoolExecutor threadPool =
    new ThreadPoolExecutor(nThreads, nThreads, 0L, TimeUnit.MILLISECONDS, queue);
// we need our RejectedExecutionHandler to block if the queue is full
threadPool.setRejectedExecutionHandler(new RejectedExecutionHandler() {
    @Override
    public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
        try {
            // this will block the producer until there's room in the queue
            executor.getQueue().put(task);
        } catch (InterruptedException e) {
            throw new RejectedExecutionException(
                "Unexpected InterruptedException", e);
        }
    }
});
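
With this in place, the producer can submit all 20,000 tasks and execute() will simply block (via the handler) whenever the 100-slot queue is full, so only a bounded number of tasks exist at any moment. A sketch of the producer loop:

for (int i = 0; i < 20000; i++) {
    // blocks when the queue is full instead of queueing everything up front
    threadPool.execute(new WorkingThreadTask());
}
threadPool.shutdown();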

Edit:

"I don't think I have leaks in hashmap... when thread is finished hashmap is cleaned right?"

You might consider aggressively calling clear() on the working HashMap and other collections when the task completes. Although they should eventually be reaped by the GC, giving the GC some help may solve your problem if you have limited memory.
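
For example (a sketch; this assumes the map is only needed during run()):

public void run() {
    try {
        // ... do the work that fills themap ...
    } finally {
        themap.clear(); // release all entries as soon as the task ends
        themap = null;  // drop the reference so the map itself can be reclaimed
    }
}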

If this doesn't work, a profiler is the way to go to help you identify where the memory is being held.

Edit:

After looking at the profiler output, the byte[] is interesting. Typically this indicates some sort of serialization or other IO. You may also be storing blobs in a database. The oracle.jdbc.ttc7.TTCItem is very interesting, however. That indicates to me that you are not closing a database connection somewhere. Make sure to use proper try/finally blocks to close your connections.
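
Something along these lines (a sketch; dataSource and the query are placeholders):

Connection conn = null;
Statement stmt = null;
ResultSet rs = null;
try {
    conn = dataSource.getConnection();
    stmt = conn.createStatement();
    rs = stmt.executeQuery("SELECT ...");
    while (rs.next()) {
        // ... process each row ...
    }
} finally {
    // close in reverse order; each close is attempted even if an earlier one fails
    if (rs != null)   try { rs.close();   } catch (SQLException ignore) {}
    if (stmt != null) try { stmt.close(); } catch (SQLException ignore) {}
    if (conn != null) try { conn.close(); } catch (SQLException ignore) {}
}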

HashMap carries quite a lot of overhead in terms of memory usage: it needs a minimum of about 36 bytes per entry, plus the size of the key/value itself, and each of those will be at least 32 bytes (I think that's about the typical value for a 32-bit Sun JVM). Doing some quick math:

20,000 tasks, each with a 2000-entry map. The value in the map is another map with 5 entries.
->  5-entry map is 1 * Map + 5 * Map.Entry + 5 * keys + 5 * values = 16 objects at 32 bytes => 512 bytes per sub-map.
->  2000-entry map is 1 * Map + 2000 * Map.Entry + 2000 keys + 2000 sub-maps (512 bytes each) => 2000 * (512 + 32 + 32) + 32 => ~1.1 MB
->  20,000 tasks at ~1.1 MB each => ~23 GB

So, your overall footprint is roughly 23 GB.

The logical solution is to restrict the depth of the blocking queue feeding the ExecutorService, and only create enough child tasks to keep it busy. Set a limit of about 64 entries on the queue, and then you will never have more than 64 + 5 tasks instantiated at one time. When space becomes available in the executor's queue, you can create and add another task.

You can improve efficiency by not adding so many tasks ahead of what is being processed. Try checking the queue and only adding to it if there are fewer than 1000 entries.
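
A sketch of that check (the 1000 threshold and the sleep interval are illustrative):

void submitThrottled(ThreadPoolExecutor pool, Runnable task)
        throws InterruptedException {
    while (pool.getQueue().size() >= 1000) {
        Thread.sleep(50); // back off until the workers drain the queue a bit
    }
    pool.execute(task);
}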

You can also make the data structures more efficient. A Map with an Integer key can often be reduced to an array of some kind.
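
For example, if the Integer keys of the 5-entry sub-maps are small and dense (an assumption), each sub-map can be a plain array, which stores no Entry objects and no boxed keys:

String[] subMap = new String[5];
subMap[0] = "value";   // replaces map.put(0, "value")
String v = subMap[3];  // replaces map.get(3)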

Lastly, 1 GB isn't that much these days. My mobile phone has 2 GB. If you are going to process large amounts of data, I suggest getting a machine with 32-64 GB of memory and a 64-bit JVM.

Given the large byte[]s, I'd suspect IO-related issues (unless you are handling video/audio or something).

Things to look at:

  • DB: Are you trying to read a large amount of data at once? You can, for example, use a cursor to avoid that (see the sketch after this list).
  • File/Network: Are you trying to read large amounts of data from a file or the network at once? You should "propagate the load" to whatever is reading and regulate the rate of reads.
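
For the DB case, a sketch of cursor-style reading (setFetchSize() is a standard JDBC hint that the Oracle driver honors; processRow() is a placeholder):

Statement stmt = conn.createStatement();
stmt.setFetchSize(100); // pull ~100 rows per round trip instead of everything at once
ResultSet rs = stmt.executeQuery("SELECT ...");
while (rs.next()) {
    processRow(rs); // handle one row at a time
}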

UPDATE: OK, so you are using a cursor to read from the DB. Now you need to make sure that reading from the cursor only progresses as you finish work (aka "propagate the load"). To do this, use a thread pool like this:

BlockingQueue<Runnable> queue = new LinkedBlockingQueue<Runnable>(queueSize);
ThreadPoolExecutor tpe = new ThreadPoolExecutor(
        threadNum,
        threadNum,
        1000,
        TimeUnit.HOURS,
        queue,
        new ThreadPoolExecutor.CallerRunsPolicy());

Now when you post to this service from the code that reads from the DB, it will block when the queue is full (the calling thread is used to run tasks and hence blocks).
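
A sketch of wiring the cursor loop to this pool (rs is the cursor ResultSet from the earlier sketch; readRow(), Row, and handle() are placeholders):

while (rs.next()) {
    final Row row = readRow(rs); // copy the data out of the ResultSet first
    tpe.execute(new Runnable() {
        @Override
        public void run() {
            handle(row); // runs on a worker, or on the caller when the queue is full
        }
    });
}
tpe.shutdown();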
