
What do you do when you need more Java Heap Space?

Sorry if this has been asked before (though I can't really find a solution).

I'm not really too good at programming, but anyway: I am crawling a bunch of websites and storing information about them on a server. I need a Java program to process vector coordinates associated with each of the documents (about a billion or so documents, with a grand total of roughly 500,000 numbers, plus or minus, associated with each document). I need to calculate the singular value decomposition of that whole matrix.

Now Java, to my knowledge, obviously can't handle a matrix as big as that. If I try making even a relatively small array (about 44 million elements), I get a heap error. I use Eclipse, and so I tried changing the -Xmx value to 1024m (it won't go any higher for some reason, even though my computer has 8 GB of RAM).

What solutions are there? Another way of retrieving the data I need? Calculating the SVD in a different way? Using a different programming language?

EDIT: Just for right now, pretend there are a billion entries with 3 words associated with each. I am setting -Xmx and -Xms correctly (from Run Configurations in Eclipse; this is equivalent to running java -XmsXXXX -XmxXXXX ...... at the command prompt).

The Java heap space can be set with the -Xmx option (note the initial capital X), and it can certainly go far beyond 1 GB, provided you are using a 64-bit JVM and the corresponding physical memory is available. You should try something along the lines of:

java -Xmx6144m ...
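A quick way to check whether the setting actually took effect is to ask the JVM itself: `Runtime.getRuntime().maxMemory()` reports (approximately) the configured maximum heap. A minimal sketch:

```java
public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() returns the maximum amount of memory the JVM
        // will attempt to use, which reflects the -Xmx setting
        // (the exact figure can differ slightly from the flag value).
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
    }
}
```

Run it with and without the -Xmx flag to see the difference.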

That said, you need to reconsider your design. There is a significant space cost associated with each object, with a typical minimum somewhere around 12 to 16 bytes per object, depending on your JVM. For example, a String carries an overhead of roughly 36 to 40 bytes...

Even with a single object per document and no book-keeping overhead (impossible!), you simply do not have the memory for 1 billion (1,000,000,000) documents. Even a single int per document needs about 4 GB.
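The back-of-the-envelope arithmetic above can be written out explicitly:

```java
public class MemoryEstimate {
    public static void main(String[] args) {
        long documents = 1_000_000_000L;   // one billion documents
        long bytesPerInt = 4;              // a Java int is 4 bytes
        long totalBytes = documents * bytesPerInt;

        // 4,000,000,000 bytes is roughly 3.7 GiB -- before any
        // object headers, references, or collection overhead.
        System.out.println(totalBytes / (1024.0 * 1024 * 1024) + " GiB");
    }
}
```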

You should re-design your application to make use of any sparseness in the matrix, and possibly to make use of disk-based storage when possible. Having everything in memory is nice, but not always possible...
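To illustrate the sparseness point, here is a minimal sketch (not a production design) of a row that stores only its non-zero entries; for a row that is mostly zeros this costs memory proportional to the non-zero count rather than the full column count:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a sparse row: only non-zero entries are stored.
// Note that each HashMap entry still costs tens of bytes due to
// boxing, so for truly huge data a primitive-keyed structure
// would be a better fit; this only shows the idea.
public class SparseRow {
    private final Map<Integer, Double> entries = new HashMap<>();

    public void set(int column, double value) {
        if (value == 0.0) {
            entries.remove(column);   // keep the map truly sparse
        } else {
            entries.put(column, value);
        }
    }

    public double get(int column) {
        // absent columns are implicitly zero
        return entries.getOrDefault(column, 0.0);
    }

    public int nonZeroCount() {
        return entries.size();
    }
}
```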

Are you using a 32-bit JVM? Those cannot have more than 2 GB of heap, and I never managed to allocate more than 1.5 GB. Use a 64-bit JVM instead, as it can allocate a much larger heap.

Or you could apply some math to it and use a divide-and-conquer strategy. That means splitting the problem into smaller problems whose partial results combine into the same answer.

I don't know much about SVD, but maybe this page can be helpful:

http://www.netlib.org/lapack/lug/node32.html
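Leaving the SVD specifics aside, the "split into little problems" idea can be sketched generically: process the data in chunks, computing a partial result per chunk and combining them, so that only one chunk is ever in memory. Here is a toy example (summing squares, as in a Frobenius norm) with fabricated chunks:

```java
public class ChunkedSum {
    // Partial result for one chunk.
    static double sumOfSquares(double[] chunk) {
        double s = 0.0;
        for (double v : chunk) {
            s += v * v;
        }
        return s;
    }

    public static void main(String[] args) {
        // In a real crawler each chunk would be read from disk or a
        // database; the two tiny chunks here are just for illustration.
        double[][] chunks = { {1.0, 2.0}, {3.0} };

        double total = 0.0;
        for (double[] chunk : chunks) {
            total += sumOfSquares(chunk);  // combine partial results
        }
        System.out.println(total);  // 1 + 4 + 9 = 14.0
    }
}
```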

-Xms and -Xmx are different: the one containing s sets the starting heap size, and the one with x sets the maximum heap size.

so

java -Xms512m -Xmx1024m

would start you with a 512 MB heap that can grow to 1024 MB. Note the m suffix: without it, the values are interpreted as bytes.

As other people have said, though, you may need to break your problem down to get this to work. Are you using 32-bit or 64-bit Java?

For data of that size, you should not plan to store it all in memory. The most common scheme to externalize this kind of data is to store it all in a database and structure your program around database queries.

Just for right now, pretend there are a billion entries with 3 words associated with each.

If you have one billion entries, you need one billion times the size of each entry. If you mean 3 ints per entry as the "words", that's at least 12 GB just for the data. If you mean the words as Strings, you could enumerate the words as integers instead (there are only about 100K distinct words in English), and it would take the same amount of space.
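The enumeration trick can be sketched as follows: map each distinct word to a small integer ID the first time it is seen, so every document can store 4-byte ints instead of String objects (the class and method names here are illustrative, not a standard API):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: intern words into dense integer IDs. Only one String
// per *distinct* word is kept, regardless of how many documents
// mention it.
public class WordDictionary {
    private final Map<String, Integer> ids = new HashMap<>();

    public int idFor(String word) {
        // Assign the next free ID on first sight; the current map
        // size is exactly the next unused ID.
        return ids.computeIfAbsent(word, w -> ids.size());
    }

    public int size() {
        return ids.size();
    }
}
```

A document's word list then becomes an int[] of IDs, which is far cheaper than an array of Strings.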

Given that 16 GB costs a few hundred dollars, I would suggest buying more memory.
