
Reduce Java Heap Size

I am using Java 8 with Apache OpenNLP. I have a service that extracts all the nouns from a paragraph. This works as expected on my localhost server, and it also ran on an OpenShift server with no problems. However, it does use a lot of memory. I now need to deploy my application to an AWS Elastic Beanstalk Tomcat server.

One solution would be to upgrade the AWS Elastic Beanstalk Tomcat server from a t1.micro to another instance type. But I am on a small budget and want to avoid the extra fees if possible.

When I run the app and it tries to do the word chunking, it fails with the following error:

 dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space] with root cause
 java.lang.OutOfMemoryError: Java heap space
     at opennlp.tools.ml.model.AbstractModelReader.getParameters(AbstractModelReader.java:148)
     at opennlp.tools.ml.maxent.io.GISModelReader.constructModel(GISModelReader.java:75)
     at opennlp.tools.ml.model.GenericModelReader.constructModel(GenericModelReader.java:59)
     at opennlp.tools.ml.model.AbstractModelReader.getModel(AbstractModelReader.java:87)
     at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:35)
     at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:31)
     at opennlp.tools.util.model.BaseModel.finishLoadingArtifacts(BaseModel.java:328)
     at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:256)
     at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:179)
     at opennlp.tools.parser.ParserModel.<init>(ParserModel.java:180)
     at com.jobs.spring.service.lang.LanguageChunkerServiceImpl.init(LanguageChunkerServiceImpl.java:35)
     at com.jobs.spring.service.lang.LanguageChunkerServiceImpl.getNouns(LanguageChunkerServiceImpl.java:46)

Question

Is there a way to do one of the following:

  1. Reduce the amount of memory used when extracting the nouns from a paragraph.

  2. Use a different API than Apache OpenNLP that won't use as much memory.

  3. Configure the AWS Elastic Beanstalk Tomcat server to cope with the demands.

Code Sample:

import java.io.IOException;
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;

import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;
import opennlp.tools.util.InvalidFormatException;

@Component("languageChunkerService")
@Transactional
public class LanguageChunkerServiceImpl implements LanguageChunkerService {

    private Set<String> nouns = null;
    private ParserModel model = null;
    private Parser parser = null;

    public void init() throws InvalidFormatException, IOException {
        // Load the model straight from the classpath; getResourceAsStream also
        // works when the app is packaged as a JAR/WAR, unlike a File-based lookup.
        try (InputStream modelInParse = getClass().getClassLoader()
                .getResourceAsStream("en-parser-chunking.bin")) {
            // load chunking model
            model = new ParserModel(modelInParse); // line 35 in the stack trace above
            // create the parser from the model
            parser = ParserFactory.create(model);
        }
    }

    @Override
    public Set<String> getNouns(String sentenceToExtract) {
        Set<String> extractedNouns = new HashSet<>();
        nouns = new HashSet<>();
        try {
            if (parser == null) {
                init();
            }

            Parse[] topParses = ParserTool.parseLine(sentenceToExtract, parser, 1);

            // call subroutine to extract noun phrases
            for (Parse p : topParses) {
                getNounPhrases(p);
            }

            // normalize and collect the extracted nouns
            for (String s : nouns) {
                String word = s.replaceAll("[^a-zA-Z ]", "").toLowerCase();
                extractedNouns.add(word);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return extractedNouns;
    }

    // recursively walk the parse tree, collecting nouns
    private void getNounPhrases(Parse p) {
        if (p.getType().equals("NN")) { // NN = singular-noun tag (not NP, the phrase tag)
            nouns.add(p.getCoveredText());
        }
        for (Parse child : p.getChildren())
            getNounPhrases(child);
    }
}

UPDATE

Tomcat 8 config:

(screenshot of the Tomcat JVM settings in the Elastic Beanstalk console)
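
For reference, on the Elastic Beanstalk Tomcat platform the same JVM heap settings can also be kept in source control as an .ebextensions option file instead of being set in the console. A minimal sketch, assuming the platform's standard aws:elasticbeanstalk:container:tomcat:jvmoptions namespace; the heap values are placeholders to tune for the instance size:

# .ebextensions/jvm.config: sets the Tomcat JVM heap bounds
option_settings:
  aws:elasticbeanstalk:container:tomcat:jvmoptions:
    Xms: 128m
    Xmx: 384m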

First of all, you should try to optimize your code. Start by precompiling your regex with Pattern.compile and reusing it instead of calling String.replaceAll, since replaceAll recompiles the pattern and allocates a fresh string on every call, as shown in the sketch below. ( https://eyalsch.wordpress.com/2009/05/21/regex/ )
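
A minimal sketch of that change; the NounCleaner class and NON_LETTERS name are mine, not from the question:

import java.util.regex.Pattern;

public class NounCleaner {

    // Compiled once and reused; s.replaceAll("[^a-zA-Z ]", "") would
    // recompile this same pattern for every noun in the loop.
    private static final Pattern NON_LETTERS = Pattern.compile("[^a-zA-Z ]");

    public static String clean(String noun) {
        return NON_LETTERS.matcher(noun).replaceAll("").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(clean("Machine-Learning!")); // prints "machinelearning"
    }
}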

Second, you should not store all the parsed sentences in an array. Third hint: try allocating the memory for your array through a ByteBuffer. Another hint, which may affect you the most: use a buffered stream to read your chunking model file, as sketched below. ( out of memory error, java heap space )
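
Applied to the init() method from the question, the buffering hint would look roughly like this. Since en-parser-chunking.bin is a binary file, a BufferedInputStream (rather than a BufferedReader) is the appropriate wrapper; the ModelLoader class name is mine:

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.parser.ParserModel;

public class ModelLoader {

    // Buffering the raw classpath stream cuts down on small reads while
    // the large en-parser-chunking.bin model is being deserialized.
    public static ParserModel load() throws IOException {
        try (InputStream in = new BufferedInputStream(
                ModelLoader.class.getClassLoader()
                        .getResourceAsStream("en-parser-chunking.bin"))) {
            return new ParserModel(in);
        }
    }
}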

After this you should already see lower memory usage. If those tips don't help, please provide a memory dump/allocation graph.

One more tip: a HashSet takes about 5.5x more memory than an unordered List, so you can collect into a List first and deduplicate once at the end, as sketched below. ( Performance and Memory allocation comparison between List and Set )
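
A minimal sketch of that trade-off; the class and variable names are mine:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NounBuffer {

    public static void main(String[] args) {
        // Accumulate into a List, which has much lower per-element overhead...
        List<String> raw = new ArrayList<>();
        raw.add("jobs");
        raw.add("jobs");
        raw.add("salary");

        // ...and pay the Set cost only once, at the end, to deduplicate.
        Set<String> nouns = new HashSet<>(raw);
        System.out.println(nouns); // prints e.g. [salary, jobs]
    }
}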
