
How to initialize the StanfordNLP pipeline once and use it many times without initializing it again?

I want to initialize the StanfordNLP pipeline once and reuse it many times without initializing it again, to improve execution time.

Is it possible?

I have code:

    public static boolean isHeaderMatched(String string) {

    // creates a StanfordCoreNLP object.
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner");

    RedwoodConfiguration.current().clear().apply();
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Env env = TokenSequencePattern.getNewEnv();
    env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
    env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);

    Annotation document = new Annotation(string);

    // use the pipeline to annotate the document we created
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(env, "./app/utils/Summarizer/mapping/career_objective.rule", "./app/utils/Summarizer/mapping/personal_info.rule", "./app/utils/Summarizer/mapping/education.rule", "./app/utils/Summarizer/mapping/work_experience.rule", "./app/utils/Summarizer/mapping/certification.rule", "./app/utils/Summarizer/mapping/publication.rule", "./app/utils/Summarizer/mapping/award_achievement.rule", "./app/utils/Summarizer/mapping/hobbies_interest.rule", "./app/utils/Summarizer/mapping/lang_known.rule", "./app/utils/Summarizer/mapping/project_details.rule", "./app/utils/Summarizer/mapping/skill-set.rule", "./app/utils/Summarizer/mapping/misc_header.rule");

    boolean flag = false;
    for (CoreMap sentence : sentences) {
        List<MatchedExpression> matched = extractor.extractExpressions(sentence);
        //System.out.println("Probable Header is : " + matched);
        Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
        System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());

        //checked if the more than half the no. of word in header(string) is matched
        if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
                //System.out.println("This is sure a header!");
            flag = true;
        } else {
            flag = false;
        }
  /*for(MatchedExpression phrase: matched){
    System.out.println("matched header type: " + phrase.getValue().get());
  }*/
    }
    return flag;
}

I want this part of the code to run only on the first call of the method above, so that the model is loaded just once.

    // creates a StanfordCoreNLP object.
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner");

    RedwoodConfiguration.current().clear().apply();
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Env env = TokenSequencePattern.getNewEnv();
    env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
    env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);

Thanks in advance.

The following is an example of what you can do:

public class Example {
    private static StanfordCoreNLP pipeline;
    private static Env env;

    static {
        // creates a StanfordCoreNLP object.
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner");

        RedwoodConfiguration.current().clear().apply();
        pipeline = new StanfordCoreNLP(props);

        env = TokenSequencePattern.getNewEnv();
        env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
        env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
    }

    public static boolean isHeaderMatched(String string) {
        Annotation document = new Annotation(string);

        // use the pipeline to annotate the document we created
        pipeline.annotate(document);
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);

        CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(env, "./app/utils/Summarizer/mapping/career_objective.rule", "./app/utils/Summarizer/mapping/personal_info.rule", "./app/utils/Summarizer/mapping/education.rule", "./app/utils/Summarizer/mapping/work_experience.rule", "./app/utils/Summarizer/mapping/certification.rule", "./app/utils/Summarizer/mapping/publication.rule", "./app/utils/Summarizer/mapping/award_achievement.rule", "./app/utils/Summarizer/mapping/hobbies_interest.rule", "./app/utils/Summarizer/mapping/lang_known.rule", "./app/utils/Summarizer/mapping/project_details.rule", "./app/utils/Summarizer/mapping/skill-set.rule", "./app/utils/Summarizer/mapping/misc_header.rule");

        boolean flag = false;
        for (CoreMap sentence : sentences) {
            List<MatchedExpression> matched = extractor.extractExpressions(sentence);
            //System.out.println("Probable Header is : " + matched);
            Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
            System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());

            // checked if the more than half the no. of word in header(string) is matched
            if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
                flag = true;
            } else {
                flag = false;
            }

        }

        return flag;
    }

}

In the above code, the static block runs once, when the class is first initialized (in practice, the first time the class is used). If you do not want this behavior, expose an init method instead, like the following:

public class Example {
    private static StanfordCoreNLP pipeline;
    private static Env env;

    public static void init() {
        // creates a StanfordCoreNLP object.
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner");

        RedwoodConfiguration.current().clear().apply();
        pipeline = new StanfordCoreNLP(props);

        env = TokenSequencePattern.getNewEnv();
        env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
        env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
    }

    public static boolean isHeaderMatched(String string) {
        // code left out for brevity
    }

}

You can then call it from another class:

Example.init();
Example.isHeaderMatched("foobar");
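
If you would rather not depend on callers remembering to call init(), a third option is the lazy holder idiom, which defers construction until the first call that actually needs the object. The following is only a minimal sketch: ExpensivePipeline is a hypothetical stand-in for StanfordCoreNLP (so the example runs without the library), and the names LazyPipelineExample, Holder, pipeline() and process() are made up for illustration.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive object such as StanfordCoreNLP.
final class ExpensivePipeline {
    // Counts constructions so the demo can show that only one ever happens.
    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    ExpensivePipeline() {
        CONSTRUCTIONS.incrementAndGet(); // the slow model loading would go here
    }

    String annotate(String text) {
        // Placeholder for real annotation work.
        return text.trim().toLowerCase(Locale.ROOT);
    }
}

final class LazyPipelineExample {
    private LazyPipelineExample() {}

    // The JVM initializes Holder only when Holder.INSTANCE is first read,
    // and class initialization is guaranteed to be thread-safe.
    private static final class Holder {
        static final ExpensivePipeline INSTANCE = new ExpensivePipeline();
    }

    static ExpensivePipeline pipeline() {
        return Holder.INSTANCE;
    }

    static String process(String text) {
        return pipeline().annotate(text);
    }

    public static void main(String[] args) {
        System.out.println(ExpensivePipeline.CONSTRUCTIONS.get()); // 0: nothing built yet
        process("Education");
        process("Work Experience");
        System.out.println(ExpensivePipeline.CONSTRUCTIONS.get()); // 1: built once, reused
    }
}
```

In your case the holder would create the StanfordCoreNLP object (and the Env) instead of the stand-in, and isHeaderMatched would call pipeline() instead of reading a static field directly.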

While writing this answer I noticed a possible flaw in your logic. The following code may not produce the behavior you desire.

boolean flag = false;
for (CoreMap sentence : sentences) {
    List<MatchedExpression> matched = extractor.extractExpressions(sentence);
    //System.out.println("Probable Header is : " + matched);
    Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
    System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());

    // checked if the more than half the no. of word in header(string) is matched
    if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
        flag = true;
    } else {
        flag = false;
    }

}

You're iterating over every CoreMap in the List<CoreMap> sentences. On every iteration you set flag to the result of the conditional, and this is where the problem lies: flag only ever reflects the result for the last sentence run through the conditional. If you need the result for each sentence, keep a list of booleans to track them; otherwise remove the loop and just check the last sentence (because that's what your loop is doing anyway).
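
To make the difference concrete, here is a small sketch of the options. It uses plain strings and a Predicate in place of CoreMap and your header conditional, so it runs without CoreNLP; the class and method names (SentenceFlagExample, matchEach, matchAny, matchLast) are hypothetical, and which variant you want depends on what "header matched" should mean for multi-sentence input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class SentenceFlagExample {
    private SentenceFlagExample() {}

    // One result per sentence, in order.
    static <T> List<Boolean> matchEach(List<T> sentences, Predicate<? super T> isHeader) {
        List<Boolean> results = new ArrayList<>();
        for (T sentence : sentences) {
            results.add(isHeader.test(sentence));
        }
        return results;
    }

    // True as soon as any sentence matches.
    static <T> boolean matchAny(List<T> sentences, Predicate<? super T> isHeader) {
        for (T sentence : sentences) {
            if (isHeader.test(sentence)) {
                return true;
            }
        }
        return false;
    }

    // Equivalent to the original loop: only the last sentence decides.
    static <T> boolean matchLast(List<T> sentences, Predicate<? super T> isHeader) {
        return !sentences.isEmpty() && isHeader.test(sentences.get(sentences.size() - 1));
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("Work Experience", "I like tea");
        // Stand-in for the real conditional on the matched expressions.
        Predicate<String> isHeader = s -> s.contains("Experience");

        System.out.println(matchEach(sentences, isHeader)); // [true, false]
        System.out.println(matchAny(sentences, isHeader));  // true
        System.out.println(matchLast(sentences, isHeader)); // false
    }
}
```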
