How can I initialize the StanfordNLP pipeline once and use it many times without initializing it again?

Question

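I want to initialize the StanfordNLP pipeline once and use it many times without initializing it again, in order to reduce execution time.

Is this possible?

Here is my code: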

    public static boolean isHeaderMatched(String string) {

    // creates a StanfordCoreNLP object.
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner");

    RedwoodConfiguration.current().clear().apply();
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Env env = TokenSequencePattern.getNewEnv();
    env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
    env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);

    Annotation document = new Annotation(string);

    // use the pipeline to annotate the document we created
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(env, "./app/utils/Summarizer/mapping/career_objective.rule", "./app/utils/Summarizer/mapping/personal_info.rule", "./app/utils/Summarizer/mapping/education.rule", "./app/utils/Summarizer/mapping/work_experience.rule", "./app/utils/Summarizer/mapping/certification.rule", "./app/utils/Summarizer/mapping/publication.rule", "./app/utils/Summarizer/mapping/award_achievement.rule", "./app/utils/Summarizer/mapping/hobbies_interest.rule", "./app/utils/Summarizer/mapping/lang_known.rule", "./app/utils/Summarizer/mapping/project_details.rule", "./app/utils/Summarizer/mapping/skill-set.rule", "./app/utils/Summarizer/mapping/misc_header.rule");

    boolean flag = false;
    for (CoreMap sentence : sentences) {
        List<MatchedExpression> matched = extractor.extractExpressions(sentence);
        //System.out.println("Probable Header is : " + matched);
        Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
        System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());

        // check whether at least half (rounded down) of the words in the header (string) are matched
        if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
                //System.out.println("This is sure a header!");
            flag = true;
        } else {
            flag = false;
        }
  /*for(MatchedExpression phrase: matched){
    System.out.println("matched header type: " + phrase.getValue().get());
  }*/
    }
    return flag;
}

I want the following part of the code to run only on the first call to the method above, so that the models are loaded once.

    // creates a StanfordCoreNLP object.
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner");

    RedwoodConfiguration.current().clear().apply();
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Env env = TokenSequencePattern.getNewEnv();
    env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
    env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);

Thanks in advance.

Answer 1

Here is an example of what you can do:

public class Example {
    private static StanfordCoreNLP pipeline;
    private static Env env;

    static {
        // creates a StanfordCoreNLP object.
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner");

        RedwoodConfiguration.current().clear().apply();
        pipeline = new StanfordCoreNLP(props);

        env = TokenSequencePattern.getNewEnv();
        env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
        env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
    }

    public static boolean isHeaderMatched(String string) {
        Annotation document = new Annotation(string);

        // use the pipeline to annotate the document we created
        pipeline.annotate(document);
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);

        CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(env, "./app/utils/Summarizer/mapping/career_objective.rule", "./app/utils/Summarizer/mapping/personal_info.rule", "./app/utils/Summarizer/mapping/education.rule", "./app/utils/Summarizer/mapping/work_experience.rule", "./app/utils/Summarizer/mapping/certification.rule", "./app/utils/Summarizer/mapping/publication.rule", "./app/utils/Summarizer/mapping/award_achievement.rule", "./app/utils/Summarizer/mapping/hobbies_interest.rule", "./app/utils/Summarizer/mapping/lang_known.rule", "./app/utils/Summarizer/mapping/project_details.rule", "./app/utils/Summarizer/mapping/skill-set.rule", "./app/utils/Summarizer/mapping/misc_header.rule");

        boolean flag = false;
        for (CoreMap sentence : sentences) {
            List<MatchedExpression> matched = extractor.extractExpressions(sentence);
            //System.out.println("Probable Header is : " + matched);
            Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
            System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());

            // check whether at least half (rounded down) of the words in the header (string) are matched
            if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
                flag = true;
            } else {
                flag = false;
            }

        }

        return flag;
    }

}

In the code above, the static block is executed when the class is loaded. If you don't want that to happen, expose an init method instead, as follows:

public class Example {
    private static StanfordCoreNLP pipeline;
    private static Env env;

    public static void init() {
        // creates a StanfordCoreNLP object.
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner");

        RedwoodConfiguration.current().clear().apply();
        pipeline = new StanfordCoreNLP(props);

        env = TokenSequencePattern.getNewEnv();
        env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
        env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
    }

    public static boolean isHeaderMatched(String string) {
        // code left out for brevity
    }

}

You can call it from another class like this:

Example.init();
Example.isHeaderMatched("foobar");
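
If calling init() by hand feels fragile (it can be forgotten, or called twice and reload the models), one alternative is the initialization-on-demand holder idiom. The following is only a sketch: ExpensivePipeline is a hypothetical stand-in for StanfordCoreNLP so that the example compiles without the CoreNLP jars, and PipelineHolder and the CONSTRUCTIONS counter are names made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an expensive object such as StanfordCoreNLP. */
final class ExpensivePipeline {
    // Counts how many times the constructor ran, for demonstration only.
    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    ExpensivePipeline() {
        // In real code, the model loading would happen here.
        CONSTRUCTIONS.incrementAndGet();
    }

    /** Placeholder for real annotation work. */
    String annotate(String text) {
        return text.trim().toLowerCase();
    }
}

final class PipelineHolder {
    private PipelineHolder() {}

    // The JVM initializes this nested class on first access to INSTANCE,
    // exactly once and in a thread-safe way.
    private static final class Lazy {
        static final ExpensivePipeline INSTANCE = new ExpensivePipeline();
    }

    /** Returns the shared pipeline, constructing it on the first call only. */
    static ExpensivePipeline get() {
        return Lazy.INSTANCE;
    }
}
```

With this shape, isHeaderMatched would call PipelineHolder.get() instead of reading a static field, and the first caller pays the loading cost without any explicit init() call.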

While writing this answer, I noticed a possible flaw in your logic. The code below may not produce the behavior you want.

boolean flag = false;
for (CoreMap sentence : sentences) {
    List<MatchedExpression> matched = extractor.extractExpressions(sentence);
    //System.out.println("Probable Header is : " + matched);
    Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
    System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());

    // check whether at least half (rounded down) of the words in the header (string) are matched
    if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
        flag = true;
    } else {
        flag = false;
    }

}

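You are iterating over every CoreMap in the List<CoreMap> collection sentences. On each iteration you set flag to the result of the condition, and that is the problem: the boolean flag will only reflect the result for the last sentence in sentences. If you need to know the result for each sentence, you should keep a list of booleans to track the results; otherwise, remove the loop and check only the last sentence (since that is what the loop effectively does anyway).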
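
To make the difference concrete, here is a small sketch that contrasts the original "last one wins" loop with two alternatives. It assumes nothing about CoreNLP: the sentences are generic values and the match condition is passed in as a Predicate, and SentenceChecks and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class SentenceChecks {
    private SentenceChecks() {}

    /** Mirrors the original loop: the flag is overwritten on every iteration. */
    static <T> boolean lastOnly(List<T> sentences, Predicate<T> isMatch) {
        boolean flag = false;
        for (T sentence : sentences) {
            flag = isMatch.test(sentence);
        }
        return flag;
    }

    /** Keeps one result per sentence, in order. */
    static <T> List<Boolean> perSentence(List<T> sentences, Predicate<T> isMatch) {
        List<Boolean> results = new ArrayList<>();
        for (T sentence : sentences) {
            results.add(isMatch.test(sentence));
        }
        return results;
    }

    /** True as soon as any sentence matches; stops at the first match. */
    static <T> boolean anyMatch(List<T> sentences, Predicate<T> isMatch) {
        for (T sentence : sentences) {
            if (isMatch.test(sentence)) {
                return true;
            }
        }
        return false;
    }
}
```

Which variant is right depends on what a "header" means for multi-sentence input; if the input is always a single sentence, all three agree.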
