
How to initialize the StanfordNLP pipeline once and use it many times without initializing it again?

I want to initialize the StanfordNLP pipeline once and use it many times without initializing it again, to improve the execution time.

Is it possible?

I have this code:

    public static boolean isHeaderMatched(String string) {

    // creates a StanfordCoreNLP object.
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner");

    RedwoodConfiguration.current().clear().apply();
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Env env = TokenSequencePattern.getNewEnv();
    env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
    env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);

    Annotation document = new Annotation(string);

    // use the pipeline to annotate the document we created
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(env, "./app/utils/Summarizer/mapping/career_objective.rule", "./app/utils/Summarizer/mapping/personal_info.rule", "./app/utils/Summarizer/mapping/education.rule", "./app/utils/Summarizer/mapping/work_experience.rule", "./app/utils/Summarizer/mapping/certification.rule", "./app/utils/Summarizer/mapping/publication.rule", "./app/utils/Summarizer/mapping/award_achievement.rule", "./app/utils/Summarizer/mapping/hobbies_interest.rule", "./app/utils/Summarizer/mapping/lang_known.rule", "./app/utils/Summarizer/mapping/project_details.rule", "./app/utils/Summarizer/mapping/skill-set.rule", "./app/utils/Summarizer/mapping/misc_header.rule");

    boolean flag = false;
    for (CoreMap sentence : sentences) {
        List<MatchedExpression> matched = extractor.extractExpressions(sentence);
        //System.out.println("Probable Header is : " + matched);
        Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
        System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());

        //checked if the more than half the no. of word in header(string) is matched
        if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
                //System.out.println("This is sure a header!");
            flag = true;
        } else {
            flag = false;
        }
  /*for(MatchedExpression phrase: matched){
    System.out.println("matched header type: " + phrase.getValue().get());
  }*/
    }
    return flag;
}

I want this part of the code to run only on the first call of the method above, so that the models are loaded once.

    // creates a StanfordCoreNLP object.
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner");

    RedwoodConfiguration.current().clear().apply();
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Env env = TokenSequencePattern.getNewEnv();
    env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
    env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);

Thanks in advance.

The following is an example of what you can do:

public class Example {
    private static StanfordCoreNLP pipeline;
    private static Env env;

    static {
        // creates a StanfordCoreNLP object.
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner");

        RedwoodConfiguration.current().clear().apply();
        pipeline = new StanfordCoreNLP(props);

        env = TokenSequencePattern.getNewEnv();
        env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
        env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
    }

    public static boolean isHeaderMatched(String string) {
        Annotation document = new Annotation(string);

        // use the pipeline to annotate the document we created
        pipeline.annotate(document);
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);

        CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(env, "./app/utils/Summarizer/mapping/career_objective.rule", "./app/utils/Summarizer/mapping/personal_info.rule", "./app/utils/Summarizer/mapping/education.rule", "./app/utils/Summarizer/mapping/work_experience.rule", "./app/utils/Summarizer/mapping/certification.rule", "./app/utils/Summarizer/mapping/publication.rule", "./app/utils/Summarizer/mapping/award_achievement.rule", "./app/utils/Summarizer/mapping/hobbies_interest.rule", "./app/utils/Summarizer/mapping/lang_known.rule", "./app/utils/Summarizer/mapping/project_details.rule", "./app/utils/Summarizer/mapping/skill-set.rule", "./app/utils/Summarizer/mapping/misc_header.rule");

        boolean flag = false;
        for (CoreMap sentence : sentences) {
            List<MatchedExpression> matched = extractor.extractExpressions(sentence);
            //System.out.println("Probable Header is : " + matched);
            Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
            System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());

            // checked if the more than half the no. of word in header(string) is matched
            if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
                flag = true;
            } else {
                flag = false;
            }

        }

        return flag;
    }

}

In the above code the static block runs once, when the class is initialized, which normally happens the first time the class is used. If you do not want this behavior, expose an init method instead, like the following:

public class Example {
    private static StanfordCoreNLP pipeline;
    private static Env env;

    public static void init() {
        // creates a StanfordCoreNLP object.
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner");

        RedwoodConfiguration.current().clear().apply();
        pipeline = new StanfordCoreNLP(props);

        env = TokenSequencePattern.getNewEnv();
        env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
        env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
    }

    public static boolean isHeaderMatched(String string) {
        // code left out for brevity
    }

}

It can be called from another class using:

Example.init();
Example.isHeaderMatched("foobar");
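
One caveat with an explicit init method is that calling it twice would build the pipeline twice, and forgetting to call it leaves pipeline null. A minimal sketch of a guarded init follows. StubPipeline is a hypothetical stand-in for StanfordCoreNLP so the example runs without the CoreNLP jars, and the body of isHeaderMatched is a placeholder, not your matching logic.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive object such as StanfordCoreNLP.
final class StubPipeline {
    // Counts constructor calls so the demo can show how often the object is built.
    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    StubPipeline() {
        CONSTRUCTIONS.incrementAndGet(); // model loading would happen here
    }

    // Placeholder for pipeline.annotate(document).
    String annotate(String text) {
        return text.trim();
    }
}

final class GuardedExample {
    private static StubPipeline pipeline; // written only inside init()

    private GuardedExample() {}

    // Safe to call any number of times: only the first call builds the pipeline.
    static synchronized void init() {
        if (pipeline == null) {
            pipeline = new StubPipeline();
        }
    }

    static boolean isHeaderMatched(String string) {
        init(); // after the first call this is only a null check under the lock
        // Placeholder logic: the real method would run the extractor here.
        return !pipeline.annotate(string).isEmpty();
    }
}

final class GuardedExampleDemo {
    public static void main(String[] args) {
        GuardedExample.init();
        GuardedExample.init(); // second call does nothing
        System.out.println(GuardedExample.isHeaderMatched("foobar"));
        System.out.println("constructions: " + StubPipeline.CONSTRUCTIONS.get());
    }
}
```

With this guard, callers no longer have to remember to call init() first, at the cost of taking a lock on every call. If that lock ever shows up in profiling, the static block shown earlier avoids it.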

While writing this answer I noticed a possible flaw in your logic. The following code may not produce the behavior you want.

boolean flag = false;
for (CoreMap sentence : sentences) {
    List<MatchedExpression> matched = extractor.extractExpressions(sentence);
    //System.out.println("Probable Header is : " + matched);
    Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
    System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());

    // checked if the more than half the no. of word in header(string) is matched
    if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
        flag = true;
    } else {
        flag = false;
    }

}

You're iterating over every CoreMap in the List<CoreMap> collection sentences. On every iteration you set flag to the result of the conditional, and this is where the problem lies. The boolean flag will only reflect the result of the last sentence run through the conditional. If you need to know the result for each sentence, keep a list of booleans to track the results; otherwise remove the loop and just check the last sentence (because that's what your loop is effectively doing anyway).
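
To make the difference concrete, here is a small sketch that uses plain strings in place of CoreMap sentences and an arbitrary Predicate in place of your matching condition. Both are assumptions for illustration only. The anyMatch helper is not something suggested above; it is one more option if what you want is "true when at least one sentence matches".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class SentenceChecks {
    private SentenceChecks() {}

    // Mirrors the original loop: the flag is overwritten on every iteration,
    // so the returned value depends only on the last item.
    static <T> boolean lastOnly(List<T> items, Predicate<T> condition) {
        boolean flag = false;
        for (T item : items) {
            flag = condition.test(item);
        }
        return flag;
    }

    // Keeps one result per item, in the same order as the input.
    static <T> List<Boolean> perItem(List<T> items, Predicate<T> condition) {
        List<Boolean> results = new ArrayList<>();
        for (T item : items) {
            results.add(condition.test(item));
        }
        return results;
    }

    // Alternative: true as soon as any item satisfies the condition.
    static <T> boolean anyMatch(List<T> items, Predicate<T> condition) {
        for (T item : items) {
            if (condition.test(item)) {
                return true;
            }
        }
        return false;
    }
}

final class SentenceChecksDemo {
    public static void main(String[] args) {
        // Hypothetical input: the first "sentence" matches, the last one does not.
        List<String> sentences = Arrays.asList("Work Experience", "lorem ipsum");
        Predicate<String> looksLikeHeader = s -> s.contains("Experience");

        System.out.println(SentenceChecks.lastOnly(sentences, looksLikeHeader)); // false
        System.out.println(SentenceChecks.perItem(sentences, looksLikeHeader));  // [true, false]
        System.out.println(SentenceChecks.anyMatch(sentences, looksLikeHeader)); // true
    }
}
```

The first sentence matches but lastOnly still returns false, which is the behavior the original loop has.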
