How to initialize the StanfordNLP pipeline once and use it many times without initializing it again?
I want to initialize the StanfordNLP pipeline once and use it many times without initializing it again, to improve execution time.
Is it possible?
I have this code:
public static boolean isHeaderMatched(String string) {
    // creates a StanfordCoreNLP object.
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
    RedwoodConfiguration.current().clear().apply();
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    Env env = TokenSequencePattern.getNewEnv();
    env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
    env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
    Annotation document = new Annotation(string);
    // use the pipeline to annotate the document we created
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(env,
            "./app/utils/Summarizer/mapping/career_objective.rule",
            "./app/utils/Summarizer/mapping/personal_info.rule",
            "./app/utils/Summarizer/mapping/education.rule",
            "./app/utils/Summarizer/mapping/work_experience.rule",
            "./app/utils/Summarizer/mapping/certification.rule",
            "./app/utils/Summarizer/mapping/publication.rule",
            "./app/utils/Summarizer/mapping/award_achievement.rule",
            "./app/utils/Summarizer/mapping/hobbies_interest.rule",
            "./app/utils/Summarizer/mapping/lang_known.rule",
            "./app/utils/Summarizer/mapping/project_details.rule",
            "./app/utils/Summarizer/mapping/skill-set.rule",
            "./app/utils/Summarizer/mapping/misc_header.rule");
    boolean flag = false;
    for (CoreMap sentence : sentences) {
        List<MatchedExpression> matched = extractor.extractExpressions(sentence);
        //System.out.println("Probable Header is : " + matched);
        Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
        System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());
        // check whether at least half of the words in the header (string) matched
        if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
            //System.out.println("This is sure a header!");
            flag = true;
        } else {
            flag = false;
        }
        /*for(MatchedExpression phrase: matched){
            System.out.println("matched header type: " + phrase.getValue().get());
        }*/
    }
    return flag;
}
I want this part of the code to be executed only on the first call of the above method, to load the model:
// creates a StanfordCoreNLP object.
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
RedwoodConfiguration.current().clear().apply();
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Env env = TokenSequencePattern.getNewEnv();
env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
Thanks in advance.
The following is an example of what you can do:
import java.util.List;
import java.util.Properties;
import java.util.Set;
import java.util.regex.Pattern;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor;
import edu.stanford.nlp.ling.tokensregex.Env;
import edu.stanford.nlp.ling.tokensregex.MatchedExpression;
import edu.stanford.nlp.ling.tokensregex.NodePattern;
import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.logging.RedwoodConfiguration;

public class Example {
    // Shared by every call; built once in the static block below.
    private static StanfordCoreNLP pipeline;
    private static Env env;

    static {
        // creates a StanfordCoreNLP object.
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
        RedwoodConfiguration.current().clear().apply();
        pipeline = new StanfordCoreNLP(props);
        env = TokenSequencePattern.getNewEnv();
        env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
        env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
    }

    public static boolean isHeaderMatched(String string) {
        Annotation document = new Annotation(string);
        // use the pipeline to annotate the document we created
        pipeline.annotate(document);
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(env, "./app/utils/Summarizer/mapping/career_objective.rule", "./app/utils/Summarizer/mapping/personal_info.rule", "./app/utils/Summarizer/mapping/education.rule", "./app/utils/Summarizer/mapping/work_experience.rule", "./app/utils/Summarizer/mapping/certification.rule", "./app/utils/Summarizer/mapping/publication.rule", "./app/utils/Summarizer/mapping/award_achievement.rule", "./app/utils/Summarizer/mapping/hobbies_interest.rule", "./app/utils/Summarizer/mapping/lang_known.rule", "./app/utils/Summarizer/mapping/project_details.rule", "./app/utils/Summarizer/mapping/skill-set.rule", "./app/utils/Summarizer/mapping/misc_header.rule");
        boolean flag = false;
        for (CoreMap sentence : sentences) {
            List<MatchedExpression> matched = extractor.extractExpressions(sentence);
            //System.out.println("Probable Header is : " + matched);
            Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
            System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());
            // check whether at least half of the words in the header (string) matched
            if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
                flag = true;
            } else {
                flag = false;
            }
        }
        return flag;
    }
}
In the above code, the static block runs once, when the class is initialized (that is, on its first use). If you do not want this behavior, expose an init method instead, like the following:
public class Example {
    private static StanfordCoreNLP pipeline;
    private static Env env;

    // Must be called once before the first call to isHeaderMatched.
    public static void init() {
        // creates a StanfordCoreNLP object.
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
        RedwoodConfiguration.current().clear().apply();
        pipeline = new StanfordCoreNLP(props);
        env = TokenSequencePattern.getNewEnv();
        env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
        env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
    }

    public static boolean isHeaderMatched(String string) {
        // code left out for brevity
    }
}
This can be called from another class using:
Example.init();
Example.isHeaderMatched("foobar");
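With an explicit init method, the caller has to remember to call it before the first isHeaderMatched call. One possible alternative is the initialization-on-demand holder idiom, where the JVM builds the object on first access and guarantees this happens only once. The following is a minimal sketch of that idiom, not CoreNLP code: ExpensivePipeline and PipelineHolder are hypothetical names, and ExpensivePipeline merely stands in for an object that is costly to construct, such as StanfordCoreNLP.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an object that is expensive to build (e.g. an NLP pipeline).
final class ExpensivePipeline {
    // Counts constructions so we can check the object is built only once.
    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    ExpensivePipeline() {
        CONSTRUCTIONS.incrementAndGet();
    }

    // Placeholder work: counts whitespace-separated tokens.
    int annotate(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

// Hypothetical holder class: no init() call is needed by callers.
final class PipelineHolder {
    private PipelineHolder() {}

    // The nested class is initialized by the JVM on first access to INSTANCE,
    // exactly once, even when several threads call get() concurrently.
    private static final class Lazy {
        static final ExpensivePipeline INSTANCE = new ExpensivePipeline();
    }

    static ExpensivePipeline get() {
        return Lazy.INSTANCE;
    }
}
```

Every call to PipelineHolder.get() returns the same instance, so a method like isHeaderMatched could fetch the pipeline this way instead of relying on a prior init call.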
While writing this answer I noticed a possible flaw in your logic. The following code may not produce the behavior you desire.
boolean flag = false;
for (CoreMap sentence : sentences) {
    List<MatchedExpression> matched = extractor.extractExpressions(sentence);
    //System.out.println("Probable Header is : " + matched);
    Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
    System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());
    // check whether at least half of the words in the header (string) matched
    if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
        flag = true;
    } else {
        flag = false;
    }
}
You're iterating over every CoreMap in the List<CoreMap> collection sentences. On every iteration you set flag to the result of the conditional, and this is where the problem lies: the boolean flag will only reflect the result of the last sentence run through the conditional. If you need to know the result for each sentence, then you should keep a list of booleans to track the results; otherwise remove the loop and just check the last sentence (because that's what your loop is doing anyway).
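To make the difference concrete, here is a small self-contained sketch that contrasts the "last sentence wins" behavior with two possible alternatives: one result per sentence, or true as soon as any sentence matches. Everything in it is hypothetical: SentenceFlags and looksLikeHeader are illustrative names, plain strings stand in for CoreMap sentences, and the keyword test is a placeholder for the real matching condition. Which aggregation is right depends on what the method is meant to answer.

```java
import java.util.ArrayList;
import java.util.List;

final class SentenceFlags {
    private SentenceFlags() {}

    // Hypothetical placeholder for the per-sentence header condition.
    static boolean looksLikeHeader(String sentence) {
        return sentence.toLowerCase().contains("experience");
    }

    // Mirrors the original loop: each iteration overwrites flag,
    // so only the last sentence's result is returned.
    static boolean lastOnly(List<String> sentences) {
        boolean flag = false;
        for (String sentence : sentences) {
            flag = looksLikeHeader(sentence);
        }
        return flag;
    }

    // Alternative 1: keep one boolean per sentence.
    static List<Boolean> perSentence(List<String> sentences) {
        List<Boolean> results = new ArrayList<>();
        for (String sentence : sentences) {
            results.add(looksLikeHeader(sentence));
        }
        return results;
    }

    // Alternative 2: return true as soon as any sentence matches.
    static boolean anyMatch(List<String> sentences) {
        for (String sentence : sentences) {
            if (looksLikeHeader(sentence)) {
                return true;
            }
        }
        return false;
    }
}
```

For the input "Work Experience" followed by "Contact me", lastOnly returns false even though the first sentence matched, perSentence returns [true, false], and anyMatch returns true.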