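How to initialize the StanfordNLP pipeline once and use it many times without initializing it again?
I want to initialize the StanfordNLP pipeline once and use it many times without initializing it again, in order to reduce execution time.
Is this possible?
I have this code: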
public static boolean isHeaderMatched(String string) {
    // creates a StanfordCoreNLP object.
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
    RedwoodConfiguration.current().clear().apply();
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    Env env = TokenSequencePattern.getNewEnv();
    env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
    env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
    Annotation document = new Annotation(string);
    // use the pipeline to annotate the document we created
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(env,
            "./app/utils/Summarizer/mapping/career_objective.rule",
            "./app/utils/Summarizer/mapping/personal_info.rule",
            "./app/utils/Summarizer/mapping/education.rule",
            "./app/utils/Summarizer/mapping/work_experience.rule",
            "./app/utils/Summarizer/mapping/certification.rule",
            "./app/utils/Summarizer/mapping/publication.rule",
            "./app/utils/Summarizer/mapping/award_achievement.rule",
            "./app/utils/Summarizer/mapping/hobbies_interest.rule",
            "./app/utils/Summarizer/mapping/lang_known.rule",
            "./app/utils/Summarizer/mapping/project_details.rule",
            "./app/utils/Summarizer/mapping/skill-set.rule",
            "./app/utils/Summarizer/mapping/misc_header.rule");
    boolean flag = false;
    for (CoreMap sentence : sentences) {
        List<MatchedExpression> matched = extractor.extractExpressions(sentence);
        //System.out.println("Probable Header is : " + matched);
        Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
        System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());
        //checked if the more than half the no. of word in header(string) is matched
        if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
            //System.out.println("This is sure a header!");
            flag = true;
        } else {
            flag = false;
        }
        /*for(MatchedExpression phrase: matched){
            System.out.println("matched header type: " + phrase.getValue().get());
        }*/
    }
    return flag;
}
I want the following part of the code to run only on the first call to the method above, so that the models are loaded just once.
// creates a StanfordCoreNLP object.
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
RedwoodConfiguration.current().clear().apply();
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Env env = TokenSequencePattern.getNewEnv();
env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
Thanks in advance.
Here is an example of what you can do:
public class Example {
    private static StanfordCoreNLP pipeline;
    private static Env env;

    static {
        // creates a StanfordCoreNLP object.
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
        RedwoodConfiguration.current().clear().apply();
        pipeline = new StanfordCoreNLP(props);
        env = TokenSequencePattern.getNewEnv();
        env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
        env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
    }

    public static boolean isHeaderMatched(String string) {
        Annotation document = new Annotation(string);
        // use the pipeline to annotate the document we created
        pipeline.annotate(document);
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(env,
                "./app/utils/Summarizer/mapping/career_objective.rule",
                "./app/utils/Summarizer/mapping/personal_info.rule",
                "./app/utils/Summarizer/mapping/education.rule",
                "./app/utils/Summarizer/mapping/work_experience.rule",
                "./app/utils/Summarizer/mapping/certification.rule",
                "./app/utils/Summarizer/mapping/publication.rule",
                "./app/utils/Summarizer/mapping/award_achievement.rule",
                "./app/utils/Summarizer/mapping/hobbies_interest.rule",
                "./app/utils/Summarizer/mapping/lang_known.rule",
                "./app/utils/Summarizer/mapping/project_details.rule",
                "./app/utils/Summarizer/mapping/skill-set.rule",
                "./app/utils/Summarizer/mapping/misc_header.rule");
        boolean flag = false;
        for (CoreMap sentence : sentences) {
            List<MatchedExpression> matched = extractor.extractExpressions(sentence);
            //System.out.println("Probable Header is : " + matched);
            Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
            System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());
            // checked if the more than half the no. of word in header(string) is matched
            if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
                flag = true;
            } else {
                flag = false;
            }
        }
        return flag;
    }
}
In the code above, the static block is executed when the class is loaded. If you don't want that to happen, expose an init method instead, like this:
public class Example {
    private static StanfordCoreNLP pipeline;
    private static Env env;

    // Call once before the first use of isHeaderMatched.
    public static void init() {
        // creates a StanfordCoreNLP object.
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
        RedwoodConfiguration.current().clear().apply();
        pipeline = new StanfordCoreNLP(props);
        env = TokenSequencePattern.getNewEnv();
        env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
        env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
    }

    public static boolean isHeaderMatched(String string) {
        // code left out for brevity
    }
}
It can be called from another class like this:
Example.init();
Example.isHeaderMatched("foobar");
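If you would rather have the pipeline load on first use without having to remember to call init(), one common alternative is the initialization-on-demand holder idiom. The sketch below is only an illustration of that idiom: ExpensivePipeline is a hypothetical stand-in for StanfordCoreNLP (so the example runs without the CoreNLP jars), and the LOADS counter exists purely to show that construction happens once.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyInitDemo {
    // Counts how often the expensive constructor runs (demonstration only).
    static final AtomicInteger LOADS = new AtomicInteger();

    // Hypothetical stand-in for an expensive object such as StanfordCoreNLP.
    static final class ExpensivePipeline {
        ExpensivePipeline() {
            LOADS.incrementAndGet(); // imagine the model loading happening here
        }

        String annotate(String text) {
            return text.trim().toLowerCase(Locale.ROOT);
        }
    }

    // The JVM initializes Holder only when Holder.INSTANCE is first accessed,
    // and class initialization is guaranteed to be thread-safe.
    private static final class Holder {
        static final ExpensivePipeline INSTANCE = new ExpensivePipeline();
    }

    public static String process(String text) {
        return Holder.INSTANCE.annotate(text);
    }

    public static void main(String[] args) {
        process(" Education ");
        process(" Work Experience ");
        // The constructor ran once, even though process was called twice.
        System.out.println("Pipeline constructions: " + LOADS.get());
    }
}
```

In your case the Holder would create the StanfordCoreNLP object and the Env in the same way as the static block above; the rest of the structure stays the same.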
While writing this answer, I noticed a possible flaw in your logic. The code below may not produce the behavior you want.
boolean flag = false;
for (CoreMap sentence : sentences) {
    List<MatchedExpression> matched = extractor.extractExpressions(sentence);
    //System.out.println("Probable Header is : " + matched);
    Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched);
    System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size());
    // checked if the more than half the no. of word in header(string) is matched
    if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) {
        flag = true;
    } else {
        flag = false;
    }
}
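You are iterating over every CoreMap in the List<CoreMap> collection sentences. On each iteration you set flag to the result of the condition, and that is the problem: the boolean flag will only reflect the result of the condition for the last sentence in sentences. If you need to know the result for each sentence, keep a list of booleans to track the results; otherwise, remove the loop and check only the last sentence (since that is what the loop effectively does anyway).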
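As a rough sketch of the "list of booleans" option, the example below records one result per sentence instead of overwriting a single flag. It uses plain strings and a made-up isHeader predicate as stand-ins for CoreMap and your matching condition, so the names resultsPerSentence and anyMatches are illustrative only and not part of CoreNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class SentenceFlagsDemo {
    // Records one result per sentence instead of overwriting a single flag.
    static <T> List<Boolean> resultsPerSentence(List<T> sentences, Predicate<T> isHeader) {
        List<Boolean> results = new ArrayList<>();
        for (T sentence : sentences) {
            results.add(isHeader.test(sentence));
        }
        return results;
    }

    // True if at least one sentence satisfies the condition.
    static <T> boolean anyMatches(List<T> sentences, Predicate<T> isHeader) {
        return sentences.stream().anyMatch(isHeader);
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("Work Experience", "I like tea.");
        // Hypothetical stand-in for the real matching condition.
        Predicate<String> isHeader = s -> !s.endsWith(".");
        System.out.println(resultsPerSentence(sentences, isHeader)); // [true, false]
        System.out.println(anyMatches(sentences, isHeader));         // true
    }
}
```

With the original loop, the same input would leave flag as false, because only the last sentence's result survives.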