Stanford NLP, Error while loading a tagger model, while reading models from path

I am using Stanford CoreNLP 3.8.0 for a project at work. I have read many questions about this problem on Stack Overflow and other sites, but I have not found a solution, and none of the cases I found match mine, so I am asking this question.

At work I need to use Stanford NLP in a web application without the stanford-parser and stanford-models dependencies, so a solution like the one here does not work for me. Why without these two dependencies? Because they are too large. In my project I can only include the stanford-corenlp dependency, and nothing else.

The problem is as follows.

I have two models. The first is "russian-ud-pos.tagger" from the MANASLU8 project by students of ITMO University; you can download it here. The second is the Stanford CRF model english.all.3class.distsim.crf.ser.gz, a standard model that you can download here.

So I have these two files and two nearly identical pieces of code. The pom is the same for both, with a single dependency: groupId edu.stanford.nlp, artifactId stanford-corenlp, version 3.8.0.

That is all (yes, my pom has no parser, no models, and so on, only stanford-corenlp).

1) The first code works well. Here I put my two files in src/main/resources, and the code is as follows:

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
    props.setProperty("pos.model", "russian-ud-pos.tagger");
    props.setProperty("ner.model", "english.all.3class.distsim.crf.ser.gz");
    props.setProperty("ner.useSUTime", "false");
    props.setProperty("ner.applyNumericClassifiers", "false");
    props.setProperty("sutime.includeRange", "false");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

As you can see, I do not use any paths in the properties object, just the file names.

When I start my application, it prints the following log:

19:17:55.979 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
19:17:55.994 [main] INFO  e.s.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
19:17:55.994 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
19:17:55.994 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
19:17:56.790 [main] INFO  e.s.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from russian-ud-pos.tagger ... done [0.8 sec].
19:17:56.790 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
19:17:56.790 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
19:18:00.737 [main] INFO  e.s.n.ie.AbstractSequenceClassifier - Loading classifier from english.all.3class.distsim.crf.ser.gz ... done [3.9 sec].
19:18:01.002 [main] INFO  e.s.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from russian-ud-pos.tagger ... done [0.3 sec].

And then it works successfully.

2) Here is the second code:

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
    props.setProperty("pos.model", "file:D:\\russian-ud-pos.tagger");
    props.setProperty("ner.model", "file:D:\\english.all.3class.distsim.crf.ser.gz");
    props.setProperty("ner.useSUTime", "false");
    props.setProperty("ner.applyNumericClassifiers", "false");
    props.setProperty("sutime.includeRange", "false");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

As you can see, here I give the properties a path to each file, such as "file:D:\\english.all.3class.distsim.crf.ser.gz", instead of just "english.all.3class.distsim.crf.ser.gz". When I start this code, the output is the following:

19:25:16.109 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
19:25:16.109 [main] INFO  e.s.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
19:25:16.125 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
19:25:16.125 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
19:25:16.936 [main] INFO  e.s.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from file:D:\russian-ud-pos.tagger ... done [0.8 sec].
19:25:16.936 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
19:25:16.936 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
19:25:21.257 [main] INFO  e.s.n.ie.AbstractSequenceClassifier - Loading classifier from file:D:\english.all.3class.distsim.crf.ser.gz ... done [4.2 sec].

edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)

at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:791)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:312)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:265)
at stanfordapplication.StanfordApplication.start(StanfordApplication.java:49)
at Test1.stanfordStringReader(Test1.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.io.IOException: Unable to open "russian-ud-pos.tagger" as class path, filename or URL
at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:480)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:789)
... 26 more

Process finished with exit code -1

Here, as you can see, Stanford logged "Loading POS tagger from file:D:\russian-ud-pos.tagger" and "Loading classifier from file:D:\english.all.3class.distsim.crf.ser.gz", but the final step, "Loading POS tagger from russian-ud-pos.tagger", which succeeded in the first code, failed.
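The error message says the name could not be opened "as class path, filename or URL", and the name it reports is the bare "russian-ud-pos.tagger" with no path. To see which of those three lookups can succeed for a given string, a small probe like the one below may help. This is only an illustrative sketch built on the JDK: it is not CoreNLP's own lookup code, the order of checks is an assumption, and the names `ModelLocator` and `whereFound` are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

final class ModelLocator {
    private ModelLocator() {}

    /** Reports how a model name can be resolved: "classpath", "file", "url" or "not found". */
    static String whereFound(String name) {
        // 1. Bare names such as "russian-ud-pos.tagger" resolve here only if the
        //    file was packaged on the classpath (for example from src/main/resources).
        if (ClassLoader.getSystemResource(name) != null) {
            return "classpath";
        }
        // 2. A plain filesystem path; a relative name is resolved against the working directory.
        if (new File(name).isFile()) {
            return "file";
        }
        // 3. A URL such as "file:///D:/russian-ud-pos.tagger".
        try (InputStream in = URI.create(name).toURL().openStream()) {
            return "url";
        } catch (IOException | IllegalArgumentException e) {
            // Not a valid absolute URL, or nothing readable at that location.
            return "not found";
        }
    }
}
```

Calling `ModelLocator.whereFound(...)` with each string that ends up in a "Loading ... from" log line shows whether that exact string is resolvable from the running process. A bare name that is neither on the classpath nor in the working directory comes back as "not found".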

I tried other versions of Stanford CoreNLP (3.9.1 and earlier). I read the Stanford NLP source code and debugged it, and I still cannot understand the reason. I also tried putting the files on drive C and on drive D (I suspected administrator rights) and starting IDEA with administrator rights. I tried a path without "file:", like "D:\\english.all.3class.distsim.crf.ser.gz". I also tried setting some flags in the properties, like ("ner.useSUTime", "false") or ("ner.applyNumericClassifiers", "false").

It seems strange: Stanford loads the first two files (as I understand it), so why can it not read the last one?

Maybe Stanford cannot read it, or cannot keep it in memory, or cannot read it more than once.

Can anybody help me, please? I have been trying to solve this problem for about a week!

Actually, I solved the problem, though it will sound strange. The cause seemed to be that Stanford had very slightly damaged the tagger file.

I am sure about this because, with the same code, I downloaded the tagger again from the site, put it in place of the old file, and it started to work!
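One way to confirm that the old file and the freshly downloaded one really differ is to compare their checksums. Below is a minimal sketch using only the JDK. The names `ModelChecksum` and `sha256Hex` are made up for illustration and are not part of CoreNLP.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ModelChecksum {
    private ModelChecksum() {}

    /** Returns the SHA-256 digest of a file as a lowercase hex string. */
    static String sha256Hex(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            // Stream the file so that large model files are not held in memory at once.
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

If `sha256Hex` returns different values for the old and the new copy of russian-ud-pos.tagger, the old file was changed on disk at some point. Identical values would point to a cause other than the file contents.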

I do not actually know, but maybe Stanford really does damage files after some time in use, or changes them a little. There was also a case where my files were destroyed completely: after some time and many uses, the files were 0 KB in size. That really happened. Maybe the cause is several processes loading one file at the same time (I ran two or more debug sessions at once).
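Since a 0 KB model file cannot be loaded, a cheap sanity check before building the pipeline can make this failure easier to recognize. The sketch below uses only the JDK, and `ModelFileCheck` and `looksUsable` are hypothetical names. It checks only that the file exists, is readable and is not empty; it does not validate the model format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ModelFileCheck {
    private ModelFileCheck() {}

    /** True if the path is an existing, readable, non-empty regular file. */
    static boolean looksUsable(Path file) {
        try {
            return Files.isRegularFile(file)
                    && Files.isReadable(file)
                    && Files.size(file) > 0;
        } catch (IOException e) {
            // The size could not be read, so treat the file as unusable.
            return false;
        }
    }
}
```

Running this check on both model files at startup and failing with a clear message is one way to tell a truncated file from a wrong path.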

I would like to hope it is the second explanation. But if you know the real cause, let me know!

PS: I used "file:///D:/" as the prefix when loading the files, and it works.
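Instead of typing the "file:///" prefix by hand, the JDK can build it from a path. This is a small sketch; `ModelUri` and `toFileUri` are hypothetical names, and passing the resulting string as the value of "pos.model" or "ner.model" is an assumption based on the PS above, not something verified against CoreNLP here.

```java
import java.nio.file.Path;

final class ModelUri {
    private ModelUri() {}

    /** Converts a filesystem path into a "file:///..." URI string. */
    static String toFileUri(Path file) {
        // On Windows, D:\russian-ud-pos.tagger becomes file:///D:/russian-ud-pos.tagger
        return file.toAbsolutePath().toUri().toString();
    }
}
```

For example, `ModelUri.toFileUri(Paths.get("D:\\russian-ud-pos.tagger"))` (with `java.nio.file.Paths` imported) avoids mistakes with backslashes and the number of slashes after "file:".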

Good luck everyone!
