StanfordCoreNLP - Setting pipelineLanguage to German not working?

I am using the pycorenlp client to talk to the Stanford CoreNLP Server. In my setup, I set pipelineLanguage to German like this:

from pycorenlp import StanfordCoreNLP

# Connect to a CoreNLP server that is already running on port 9000
nlp = StanfordCoreNLP('http://localhost:9000')

text = 'Das große Auto.'

# Ask the server for German processing via pipelineLanguage
output = nlp.annotate(text, properties={
    'annotators': 'tokenize,ssplit,pos,depparse,parse',
    'outputFormat': 'json',
    'pipelineLanguage': 'german'
})
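To see what actually goes over the wire, the same request can be built by hand. This is a minimal sketch using only the Python standard library. It assumes, as in the CoreNLP server's wget examples, that the properties travel as a JSON-encoded "properties" query parameter and the text is the POST body (pycorenlp does something similar). The helper names build_corenlp_url and annotate are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_corenlp_url(server_url, properties):
    """Encode the annotation properties as a JSON 'properties' query parameter."""
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return f"{server_url.rstrip('/')}/?{query}"


def annotate(server_url, text, properties, timeout=30):
    """POST the raw text and decode the JSON reply.

    Needs a running CoreNLP server, so it is not called below.
    """
    url = build_corenlp_url(server_url, properties)
    request = urllib.request.Request(url, data=text.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


props = {
    "annotators": "tokenize,ssplit,pos,depparse,parse",
    "outputFormat": "json",
    "pipelineLanguage": "german",
}
# Print the URL the request would be sent to
print(build_corenlp_url("http://localhost:9000", props))
```

Printing the URL makes it easy to confirm that pipelineLanguage really reaches the server, which narrows the problem down to the server side.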

However, judging by the output, it does not seem to work:

output['sentences'][0]['tokens']

returns:

[{'after': ' ',
  'before': '',
  'characterOffsetBegin': 0,
  'characterOffsetEnd': 3,
  'index': 1,
  'originalText': 'Das',
  'pos': 'NN',
  'word': 'Das'},
 {'after': ' ',
  'before': ' ',
  'characterOffsetBegin': 4,
  'characterOffsetEnd': 9,
  'index': 2,
  'originalText': 'große',
  'pos': 'NN',
  'word': 'große'},
 {'after': '',
  'before': ' ',
  'characterOffsetBegin': 10,
  'characterOffsetEnd': 14,
  'index': 3,
  'originalText': 'Auto',
  'pos': 'NN',
  'word': 'Auto'},
 {'after': '',
  'before': '',
  'characterOffsetBegin': 14,
  'characterOffsetEnd': 15,
  'index': 4,
  'originalText': '.',
  'pos': '.',
  'word': '.'}]

I would expect something more like this:

     Das  große  Auto
POS:  DT     JJ    NN
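To compare the actual tags with the expected ones more easily, a small helper can pull the (word, pos) pairs out of the JSON result and lay them out in the same two-row form. This is a sketch that assumes only the "sentences" → "tokens" → "word"/"pos" structure shown in the output above. The function names are hypothetical.

```python
def pos_pairs(output, sentence_index=0):
    """Return (word, pos) tuples for one sentence of a CoreNLP JSON result."""
    tokens = output["sentences"][sentence_index]["tokens"]
    return [(token["word"], token["pos"]) for token in tokens]


def format_pos_table(pairs):
    """Lay out words and their tags in two aligned rows."""
    widths = [max(len(word), len(pos)) for word, pos in pairs]
    words = "  ".join(word.ljust(w) for (word, _), w in zip(pairs, widths))
    tags = "  ".join(pos.ljust(w) for (_, pos), w in zip(pairs, widths))
    return ("     " + words).rstrip() + "\n" + ("POS: " + tags).rstrip()


# Trimmed-down copy of the tokens returned in the question
sample = {"sentences": [{"tokens": [
    {"word": "Das", "pos": "NN"},
    {"word": "große", "pos": "NN"},
    {"word": "Auto", "pos": "NN"},
    {"word": ".", "pos": "."},
]}]}
print(format_pos_table(pos_pairs(sample)))
```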

It seems to me that setting 'pipelineLanguage' to German (whether as 'german' or 'de') does not work for some reason.

I started the server with:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000


The server log shows the following:

[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
[pool-1-thread-3] ERROR CoreNLP - Failure to load language specific properties: StanfordCoreNLP-german.properties for german
[pool-1-thread-3] INFO CoreNLP - [/127.0.0.1:60700] API call w/annotators tokenize,ssplit,pos,depparse,parse
Das große Auto.
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-3] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.5 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ... 
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 8.645 (s)
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [9.8 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.3 sec].

Apparently the server loads the English models instead, and the client is not warned about it; the only hint is the ERROR line about StanfordCoreNLP-german.properties in the server log.
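Since the request still succeeds, one way to catch this silent fallback on the client side is to look at the tagset of the returned POS tags. This is a heuristic sketch. It assumes the German tagger of this CoreNLP release (german-hgc.tagger) emits STTS tags, while the English fallback emits Penn Treebank tags; other model versions may use a different tagset. The STTS_ONLY_TAGS subset and the function name are my own choices for illustration.

```python
# A few STTS tags that have no counterpart in the Penn Treebank tagset.
STTS_ONLY_TAGS = {
    "ART", "ADJA", "ADJD", "APPR", "APPRART", "KON",
    "PPER", "VAFIN", "VVFIN", "VVINF", "$.", "$,", "$(",
}


def looks_like_german_tagset(tags):
    """Heuristic: True if at least one tag is STTS-only."""
    return any(tag in STTS_ONLY_TAGS for tag in tags)


# Tags from the English fallback output in the question
english_fallback = ["NN", "NN", "NN", "."]
print(looks_like_german_tagset(english_fallback))

# With a real response: tags = [t["pos"] for s in output["sentences"] for t in s["tokens"]]
```

A very short sentence could happen to contain no STTS-only tag even with the German model loaded, so this check is only a hint, not proof.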

Alright, I downloaded the German models jar from the CoreNLP website and moved it into the directory where I had extracted the server, e.g.

~/Downloads/stanford-corenlp-full-2017-06-09 $
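Because the server was started with -cp "*", only the jars in that directory are on its classpath. The ERROR line in the first log means StanfordCoreNLP-german.properties was not found there. Assuming, as the fix above suggests, that this file ships inside the German models jar, a quick sketch to check which jars in the server directory contain it (find_properties_jars is a hypothetical helper):

```python
import glob
import os
import zipfile


def find_properties_jars(directory, name="StanfordCoreNLP-german.properties"):
    """Return the jars directly inside `directory` that contain a file called `name`."""
    hits = []
    for jar_path in sorted(glob.glob(os.path.join(directory, "*.jar"))):
        with zipfile.ZipFile(jar_path) as jar:
            # Zip entries always use "/" as the separator, whatever the OS.
            if any(entry.rsplit("/", 1)[-1] == name for entry in jar.namelist()):
                hits.append(jar_path)
    return hits


# Directory from the question; adjust to wherever the server was extracted.
server_dir = os.path.expanduser("~/Downloads/stanford-corenlp-full-2017-06-09")
print(find_properties_jars(server_dir))
```

An empty list suggests the German models jar is missing from that directory, which matches the first log.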

After restarting the server, the German models were loaded successfully:

[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-3] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/german/german-hgc.tagger ... done [5.1 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model file: edu/stanford/nlp/models/parser/nndep/UD_German.gz ... 
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99984, Elapsed Time: 11.419 (s)
[pool-1-thread-3] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [12.2 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-1-thread-3] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/germanFactored.ser.gz ... done [1.0 sec].
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[pool-1-thread-3] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/german.conll.hgc_175m_600.crf.ser.gz ... done [0.7 sec].
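The difference between the two logs is visible in the model paths. A sketch for checking this automatically collects every path under edu/stanford/nlp/models/ from the log lines and reports the ones without "german" in their name. It assumes the server's log output has been saved somewhere, for example by redirecting stderr to a file. The function names are hypothetical.

```python
import re

# Model paths in the log all start with this package prefix.
MODEL_PATH = re.compile(r"(edu/stanford/nlp/models/\S+)")


def loaded_models(log_lines):
    """Collect model paths mentioned in CoreNLP server log lines."""
    return [m.group(1) for line in log_lines for m in MODEL_PATH.finditer(line)]


def non_german_models(log_lines):
    """Model paths whose name does not mention German."""
    return [path for path in loaded_models(log_lines) if "german" not in path.lower()]


log = [
    "... Loading POS tagger from edu/stanford/nlp/models/pos-tagger/german/german-hgc.tagger ... done [5.1 sec].",
    "... Loading depparse model file: edu/stanford/nlp/models/parser/nndep/UD_German.gz ... ",
]
print(non_german_models(log))
```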
