I am trying to use the CoreNLP Python wrapper (stanza) in Google Colab to do syntactic analysis of Chinese text. The English model works without issue, but I am having trouble with the Chinese model.
My code:
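import stanza
from stanza.server import CoreNLPClient

# Download and unpack CoreNLP into a local directory
corenlp_dir = './corenlp'
stanza.install_corenlp(dir=corenlp_dir)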
# Set the CORENLP_HOME environment variable to point to the installation location
import os
os.environ["CORENLP_HOME"] = corenlp_dir
stanza.download_corenlp_models(model='chinese', version='4.2.2', dir=corenlp_dir)
# Construct a CoreNLPClient with some basic annotators, a memory allocation of 4GB, and port number 9001
client = CoreNLPClient(
    annotators=['tokenize', 'ssplit', 'pos', 'parse'],
    memory='4G',
    output_format="json",
    endpoint='http://localhost:9001',
    properties={
        # segment
        "tokenize.language": "zh",
        "segment.model": "edu/stanford/nlp/models/segmenter/chinese/ctb.gz",
        "segment.sighanCorporaDict": "edu/stanford/nlp/models/segmenter/chinese",
        "segment.serDictionary": "edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz",
        "segment.sighanPostProcessing": "true",
        # sentence split
        "ssplit.boundaryTokenRegex": "[.。]|[!?!?]+",
        # pos
        "pos.model": "edu/stanford/nlp/models/pos-tagger/chinese-distsim/chinese-distsim.tagger",
        # ner
        "ner.language": "chinese",
        "ner.model": "edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz",
        "ner.applyNumericClassifiers": "true",
        "ner.useSUTime": "false",
        # regexner
        "ner.fine.regexner.mapping": "edu/stanford/nlp/models/kbp/chinese/gazetteers/cn_regexner_mapping.tab",
        "ner.fine.regexner.noDefaultOverwriteLabels": "CITY,COUNTRY,STATE_OR_PROVINCE"
    },
    be_quiet=True)
print(client)
# Start the background server and wait for some time
# Note that in practice this is totally optional, as by default the server will be started when the first annotation is performed
client.start()
import time; time.sleep(10)
# Annotate some text
text = "狗跑到树上"
document = client.annotate(text)
But I get this error:
AnnotationException: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
In recent versions, including 4.2.2 (is there a reason you are not upgrading to 4.4.0?), the tagger model is at this location:
edu/stanford/nlp/models/pos-tagger/chinese-distsim.tagger
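Applied to the client in the question, that change might look like the sketch below. The helper names `chinese_properties` and `annotate_chinese` are hypothetical, introduced only for illustration. Every property value is copied from the question except `pos.model`, which uses the path given above. The NER entries are left out because the annotator list does not request `ner`, and no `parse.model` is set, as in the question.

```python
POS_MODEL = "edu/stanford/nlp/models/pos-tagger/chinese-distsim.tagger"


def chinese_properties():
    """Return the question's Chinese properties with the corrected tagger path."""
    return {
        # segment
        "tokenize.language": "zh",
        "segment.model": "edu/stanford/nlp/models/segmenter/chinese/ctb.gz",
        "segment.sighanCorporaDict": "edu/stanford/nlp/models/segmenter/chinese",
        "segment.serDictionary": "edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz",
        "segment.sighanPostProcessing": "true",
        # sentence split
        "ssplit.boundaryTokenRegex": "[.。]|[!?!?]+",
        # pos: no "chinese-distsim/" subdirectory in the path
        "pos.model": POS_MODEL,
    }


def annotate_chinese(text, endpoint="http://localhost:9001"):
    """Start a CoreNLP server, annotate one string, and shut the server down.

    Assumes stanza and Java are installed, CORENLP_HOME is set, and the
    Chinese models jar has already been downloaded.
    """
    # Imported here so the properties helper can be used without stanza.
    from stanza.server import CoreNLPClient

    with CoreNLPClient(
        annotators=["tokenize", "ssplit", "pos", "parse"],
        memory="4G",
        output_format="json",
        endpoint=endpoint,
        properties=chinese_properties(),
        be_quiet=True,
    ) as client:
        return client.annotate(text)


# Example call (requires a working CoreNLP installation):
# document = annotate_chinese("狗跑到树上")
```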
Where did you get the example you are using? Perhaps some piece of documentation needs to be updated.
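Since the error message suggests a missing model file, one way to confirm which paths a given release actually ships is to list the contents of the downloaded jars. A jar is a zip archive, so the standard library is enough. This is a minimal sketch: the function name `find_model_entries` is hypothetical, and it assumes the model jars sit directly inside the directory passed to `stanza.download_corenlp_models`.

```python
import glob
import os
import zipfile


def find_model_entries(corenlp_dir, suffix=".tagger"):
    """Map each jar in corenlp_dir to the entries inside it that end with suffix."""
    found = {}
    for jar_path in sorted(glob.glob(os.path.join(corenlp_dir, "*.jar"))):
        with zipfile.ZipFile(jar_path) as jar:
            matches = [name for name in jar.namelist() if name.endswith(suffix)]
        if matches:
            found[os.path.basename(jar_path)] = matches
    return found


if __name__ == "__main__":
    # Prints nothing if the directory has no jars or no matching entries.
    for jar_name, entries in find_model_entries("./corenlp").items():
        print(jar_name)
        for entry in entries:
            print("   ", entry)
```

Any path printed here can be used as a `pos.model` value; a path that does not appear in the output would be expected to trigger the same loading error.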