
How can I use the Chinese model with the Stanza CoreNLP client in Google Colab?

I am trying to use the CoreNLP Python wrapper in Google Colab to do syntactic analysis of Chinese text. The English model works without issue, but I am having trouble with the Chinese model.

My code:

import stanza
from stanza.server import CoreNLPClient

# Download CoreNLP into a local directory
corenlp_dir = './corenlp'
stanza.install_corenlp(dir=corenlp_dir)

# Set the CORENLP_HOME environment variable to point to the installation location
import os
os.environ["CORENLP_HOME"] = corenlp_dir

stanza.download_corenlp_models(model='chinese', version='4.2.2', dir=corenlp_dir)
# Construct a CoreNLPClient with some basic annotators, a memory allocation of 4GB, and port number 9001
client = CoreNLPClient(
    annotators=['tokenize','ssplit','pos','parse'], 
    memory='4G', 
    output_format="json",
    endpoint='http://localhost:9001',
    properties = {
            # segment
            "tokenize.language": "zh",
            "segment.model": "edu/stanford/nlp/models/segmenter/chinese/ctb.gz",
            "segment.sighanCorporaDict": "edu/stanford/nlp/models/segmenter/chinese",
            "segment.serDictionary": "edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz",
            "segment.sighanPostProcessing": "true",
            # sentence split
            "ssplit.boundaryTokenRegex": "[.。]|[!?!?]+",
            # pos
            "pos.model": "edu/stanford/nlp/models/pos-tagger/chinese-distsim/chinese-distsim.tagger",
            # ner
            "ner.language": "chinese",
            "ner.model": "edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz",
            "ner.applyNumericClassifiers": "true",
            "ner.useSUTime": "false",
            # regexner
            "ner.fine.regexner.mapping": "edu/stanford/nlp/models/kbp/chinese/gazetteers/cn_regexner_mapping.tab",
            "ner.fine.regexner.noDefaultOverwriteLabels": "CITY,COUNTRY,STATE_OR_PROVINCE"
        },
    be_quiet=True)
print(client)

# Start the background server and wait for some time
# Note that in practice this is totally optional, as by default the server will be started when the first annotation is performed
client.start()
import time; time.sleep(10)


# Annotate some text
text = "狗跑到树上"
document = client.annotate(text)
    

But I get this error:

AnnotationException: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)

In the latest versions, including 4.2.2 (why are you not upgrading to 4.4.0?), the tagger is now in this location:

edu/stanford/nlp/models/pos-tagger/chinese-distsim.tagger
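
As a minimal sketch of how that path would slot into the client from the question: the properties below are copied from the question with only `pos.model` changed. The `ner.*` entries are left out because `ner` is not among the requested annotators. The helper names `chinese_properties` and `annotate_chinese` are made up for this illustration. It assumes `CORENLP_HOME` is set and the Chinese models are downloaded as in the question.

```python
# Tagger path as given in the answer for CoreNLP 4.2.2 and later.
CHINESE_POS_MODEL = "edu/stanford/nlp/models/pos-tagger/chinese-distsim.tagger"


def chinese_properties():
    """Return the question's segmenter, ssplit and pos settings with the new tagger path."""
    return {
        # segment
        "tokenize.language": "zh",
        "segment.model": "edu/stanford/nlp/models/segmenter/chinese/ctb.gz",
        "segment.sighanCorporaDict": "edu/stanford/nlp/models/segmenter/chinese",
        "segment.serDictionary": "edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz",
        "segment.sighanPostProcessing": "true",
        # sentence split
        "ssplit.boundaryTokenRegex": "[.。]|[!?!?]+",
        # pos (the only value that differs from the question)
        "pos.model": CHINESE_POS_MODEL,
    }


def annotate_chinese(text, endpoint="http://localhost:9001"):
    """Start a CoreNLP server, annotate `text`, and shut the server down again."""
    # Imported here so the properties can be inspected without stanza installed.
    from stanza.server import CoreNLPClient

    with CoreNLPClient(
        annotators=["tokenize", "ssplit", "pos", "parse"],
        memory="4G",
        output_format="json",
        endpoint=endpoint,
        properties=chinese_properties(),
        be_quiet=True,
    ) as client:
        return client.annotate(text)
```

Calling `annotate_chinese("狗跑到树上")` should then get past the tagger-loading step. Whether the `parse` annotator also needs a Chinese parser model set explicitly is not covered here. Stanza's client documentation also describes passing a language name such as `properties="chinese"` to pick up CoreNLP's bundled defaults, which may be simpler than listing each model path by hand.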

Where did you get the example you are using? Perhaps some piece of documentation needs to be updated.
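
When documentation and the installed models disagree, one way to check is to list what the downloaded models jar actually contains, since a jar is an ordinary zip archive. This is only a sketch: `find_model_entries` is a hypothetical helper, and the jar filename in the usage note is an assumption about what `stanza.download_corenlp_models` writes into `corenlp_dir`.

```python
import zipfile


def find_model_entries(jar_path, suffix=".tagger"):
    """Return the sorted entries in a jar (a zip archive) whose names end with `suffix`."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(name for name in jar.namelist() if name.endswith(suffix))
```

For example, `find_model_entries("./corenlp/stanford-corenlp-4.2.2-models-chinese.jar")` (assumed filename; check the directory listing first) would show the tagger paths that can be used for `pos.model`.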
