
Difference between spacy's “--base-model” and “--vectors” arguments for using custom embeddings for NER?

I trained fasttext embeddings and saved them as a .vec file. I want to use these for my spacy NER model. Is there a difference between

python -m spacy train en [new_model] [train_data] [dev_data] --pipeline ner --base-model embeddings.vec

and

python -m spacy train en [new_model] [train_data] [dev_data] --pipeline ner --vectors embeddings.vec ?

Both methods produce nearly identical training loss, F score, etc.

If you need to initialize a spacy model with vectors, use spacy init-model like this, where lg is the language code:

spacy init-model lg model_dir -v embeddings.vec -vn my_custom_vectors
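The embeddings.vec file passed to -v above is plain text: the first line holds the vocabulary size and the vector dimension, and each following line holds one token plus its values. A quick format sanity check before running init-model can be sketched like this (a hypothetical helper, not part of spaCy or fastText):

```python
# Hypothetical helper (not part of spaCy or fastText) to sanity-check the
# fastText .vec text format: line 0 is "<vocab_size> <dim>", each later
# line is a token followed by <dim> space-separated floats.
def read_vec(lines):
    vocab_size, dim = (int(x) for x in lines[0].split())
    vectors = {}
    for line in lines[1:]:
        token, *values = line.rstrip().split(" ")
        vector = [float(v) for v in values]
        assert len(vector) == dim, f"bad row for {token!r}"
        vectors[token] = vector
    return vocab_size, dim, vectors

# Tiny in-memory example standing in for the real embeddings.vec file
sample = ["2 3", "hello 0.1 0.2 0.3", "world 0.4 0.5 0.6"]
n, dim, vecs = read_vec(sample)
```

With a real file you would pass `open("embeddings.vec").read().splitlines()` instead of the sample list.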

Once you have the vectors saved as part of a spacy model:

  • --vectors loads the vectors from the provided model, so the initial model is spacy.blank("lg") + vectors
  • --base-model loads everything (tokenizer, pipeline components, vectors) from the provided model, so the initial model is spacy.load(model)

If the provided model doesn't have any pipeline components in it, the only potential difference is in the tokenizer settings resulting from spacy.blank("lg"), which can vary a little between individual spacy versions.
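Putting this together: both training flags expect the model directory produced by init-model, not the raw .vec file. A sketch of the full spaCy v2 workflow (vectors_model and the output directories are placeholder names; [train_data] and [dev_data] are your own paths):

```shell
# Save the fastText vectors into a spaCy model directory first
spacy init-model en vectors_model -v embeddings.vec -vn my_custom_vectors

# --vectors: starts from spacy.blank("en") plus the vectors
python -m spacy train en out_vectors [train_data] [dev_data] --pipeline ner --vectors vectors_model

# --base-model: starts from spacy.load("vectors_model") (tokenizer, components, vectors)
python -m spacy train en out_base [train_data] [dev_data] --pipeline ner --base-model vectors_model
```

Since vectors_model here contains no trained pipeline components, the two runs start from nearly the same state, which matches the nearly identical losses and F scores observed in the question.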
