
How do I train a spaCy NER model on completely new entities instead of the pre-trained ones?

How do I do transfer learning, i.e. take a pre-trained spaCy NER model and make it learn new entities specific to my use case?

For this, I have 100 new annotated training samples. The retrained model should only predict the new entities, not any of the existing entities in the pre-trained spaCy model. Just adding or updating new entities in an existing model and then ignoring the old entities during prediction doesn't make sense.

This official example describes how to add new entities alongside the existing pre-trained ones, but that's not what I want. At the same time, I have very few examples (only 100) to build a completely new NER model from scratch.

Edit: I want to identify all account numbers in an unstructured document.

Example: ("I would like to change address corresponding to my account 12345. Kindly let me know how to do it. ", [59, 64, 'accountnumber'])
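Character offsets like these are easy to get wrong when counted by hand. spaCy expects `(start, end, label)` triples where `start` and `end` are character positions and `end` is exclusive, so `text[start:end]` must return exactly the entity text. A minimal sketch of a helper that computes the offsets for you (the name `make_training_example` is hypothetical, and it only tags the first occurrence of the entity string):

```python
def make_training_example(text, entity_text, label):
    """Build a (text, {"entities": [(start, end, label)]}) tuple.

    Locates the FIRST occurrence of entity_text in text; str.index raises
    ValueError if it is not present.
    """
    start = text.index(entity_text)
    end = start + len(entity_text)  # end offset is exclusive, as spaCy expects
    return (text, {"entities": [(start, end, label)]})


text = "I would like to change address corresponding to my account 12345. Kindly let me know how to do it. "
example = make_training_example(text, "12345", "accountnumber")
print(example[1])  # {'entities': [(59, 64, 'accountnumber')]}

# Sanity check: slicing with the computed offsets gives back the entity text.
start, end, _ = example[1]["entities"][0]
assert text[start:end] == "12345"
```

Running a check like the final `assert` over all 100 samples before training catches misaligned annotations early.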

You mention that you only want to predict the new entities and not the old ones, so there is no reason to start from a pre-trained NER model. The features learnt for the other entity types (which you don't want) won't be used or transferred to your new entity type anyway, so you will simply have to train a model from scratch.
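Training from scratch means starting from a blank pipeline whose NER component knows only your label. A minimal sketch, assuming the spaCy v3 API (spaCy v2, current when this question was asked, used different calls such as `nlp.begin_training()`); the second training sentence is a hypothetical sample added for illustration:

```python
import random

import spacy
from spacy.training import Example
from spacy.util import minibatch

# Training data in spaCy's (text, {"entities": [(start, end, label)]}) format.
# The second sentence is hypothetical; in practice use your ~100 annotated samples.
TRAIN_DATA = [
    ("I would like to change address corresponding to my account 12345. Kindly let me know how to do it. ",
     {"entities": [(59, 64, "accountnumber")]}),
    ("Please close account 987654 as soon as possible.",
     {"entities": [(21, 27, "accountnumber")]}),
]

# Empty English pipeline: no pre-trained entity types at all.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("accountnumber")

examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]

# initialize() sets up the model weights and returns an optimizer.
optimizer = nlp.initialize(get_examples=lambda: examples)

for epoch in range(20):
    random.shuffle(examples)
    losses = {}
    for batch in minibatch(examples, size=8):
        nlp.update(batch, sgd=optimizer, drop=0.2, losses=losses)
    print(epoch, losses["ner"])
```

For a real project, spaCy v3 recommends the config-driven `python -m spacy train` command instead of a hand-written loop; the loop above is just the shortest self-contained illustration. The resulting model can only ever predict `accountnumber`, which is exactly the behaviour the question asks for.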

You also mention that you only have 100 training examples, so, as you say, achieving high enough accuracy will be a challenge. You could consider running a rule-based matching step first and then manually reviewing its hits, which would let you grow your training data much more quickly.
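The rule-based bootstrapping step can be sketched as follows. spaCy offers `Matcher` and `EntityRuler` for this, but to keep the sketch dependency-free it swaps in Python's `re` module. The rule itself ("any standalone run of five or more digits is an account number") and the function name `pre_annotate` are hypothetical; adjust the pattern to what your account numbers actually look like.

```python
import re

# Hypothetical rule: a standalone run of 5+ digits is a candidate account number.
ACCOUNT_RE = re.compile(r"\b\d{5,}\b")


def pre_annotate(texts, label="accountnumber"):
    """Propose spaCy-style training tuples from the rule, for manual review."""
    candidates = []
    for text in texts:
        spans = [(m.start(), m.end(), label) for m in ACCOUNT_RE.finditer(text)]
        candidates.append((text, {"entities": spans}))
    return candidates


docs = [
    "Please close account 987654 as soon as possible.",
    "My phone number changed, can you update it?",
]
for text, ann in pre_annotate(docs):
    print(ann["entities"], repr(text))
```

The proposals still need a human pass: a digit rule like this will also fire on phone numbers, dates, or order IDs, and correcting those false positives is where the manual consolidation comes in.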

For your use case, you are adding a new entity type, so there should be no confusion with the existing entity types. If you call your new entity "accountnumber", you should be able to use the training script you linked to train a model.

For the extraction phase, use the code from the documentation, but filter the results for the "accountnumber" label (i.e. the ent.label_ field) and ignore the other existing entities.
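Filtering by label is a one-line comprehension over `doc.ents`. So that the sketch runs without a trained model, it builds a stand-in pipeline with spaCy's `entity_ruler` that emits both an `accountnumber` entity and an unrelated `ORG` entity; the patterns, the sample sentence, and the helper name `extract_entities` are hypothetical.

```python
import spacy


def extract_entities(doc, label="accountnumber"):
    # Keep only entities carrying the new label; PERSON, ORG, GPE, etc. are dropped.
    return [(ent.text, ent.start_char, ent.end_char) for ent in doc.ents if ent.label_ == label]


# Stand-in for a trained model so the sketch is self-contained. In practice you would
# load your own pipeline, e.g. spacy.load("path/to/your/model") (hypothetical path).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "accountnumber", "pattern": [{"TEXT": {"REGEX": r"^\d{5,}$"}}]},
    {"label": "ORG", "pattern": "Acme"},
])

doc = nlp("Please update the address on account 12345 at Acme.")
print([(ent.text, ent.label_) for ent in doc.ents])  # all entities, any label
print(extract_entities(doc))                         # only "accountnumber" entities
```

With a pre-trained pipeline extended by the new label, the same filter discards whatever PERSON, ORG, or other predictions the original entity types still produce.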

The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM