
spaCy NER Model Training Data Improvement

I am new to NLP and am trying to create an NER model with spaCy (spacy.io). I created my own NER model for the ORG entity, following https://spacy.io/usage/training#ner. The training set has 100 examples, and the training data looks like this:

TRAIN_DATA = [
    ("2003 -2005 Pergo Inc. Software Analyst\\Database Administrator", {"entities": [(11, 20, "ORG")]}),
    ("PROFESSIONAL EXPERIENCE Client: WPS Health Solutions, Madison, WI                           Mar17 - Till Date Role: RPA Developer", {"entities": [(32, 52, "ORG")]}),
    ("Client: National Institutes of Health (NIH/NIAMS), Bethesda, MD             Jan15 - Feb17 Role: RPA Developer", {"entities": [(8, 36, "ORG")]}),
    ("Client: Wells Fargo, Fremont, CA                                                   July14 - Dec14 Role: .Net/SharePoint Developer", {"entities": [(8, 19, "ORG")]}),
]

Now I test a sentence with my trained model. When the sentence resembles the training data, I get the correct company name.

doc = nlp('Client: Ananth Technologies Limited, Hyderabad, India Feb11- July12 Role: QA Automation Tester')
print("Organization", [(ent.text, ent.label_) for ent in doc.ents])

Organization [(u'Ananth Technologies Limited', u'ORG')]

But when I pass a new sentence, the company name is only partially detected.

doc = nlp('Client: MOUNTAIN HIGH HOME BUILDERS, Loveland, CO Application Engineer 8/03-5/10')
print("Organization", [(ent.text, ent.label_) for ent in doc.ents])

Organization [(u'MOUNTAIN HIGH', u'ORG')]

As I gradually increase the training data, accuracy improves, but at the same time the model predicts wrong words as ORG. My training sentences all look different from each other: the date, designation, location, etc. appear in different places and in no fixed order, as you can see in TRAIN_DATA above. I am stuck here, and my question is: am I on the right track?

Can anyone suggest how I can improve my model?

Thanks

You need a much bigger training dataset for the model to predict better. A model trained on only 100 examples will fail on unseen cases most of the time.
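When adding more examples, it may also help to check that every annotated span covers exactly the text you meant to label, since a larger dataset makes offset slips easier to miss. The following is a minimal sketch in plain Python, not part of spaCy: `check_spans` is a hypothetical helper, and `SAMPLE` holds two sentences from the question with the trailing date and role trimmed off.

```python
def check_spans(train_data):
    """Return (example_index, start, end, span_text, problem) for suspicious spans."""
    problems = []
    for i, (text, annotations) in enumerate(train_data):
        for start, end, label in annotations["entities"]:
            span = text[start:end]
            if not span or span != span.strip():
                problems.append((i, start, end, span,
                                 "empty span or leading/trailing whitespace"))
            elif start > 0 and text[start - 1].isalnum() and text[start].isalnum():
                # The character just before the span belongs to the same word.
                problems.append((i, start, end, span, "starts mid-word"))
            elif end < len(text) and text[end - 1].isalnum() and text[end].isalnum():
                # The character just after the span belongs to the same word.
                problems.append((i, start, end, span, "ends mid-word"))
    return problems


# Two examples taken from the question (shortened), offsets unchanged.
SAMPLE = [
    ("Client: Wells Fargo, Fremont, CA", {"entities": [(8, 19, "ORG")]}),
    ("Client: National Institutes of Health (NIH/NIAMS), Bethesda, MD",
     {"entities": [(8, 36, "ORG")]}),
]

for problem in check_spans(SAMPLE):
    print(problem)
```

With these two samples, the check flags only the second one: the span (8, 36) reads `National Institutes of Healt`, so an end offset of 37 would be needed to cover `Health`. The check is only a heuristic based on alphanumeric characters; it does not know about spaCy's tokenization.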

