
Label custom entities in Resume (NER)

How can I perform NER for a custom named entity? For example, I want to identify whether a particular word in a resume is a skill: if Java or C++ occurs in my text, I should be able to label it as a skill. I don't want to use spaCy with a custom corpus. I want to create a dataset in which the words are my features and the label (skill) is my dependent variable.

What is the best approach to handle this kind of problem?

The alternative to custom dictionaries and gazetteers is to create a dataset in which you assign each word its corresponding label. You can define a set of labels (e.g. {OTHER, SKILL}) and create a dataset with examples like:

I        OTHER
can      OTHER
program  OTHER
in       OTHER
Python   SKILL
.        OTHER 

With a large enough dataset, you can then train a model to predict the label of each word.
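To make the idea concrete, here is a minimal sketch of reading the two-column format above and training a per-token classifier on it. It uses a plain perceptron from the standard library purely for illustration; the feature set, the function names (`read_tagged`, `features`, `train`, `predict`) and the number of epochs are my own assumptions, and a real model would need far more data than a single sentence.

```python
from collections import defaultdict

RAW = """\
I        OTHER
can      OTHER
program  OTHER
in       OTHER
Python   SKILL
.        OTHER
"""

def read_tagged(text):
    """Parse 'token<whitespace>label' lines into (tokens, labels)."""
    tokens, labels = [], []
    for line in text.splitlines():
        if line.strip():
            token, label = line.split()
            tokens.append(token)
            labels.append(label)
    return tokens, labels

def features(tokens, i):
    """Hypothetical minimal feature set: the word and its two neighbours."""
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return ["bias", "w=" + tokens[i].lower(), "prev=" + prev_tok, "next=" + next_tok]

def predict(weights, feats, label_set):
    """Return the highest-scoring label (ties broken alphabetically)."""
    scores = {lab: sum(weights[(f, lab)] for f in feats) for lab in label_set}
    return max(sorted(label_set), key=lambda lab: scores[lab])

def train(sentences, epochs=5):
    """Multiclass perceptron: on a mistake, reward gold and penalise the guess."""
    weights = defaultdict(float)
    label_set = {lab for _, labels in sentences for lab in labels}
    for _ in range(epochs):
        for tokens, labels in sentences:
            for i, gold in enumerate(labels):
                feats = features(tokens, i)
                guess = predict(weights, feats, label_set)
                if guess != gold:
                    for f in feats:
                        weights[(f, gold)] += 1.0
                        weights[(f, guess)] -= 1.0
    return weights, label_set

sentences = [read_tagged(RAW)]
weights, label_set = train(sentences)
tokens, gold = sentences[0]
predicted = [predict(weights, features(tokens, i), label_set) for i in range(len(tokens))]
```

On this toy input the model simply memorises the training sentence, so it only shows the data flow; whether a perceptron, a CRF or a neural tagger fits best depends on your data.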

You can try to get a list of "coding language" synonyms (or of the specific skills you are looking for) from word embeddings trained on your CV corpus and use this information to automatically label other corpora. I would say the key point is to find a way to at least partially automate the labeling; otherwise you won't have enough examples to train the model on your custom NER task. Tools like https://prodi.gy/ can reduce the labeling effort.
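As a rough illustration of partially automated labeling, the sketch below tags every token that appears in a seed lexicon as SKILL and everything else as OTHER. The lexicon contents, the tokenizer regex and the name `auto_label` are assumptions made for this example; in practice the lexicon could be expanded with nearest neighbours from your embeddings, and the output should still be reviewed by hand.

```python
import re

# Hypothetical seed lexicon; in practice it might be expanded with nearest
# neighbours from word embeddings trained on the CV corpus.
SKILL_LEXICON = {"java", "c++", "python"}

# Keep tokens such as "C++" or "C#" intact; other punctuation is split off.
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*[+#]*|\d+|[^\sA-Za-z0-9]")

def auto_label(text, lexicon=SKILL_LEXICON):
    """Return (token, label) pairs using a case-insensitive lexicon lookup."""
    tokens = TOKEN_RE.findall(text)
    return [(tok, "SKILL" if tok.lower() in lexicon else "OTHER") for tok in tokens]

labeled = auto_label("I have worked with Java and C++.")
for token, label in labeled:
    print(f"{token:<8} {label}")
```

A lookup like this is noisy (for example, "Go" or "Swift" can be ordinary words), which is why it is better treated as a way to pre-fill annotations than as the final labels.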

As features, you can also use word embeddings (or other typical NLP features such as n-grams, POS tags, etc., depending on the model you are using).
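For the hand-crafted route, a feature function might look roughly like the sketch below. The particular features (casing, digits, symbols, suffix, neighbouring words, character trigrams) and the name `token_features` are only assumptions for illustration; POS tags and embeddings are left out because they need an external tagger or a trained model.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i]; keys are illustrative, not a standard."""
    word = tokens[i]
    padded = f"<{word.lower()}>"  # boundary markers for character n-grams
    feats = {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_upper": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "has_symbol": any(not ch.isalnum() for ch in word),  # e.g. "C++", "C#"
        "suffix3": word[-3:].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }
    # Character trigrams help with unseen spellings of known skills.
    for j in range(len(padded) - 2):
        feats["tri=" + padded[j:j + 3]] = True
    return feats

tokens = ["I", "can", "program", "in", "Python", "."]
feats = token_features(tokens, 4)
```

Dicts of this shape can be fed to most classical sequence or token classifiers after vectorization; which features actually help is something to check on your own data.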

Another option is to apply transfer learning: take an existing NER/NLP model and fine-tune it on your labeled CV dataset.

I would put more effort into creating the right dataset and then test gradually more complex models, selecting whichever best fits your needs.

