What are the ways of Key-Value extraction from unstructured text?

I'm trying to figure out which approaches exist (and which one works best) for extracting the values of predefined keys from unstructured text.

Input:

  1. The doctor prescribed me a drug called favipiravir.
  2. His name is Yury.
  3. Ilya has already told me about that.
  4. The weather is cold today.
  5. I am taking a medicine called nazivin.

Key list: ['drug', 'name', 'weather']

Output:

['drug=favipiravir', 'drug=nazivin', 'name=Yury', 'weather=cold']

As you can see, the 3rd sentence contains no explicit 'name' key, so no value is extracted from it (I think this is where the task differs from NER, which would still tag 'Ilya' as a person). At the same time, 'drug' and 'medicine' are synonyms, so 'medicine' should be treated as the 'drug' key and its value extracted as well.
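A minimal rule-based sketch of the regexp route, assuming a hand-written synonym table (SYNONYMS is a hypothetical name) and assuming the value is the single word right after the key word plus a linking word such as "called" or "is". On the five example sentences it reproduces the expected output and extracts nothing from the 3rd one, but it will miss any phrasing the patterns don't anticipate.

```python
import re

# Hypothetical synonym table: each canonical key maps to the surface words
# that should trigger it ("medicine" is folded into "drug").
SYNONYMS = {
    "drug": ["drug", "medicine"],
    "name": ["name"],
    "weather": ["weather"],
}

# Assumed linking words between the key word and its value,
# e.g. "a drug called X", "name is X", "weather is X".
LINKERS = r"(?:called|named|is)"


def build_pattern(words):
    """Compile a regex capturing the single word after a trigger word + linker."""
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(rf"\b(?:{alternatives})\s+{LINKERS}\s+(\w+)", re.IGNORECASE)


def extract(sentences, keys, synonyms=SYNONYMS):
    """Return 'key=value' strings for every key that appears explicitly."""
    results = []
    for key in keys:
        # Unknown keys fall back to matching the key word itself.
        pattern = build_pattern(synonyms.get(key, [key]))
        for sentence in sentences:
            for match in pattern.finditer(sentence):
                results.append(f"{key}={match.group(1)}")
    return results


sentences = [
    "The doctor prescribed me a drug called favipiravir.",
    "His name is Yury.",
    "Ilya has already told me about that.",
    "The weather is cold today.",
    "I am taking a medicine called nazivin.",
]
print(extract(sentences, ["drug", "name", "weather"]))
# ['drug=favipiravir', 'drug=nazivin', 'name=Yury', 'weather=cold']
```

Because keys and synonyms live in a plain dict, changing the key set here is a data change rather than a code change; the cost is that every new phrasing needs a new pattern.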

A follow-up question: what if the key set is mutable? Should I use a regexp-based approach as the base, since the keys are predefined, or is there a way to implement this with supervised learning / neural networks? (And in that case, how would I deal with mutable keys?)
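If you go the supervised route, one common way to cope with a mutable key set (a technique swapped in here, not something the answer below proposes) is to make the key part of the model's input, as in extractive question answering: the model sees a (key, sentence) pair and predicts the value's span or "no value". A new key then means new training rows rather than a new output class. The sketch below only builds such examples from hypothetical annotations (ANNOTATIONS and make_examples are illustrative names); the model that would consume them is out of scope.

```python
# Hypothetical gold annotations: sentence -> {key: value}. The 3rd sentence
# has no explicit key, so it carries no annotation at all.
ANNOTATIONS = {
    "The doctor prescribed me a drug called favipiravir.": {"drug": "favipiravir"},
    "His name is Yury.": {"name": "Yury"},
    "Ilya has already told me about that.": {},
    "The weather is cold today.": {"weather": "cold"},
    "I am taking a medicine called nazivin.": {"drug": "nazivin"},
}


def make_examples(annotations, keys):
    """Build key-conditioned span-extraction examples.

    Each example pairs one key with one sentence; the target is the character
    span of the value, or None when the sentence says nothing about that key.
    Because the key is an input, adding a key only adds rows, not classes.
    """
    examples = []
    for sentence, labels in annotations.items():
        for key in keys:
            value = labels.get(key)
            span = None
            if value is not None:
                # Assumes the value's first occurrence is the right one.
                start = sentence.index(value)
                span = (start, start + len(value))
            examples.append({"key": key, "sentence": sentence, "span": span})
    return examples


examples = make_examples(ANNOTATIONS, ["drug", "name", "weather"])
print(len(examples))  # 15: 5 sentences x 3 keys
```

Extending the key list to ["drug", "name", "weather", "city"] yields 20 rows, with every "city" row unlabeled until you annotate some examples for it.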

You can use a tagger to label words. Your problem is similar to Named Entity Recognition (NER). Many libraries provide pretrained taggers (part-of-speech and NER) that you can try; the NER ones are generally trained to identify names, locations, etc. Depending on the types of words you need, you may have to train the tagger yourself, so you'll also need some labeled data.
Check out this link: https://nlp.stanford.edu/software/CRF-NER.html
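Training a tagger needs token-level labels. A minimal sketch, assuming hypothetical annotations and single-token values, that turns the example sentences into BIO tags and a token-per-line, tab-separated text of the kind CRF taggers are commonly trained on (check the Stanford tool's own documentation for its exact expected format). Since every value here is one token, only B- and O tags appear; multi-word values would continue with I- tags. Note that "Ilya" is tagged O: encoding the explicit-key requirement in the labels is what lets a trained tagger behave differently from a generic person-name NER.

```python
import re

# Hypothetical annotations: (sentence, {value_token: KEY}). "Ilya" is left
# untagged on purpose because its sentence has no explicit "name" key.
ANNOTATED = [
    ("The doctor prescribed me a drug called favipiravir.", {"favipiravir": "DRUG"}),
    ("His name is Yury.", {"Yury": "NAME"}),
    ("Ilya has already told me about that.", {}),
    ("The weather is cold today.", {"cold": "WEATHER"}),
    ("I am taking a medicine called nazivin.", {"nazivin": "DRUG"}),
]


def tokenize(sentence):
    """Tiny tokenizer: runs of word characters, and punctuation as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def to_bio(sentence, labels):
    """Return (token, tag) pairs; single-token values get B-<KEY>, others O."""
    pairs = []
    for token in tokenize(sentence):
        label = labels.get(token)
        pairs.append((token, f"B-{label}" if label else "O"))
    return pairs


def to_tsv(annotated):
    """One 'token<TAB>tag' per line, blank line between sentences."""
    blocks = []
    for sentence, labels in annotated:
        lines = [f"{tok}\t{tag}" for tok, tag in to_bio(sentence, labels)]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


print(to_bio("His name is Yury.", {"Yury": "NAME"}))
# [('His', 'O'), ('name', 'O'), ('is', 'O'), ('Yury', 'B-NAME'), ('.', 'O')]
```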
