
Best way to extract Key-Value Pairs from unstructured String?

Ideally while avoiding hard-coded rules for specific patterns as much as possible.

I'm currently working on a project similar to AWS Textract. I've been successful at extracting data from files, but only in an unstructured form. Now I'm trying to figure out the best way to pull the existing key-value pairs out of that mass of information.

For example, suppose we have a text like this:

In this document we will find different key and values like this id : 1 and that country : France with no specific punctuation and probably talking about how good is my health...

The extraction would be something like this:

id : 1
country : France
health : good

What I do know is that Amazon uses a "confidence" value when extracting information in this kind of scenario, which I guess involves some machine-learning algorithm. In my case, I don't have a large enough dataset to learn from.

I'm pretty sure there is an easier solution that is nevertheless flexible.

I believe the spaCy library may be the right tool for your needs. Check out its description on GitHub to see whether it fits.

It can be exposed to Node.js using the spacy-nlp package.
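
Before bringing in an NLP library, it may help to see how far a plain pattern gets on the sample text from the question. The sketch below is only a baseline under stated assumptions: it uses Python's standard `re` module (not spaCy), the helper name `extract_explicit_pairs` and the pattern are made up for illustration, and it only catches pairs written explicitly as `key : value`. An implicit pair such as `health : good` is not covered and would likely need linguistic analysis of the kind spaCy provides.

```python
import re
from typing import Dict

# Assumption: a key and a value are each a single "word" separated by a colon,
# with optional whitespace around the colon. Real documents may need more.
PAIR_PATTERN = re.compile(r"(\w+)\s*:\s*(\w+)")


def extract_explicit_pairs(text: str) -> Dict[str, str]:
    """Return the explicit 'key : value' pairs found in free text.

    Hypothetical helper for illustration; later duplicates of a key
    overwrite earlier ones.
    """
    return {key: value for key, value in PAIR_PATTERN.findall(text)}


text = (
    "In this document we will find different key and values like this "
    "id : 1 and that country : France with no specific punctuation "
    "and probably talking about how good is my health..."
)

pairs = extract_explicit_pairs(text)
print(pairs)  # {'id': '1', 'country': 'France'}
```

This is still a hard-coded rule, so it mostly serves as a reference point: whatever more flexible approach is chosen should at least recover what this baseline finds.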

