Avoiding hard-coded rules for specific patterns as much as possible.
I'm currently working on a project similar to AWS Textract (link here). I've managed to extract data from files, but only as unstructured text. Now I'm trying to figure out the best way to get the existing key-value pairs out of that mass of information.
For example, suppose we have a text like this:
In this document we will find different key and values like this id : 1 and that country : France with no specific punctuation and probably talking about how good is my health...
The extraction would be something like this:
id : 1
country : France
health : good
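For the pairs that are written explicitly with a colon, a single regular expression already goes a long way. The sketch below is only an illustration: the pattern (one word, optional spaces, a colon, one word) and the function name extract_explicit_pairs are my own assumptions, not part of any existing tool.

```python
import re

# Hypothetical pattern: a one-word key, optional spaces, a colon,
# optional spaces, then a one-word value (letters, digits or underscore).
PAIR_PATTERN = re.compile(r"\b(\w+)\s*:\s*(\w+)")


def extract_explicit_pairs(text: str) -> dict[str, str]:
    """Return every 'key : value' pair that is written with an explicit colon."""
    return {key: value for key, value in PAIR_PATTERN.findall(text)}


sample = (
    "In this document we will find different key and values like this "
    "id : 1 and that country : France with no specific punctuation and "
    "probably talking about how good is my health..."
)
print(extract_explicit_pairs(sample))  # {'id': '1', 'country': 'France'}
```

This only recovers id and country. The implied pair health : good has no separator to anchor on, so it would need something beyond a single regular expression, which is exactly the part I'm asking about.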
What I do know is that Amazon uses a "confidence" score when extracting information in this kind of scenario, which I guess involves some machine-learning algorithm. In my case, I don't have a large enough dataset to learn from.
I'm pretty sure there is a simpler solution that is no less flexible.
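A "confidence" value does not necessarily require a trained model. The sketch below attaches a hand-tuned score to each candidate pair and drops the weak ones. Everything in it is a hypothetical assumption made up for illustration: the pattern, the cues and weights in score, and the 0.7 threshold. It is not how Textract computes its confidence, and it is still a set of hard-coded rules, only softer than one strict pattern per field.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    key: str
    value: str
    confidence: float


# Hypothetical generic pattern: key, separator (":" or "="), value.
CANDIDATE_PATTERN = re.compile(r"\b([A-Za-z]\w*)\s*([:=])\s*(\w+)")


def score(key: str, sep: str, value: str) -> float:
    """Arbitrary hand-picked weights (assumptions); tune or replace for real data."""
    confidence = 0.5  # base score for any syntactic match
    if sep == ":":
        confidence += 0.3  # treat a colon as the strongest separator cue
    if key.islower() and len(key) <= 20:
        confidence += 0.1  # short lowercase keys look like field names
    if value.isdigit() or value[:1].isupper():
        confidence += 0.1  # numbers and capitalised words look like values
    return round(confidence, 2)


def extract_with_confidence(text: str, threshold: float = 0.7) -> list[Candidate]:
    """Score every candidate pair and keep those at or above the threshold."""
    candidates = [
        Candidate(key, value, score(key, sep, value))
        for key, sep, value in CANDIDATE_PATTERN.findall(text)
    ]
    return [c for c in candidates if c.confidence >= threshold]


for c in extract_with_confidence("this id : 1 and that country : France, but also x = y"):
    print(c)
```

Here the weak candidate x = y (scored 0.6) is filtered out while id and country are kept. With even a small labelled sample, the hand-picked weights could later be replaced by learned ones without changing the overall structure.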