Best way to extract key-value pairs from an unstructured string?

Question

Avoiding, as far as possible, hard-coded rules for specific patterns.

I'm currently working on a project similar to AWS Textract. I've been successful at extracting data from files, but only in an unstructured form. Now I'm trying to figure out the best way to get the existing key-value pairs out of that pile of information.

For example, we have a text like this:

In this document we will find different key and values like this id : 1 and that country : France with no specific punctuation and probably talking about how good is my health...

The extraction would be something like this:

id : 1
country : France
health : good

What I do know is that Amazon uses a "confidence" value when extracting information in this kind of scenario, which I guess involves some machine-learning algorithm. In my case, I don't have a large enough dataset to learn from.

I'm pretty sure there is a simpler solution that is no less flexible.

Answer 1

I believe the spaCy library may be the right tool for your needs. Check out its description on GitHub to see whether it fits.

It can be exposed to Node.js using the spacy-nlp package.

Answer 1: 2, accepted, 2018-12-09 11:00:33