
Best way to extract Key-Value Pairs from unstructured String?

Ideally, I'd like to avoid hard-coded rules for specific patterns as much as possible.

I'm currently working on a project similar to AWS Textract. I've been successful at extracting data from files, but only in an unstructured way. Now I'm trying to figure out the best way to get the existing key-value pairs out of that bunch of information.

For example, say we have a text like this:

In this document we will find different key and values like this id : 1 and that country : France with no specific punctuation and probably talking about how good is my health...

The extraction would be something like this:

id : 1
country : France
health : good
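
For the pairs that are written out explicitly with a colon, such as id : 1 and country : France, a regular expression may already be enough. The following is a minimal Python sketch under that assumption, not a full solution: the names `PAIR_PATTERN` and `extract_explicit_pairs` are made up for illustration, keys and values are assumed to be single words, and an implicit pair such as health : good is out of its reach.

```python
import re
from typing import Dict

# Assumed shape of an explicit pair: one word, a colon, then one word or number.
PAIR_PATTERN = re.compile(r"(\w+)\s*:\s*(\w+)")


def extract_explicit_pairs(text: str) -> Dict[str, str]:
    """Collect every 'key : value' occurrence found in free text."""
    return {key: value for key, value in PAIR_PATTERN.findall(text)}


text = (
    "In this document we will find different key and values like this "
    "id : 1 and that country : France with no specific punctuation and "
    "probably talking about how good is my health..."
)

print(extract_explicit_pairs(text))  # {'id': '1', 'country': 'France'}
```

Anything that is only implied by the sentence, like the health example, needs some form of linguistic analysis rather than pattern matching.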

What I do know is that Amazon uses a "confidence" value when extracting information in this kind of scenario, which I guess involves some machine-learning algorithm. In my case, I don't have that big of a database to learn from.

I'm pretty sure there is a simpler solution that is no less flexible.

I believe the spaCy library may be the right tool for your needs. Check out its description on GitHub to see whether it fits.

It can be exposed to Node JS using the spacy-nlp package.

