
Can Amazon Comprehend extract and categorize data from classifieds?

Original post: 2021-04-30 15:48:16 | Tags: machine-learning, amazon-comprehend

I have a large dataset from which I would like to extract and categorize specific elements. Below is the most common example:

I would like to know whether this is possible with Amazon Comprehend, or whether there are better tools for the job. I am not a developer and am looking to hire someone to program this for me, but I would like to understand conceptually whether something like this is feasible before I hire anyone.

1 Answer

Comprehend is capable of extracting and categorizing text from your documents. You can use Comprehend's Custom Entity Recognition.

For this, you provide annotated training data as input. You can use Ground Truth in Amazon SageMaker to do the annotations and pass the Ground Truth output directly to a Comprehend entity recognizer training job. You can also provide your own annotations file for the training job - https://docs.aws.amazon.com/comprehend/latest/dg/API_EntityRecognizerInputDataConfig.html
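As a rough illustration of the two input options just described, the sketch below builds the `InputDataConfig` structure for a training job, once for a hand-made annotations CSV and once for a Ground Truth augmented manifest. The helper names, the S3 URIs, and the attribute name are hypothetical placeholders; the dictionary keys follow the EntityRecognizerInputDataConfig reference linked above, which should be checked before relying on this.

```python
def csv_annotations_config(documents_uri, annotations_uri, entity_types):
    """Input config for training from plain-text documents plus an annotations CSV."""
    return {
        "DataFormat": "COMPREHEND_CSV",
        "EntityTypes": [{"Type": name} for name in entity_types],
        "Documents": {"S3Uri": documents_uri},
        "Annotations": {"S3Uri": annotations_uri},
    }


def ground_truth_config(manifest_uri, attribute_names, entity_types):
    """Input config for training from a SageMaker Ground Truth augmented manifest."""
    return {
        "DataFormat": "AUGMENTED_MANIFEST",
        "EntityTypes": [{"Type": name} for name in entity_types],
        "AugmentedManifests": [
            {"S3Uri": manifest_uri, "AttributeNames": list(attribute_names)}
        ],
    }


# Hypothetical bucket, file names, and labeling-job attribute name.
ENTITY_TYPES = ["Width", "Ratio", "Diameter", "Brand", "Quantity", "Price"]
csv_config = csv_annotations_config(
    "s3://example-bucket/classifieds/documents.txt",
    "s3://example-bucket/classifieds/annotations.csv",
    ENTITY_TYPES,
)
manifest_config = ground_truth_config(
    "s3://example-bucket/ground-truth/output.manifest",
    ["classifieds-labeling-job"],
    ENTITY_TYPES,
)
```

Either dictionary would be passed as the `InputDataConfig` argument when the training job is created.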

The relevant Amazon Comprehend APIs are:

Training - https://docs.aws.amazon.com/comprehend/latest/dg/API_CreateEntityRecognizer.html
Async Inference - https://docs.aws.amazon.com/comprehend/latest/dg/API_StartEntitiesDetectionJob.html OR Sync Inference Over Custom Endpoint - https://docs.aws.amazon.com/comprehend/latest/dg/API_CreateEndpoint.html
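To make the training and async inference calls more concrete, here is a minimal sketch of how they might be wired together through boto3's Comprehend client (`create_entity_recognizer` and `start_entities_detection_job`). The function names, job name, role ARN, and S3 URIs are hypothetical, and the client is passed in as a parameter so the sketch does not assume any particular credentials setup.

```python
def train_recognizer(client, name, role_arn, input_data_config, language_code="en"):
    """Start a custom entity recognizer training job and return its ARN."""
    response = client.create_entity_recognizer(
        RecognizerName=name,
        DataAccessRoleArn=role_arn,
        InputDataConfig=input_data_config,
        LanguageCode=language_code,
    )
    return response["EntityRecognizerArn"]


def start_async_detection(client, recognizer_arn, role_arn,
                          input_uri, output_uri, language_code="en"):
    """Start an async entity detection job with the trained recognizer; return the job id."""
    response = client.start_entities_detection_job(
        JobName="classifieds-entities",  # hypothetical job name
        EntityRecognizerArn=recognizer_arn,
        DataAccessRoleArn=role_arn,
        LanguageCode=language_code,
        InputDataConfig={"S3Uri": input_uri, "InputFormat": "ONE_DOC_PER_LINE"},
        OutputDataConfig={"S3Uri": output_uri},
    )
    return response["JobId"]


# Usage (requires AWS credentials and the boto3 package; all values are placeholders):
#   import boto3
#   client = boto3.client("comprehend")
#   arn = train_recognizer(client, "classifieds-recognizer",
#                          "arn:aws:iam::123456789012:role/ExampleComprehendRole",
#                          input_data_config)
#   job_id = start_async_detection(client, arn,
#                                  "arn:aws:iam::123456789012:role/ExampleComprehendRole",
#                                  "s3://example-bucket/input/",
#                                  "s3://example-bucket/output/")
```

Training is asynchronous, so the recognizer has to reach a trained state before the detection job can use it; polling for that is left out here.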

Here is a detailed example of how to train custom entity recognizers with Amazon Comprehend - https://docs.aws.amazon.com/comprehend/latest/dg/training-recognizers.html

Annotation file example for this use case:

File	Line	Begin Offset	End Offset	Type
doc1	3	0	2	Width
doc1	3	5	6	Ratio
doc1	3	9	10	Diameter
doc1	0	12	20	Brand
doc1	0	6	6	Quantity
doc1	6	8	10	Price
doc1	1	20	22	Condition
doc1	0	42	48	Season
doc2	0	45	48	Quantity
doc2	1	78	79	Price
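Rows like these can be generated rather than counted by hand. The sketch below is one possible way to do it: it looks up each labeled substring in its line and emits the File, Line, Begin Offset, End Offset, Type columns as CSV. The sample classified text and the function names are made up for illustration. The table above appears to use inclusive end offsets, so that is the default here; the convention the training job actually expects should be confirmed against the Comprehend documentation.

```python
import csv
import io


def annotate(file_name, lines, labels, inclusive_end=True):
    """Build annotation rows from (line_index, substring, entity_type) labels.

    Only the first occurrence of each substring in its line is used.
    """
    rows = []
    for line_index, text, entity_type in labels:
        begin = lines[line_index].find(text)
        if begin < 0:
            raise ValueError(f"{text!r} not found on line {line_index}")
        end = begin + len(text) - (1 if inclusive_end else 0)
        rows.append((file_name, line_index, begin, end, entity_type))
    return rows


def to_csv(rows):
    """Serialize annotation rows with the header used in the table above."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["File", "Line", "Begin Offset", "End Offset", "Type"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical classified ad; not taken from the original dataset.
sample_lines = ["205/55 R16 tyres for sale"]
sample_labels = [(0, "205", "Width"), (0, "55", "Ratio"), (0, "16", "Diameter")]
sample_rows = annotate("doc1", sample_lines, sample_labels)
sample_csv = to_csv(sample_rows)
```

In practice the labels would come from a human annotator or from Ground Truth; the script only removes the error-prone offset arithmetic.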