
Can Amazon Comprehend extract and categorize data from classifieds?

I have a large dataset from which I would like to extract and categorize specific elements. Below is the most common example:

[image: example listing, not reproduced here]

I would like to know whether this is possible using Amazon Comprehend, or whether there are better tools for the job. I am not a developer and am looking to hire someone to program this for me, but I would like to understand conceptually whether something like this is feasible before I hire anyone.

Comprehend is capable of extracting and categorizing text from your documents. You can use Comprehend's Custom Entity Recognition for this.

For this, you provide annotated training data as input. You can use Ground Truth in Amazon SageMaker to do the annotations and pass the Ground Truth output directly to a Comprehend entity recognizer training job. You can also provide your own annotations file for the training job - https://docs.aws.amazon.com/comprehend/latest/dg/API_EntityRecognizerInputDataConfig.html

The relevant Amazon Comprehend APIs are:

  1. Training - https://docs.aws.amazon.com/comprehend/latest/dg/API_CreateEntityRecognizer.html
  2. Async inference - https://docs.aws.amazon.com/comprehend/latest/dg/API_StartEntitiesDetectionJob.html OR sync inference over a custom endpoint - https://docs.aws.amazon.com/comprehend/latest/dg/API_CreateEndpoint.html
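As a rough sketch of how a developer might wire up the training and async inference calls listed above with boto3 (the AWS SDK for Python): the two helper functions below only build the request parameters, so they can be inspected without an AWS account. The bucket paths, role ARN, recognizer ARN, job names and the choice of `"en"` and `ONE_DOC_PER_LINE` are all placeholder assumptions, and the naming rules for entity types should be checked against the Comprehend documentation.

```python
def recognizer_request(name, role_arn, docs_uri, annotations_uri, entity_types):
    """Build parameters for CreateEntityRecognizer (training)."""
    return {
        "RecognizerName": name,
        "DataAccessRoleArn": role_arn,
        "LanguageCode": "en",  # assumption: the ads are in English
        "InputDataConfig": {
            "EntityTypes": [{"Type": t} for t in entity_types],
            "Documents": {"S3Uri": docs_uri},
            "Annotations": {"S3Uri": annotations_uri},
        },
    }


def detection_job_request(job_name, role_arn, recognizer_arn, input_uri, output_uri):
    """Build parameters for StartEntitiesDetectionJob (async inference)."""
    return {
        "JobName": job_name,
        "DataAccessRoleArn": role_arn,
        "EntityRecognizerArn": recognizer_arn,
        "LanguageCode": "en",
        # assumption: each line of the input file is one ad
        "InputDataConfig": {"S3Uri": input_uri, "InputFormat": "ONE_DOC_PER_LINE"},
        "OutputDataConfig": {"S3Uri": output_uri},
    }


def submit(train_params, job_params):
    """Send both requests; needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # imported lazily so the builders above work without it

    client = boto3.client("comprehend")
    recognizer_arn = client.create_entity_recognizer(**train_params)["EntityRecognizerArn"]
    # In practice, wait until training has finished before starting the job.
    job_id = client.start_entities_detection_job(**job_params)["JobId"]
    return recognizer_arn, job_id


# Placeholder values, purely for illustration.
ROLE = "arn:aws:iam::123456789012:role/ExampleComprehendRole"
train_params = recognizer_request(
    "classifieds-recognizer",
    ROLE,
    "s3://example-bucket/train/docs/",
    "s3://example-bucket/train/annotations.csv",
    ["Brand", "Width", "Ratio", "Diameter", "Quantity", "Price", "Condition", "Season"],
)
job_params = detection_job_request(
    "classifieds-job",
    ROLE,
    "arn:aws:comprehend:us-east-1:123456789012:entity-recognizer/classifieds-recognizer",
    "s3://example-bucket/inference/ads.txt",
    "s3://example-bucket/inference/output/",
)
```

Training runs asynchronously, so real code would poll the recognizer's status and only start the detection job once the model is trained.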

Here is a detailed example of how to train custom entity recognizers with Amazon Comprehend - https://docs.aws.amazon.com/comprehend/latest/dg/training-recognizers.html

An example annotations file for this use case:

| File | Line | Begin Offset | End Offset | Type      |
|------|------|--------------|------------|-----------|
| doc1 | 3    | 0            | 2          | Width     |
| doc1 | 3    | 5            | 6          | Ratio     |
| doc1 | 3    | 9            | 10         | Diameter  |
| doc1 | 0    | 12           | 20         | Brand     |
| doc1 | 0    | 6            | 6          | Quantity  |
| doc1 | 6    | 8            | 10         | Price     |
| doc1 | 1    | 20           | 22         | Condition |
| doc1 | 0    | 42           | 48         | Season    |
| doc2 | 0    | 45           | 48         | Quantity  |
| doc2 | 1    | 78           | 79         | Price     |

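The file doc1 should contain the text that you want to extract entities from. Each row points at one entity in that file: the line it is on, the character offsets where it begins and ends within that line, and its type.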
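To give a feel for how such rows could be produced rather than counted by hand, here is a minimal Python sketch. The ad text is made up for illustration (the original example is not available), the helper names are hypothetical, and the end offset is computed as begin plus phrase length, which is an assumption to verify against the Comprehend annotations documentation.

```python
import csv
import io


def annotate(file_name, line_no, line_text, phrase, entity_type):
    """Return one annotation row for the first occurrence of phrase in line_text."""
    begin = line_text.find(phrase)
    if begin < 0:
        raise ValueError(f"{phrase!r} not found in line {line_no} of {file_name}")
    end = begin + len(phrase)  # assumption: offset just past the last character
    return [file_name, line_no, begin, end, entity_type]


def annotations_csv(rows):
    """Serialize annotation rows as CSV text with the header used in the table."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["File", "Line", "Begin Offset", "End Offset", "Type"])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical first line of doc1; not taken from the real dataset.
ad_line = "4 Michelin winter tires 205/55 R16, good condition, $200"

rows = [
    annotate("doc1", 0, ad_line, "Michelin", "Brand"),
    annotate("doc1", 0, ad_line, "205", "Width"),
    annotate("doc1", 0, ad_line, "winter", "Season"),
]
print(annotations_csv(rows))
```

Because `find` only locates the first occurrence, a phrase that appears more than once on a line would need a more careful lookup; for real training data a labeling tool such as Ground Truth avoids that problem.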
