
Does AWS Comprehend classify images?

Original post: 2020-04-06 15:42:15 · Tags: amazon-web-services, amazon-comprehend

I am fairly new to AWS Comprehend. I know that AWS Comprehend can classify documents (text files) with a custom classifier. Does AWS Comprehend also classify image files? Also, when training the model, is it necessary to provide the entire document text in the CSV, or will just keywords do?

The reason is that I want to build a custom classifier that can classify invoices, pay stubs, and a few other document types that are in image formats. Can Comprehend do this? If so, how?

I googled quite a lot but couldn't find anything relevant. I'd really appreciate your help with this.

Thank you!

3 answers

Comprehend doesn't do this natively, so you would have to build a solution. One thing you could try is combining Amazon Textract (to extract the text and details from the documents) with Comprehend (to classify them).

The Textract FAQ calls this out as a common use case. I couldn't find an exact example of someone doing this, but it is mentioned directly in the documentation.
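A minimal sketch of the Textract half of that pipeline, assuming you call the synchronous `detect_document_text` operation with an in-memory JPEG or PNG. The client is passed in (in practice it would be `boto3.client("textract")`) so the response parsing can be tried without AWS credentials; the sample response below is hand-written and trimmed down, not real Textract output.

```python
from typing import Any, Dict, List


def lines_from_textract(response: Dict[str, Any]) -> List[str]:
    """Return the text of every LINE block in a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        # PAGE and WORD blocks are skipped; LINE blocks already contain full lines.
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def extract_text(textract_client: Any, image_bytes: bytes) -> str:
    """Run Textract OCR on an in-memory image and join the detected lines."""
    # textract_client is assumed to be boto3.client("textract") or a test double.
    response = textract_client.detect_document_text(Document={"Bytes": image_bytes})
    return "\n".join(lines_from_textract(response))


if __name__ == "__main__":
    # Hand-written response in the general shape Textract returns.
    sample = {
        "Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": "INVOICE"},
            {"BlockType": "WORD", "Text": "INVOICE"},
            {"BlockType": "LINE", "Text": "Amount due: $120.00"},
        ]
    }
    print(lines_from_textract(sample))  # ['INVOICE', 'Amount due: $120.00']
```

The joined string is what you would then hand to Comprehend for classification.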

Amazon Comprehend only works on text.

Amazon Rekognition works on images.

AWS has all the building blocks to accomplish this, but you will have to configure and build it yourself. You can use AWS Textract to extract all the text from a document and then pass that text to AWS Comprehend to classify the document type.
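Once a custom classifier is trained, one way to classify the extracted text in real time is the `classify_document` operation against an endpoint you have created for that classifier (asynchronous batch jobs via `start_document_classification_job` are the other route). In this sketch the client is assumed to be `boto3.client("comprehend")`, the endpoint ARN is a placeholder, and the labels `INVOICE` and `PAY_STUB` are hypothetical names you would have chosen in your training data.

```python
from typing import Any, Dict, Tuple


def top_class(response: Dict[str, Any]) -> Tuple[str, float]:
    """Pick the highest-scoring class from a ClassifyDocument response."""
    classes = response.get("Classes", [])
    if not classes:
        raise ValueError("response contains no classes")
    best = max(classes, key=lambda c: c["Score"])
    return best["Name"], best["Score"]


def classify_text(comprehend_client: Any, text: str, endpoint_arn: str) -> Tuple[str, float]:
    """Send extracted text to a custom classifier endpoint and return the top label."""
    # comprehend_client is assumed to be boto3.client("comprehend") or a test double;
    # endpoint_arn is a placeholder for your own classifier endpoint.
    response = comprehend_client.classify_document(Text=text, EndpointArn=endpoint_arn)
    return top_class(response)


if __name__ == "__main__":
    # Hand-written response; the label names are hypothetical.
    sample = {"Classes": [{"Name": "INVOICE", "Score": 0.93},
                          {"Name": "PAY_STUB", "Score": 0.07}]}
    print(top_class(sample))  # ('INVOICE', 0.93)
```

Taking only the top class keeps the example short; in practice you may also want a score threshold below which a document is flagged for manual review.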

Before you can do this, you need to train Comprehend's machine learning component to identify the document types correctly. You configure and train a custom classifier in AWS Comprehend by supplying a CSV file in which each row contains a classification (for example, the document type) followed by text that would appear in that type of document. If the documents are just forms, you can use the Textract Forms feature to get only the key-value pairs, and then use the keys (the labels in the form) as the text for the custom classifier.
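The forms-based approach can be sketched as follows, assuming `analyze_document` with `FeatureTypes=["FORMS"]` and a training file in the header-less two-column layout (label, then document text) used for multi-class custom classifiers. The client is again injected (assumed to be `boto3.client("textract")`), the helper names are illustrative, and the sample response is hand-written rather than real Textract output.

```python
import csv
import io
from typing import Any, Dict, Iterable, List, Tuple


def form_keys(response: Dict[str, Any]) -> List[str]:
    """Return the label text of each KEY in an AnalyzeDocument (FORMS) response."""
    blocks = {b["Id"]: b for b in response.get("Blocks", []) if "Id" in b}
    keys: List[str] = []
    for block in blocks.values():
        # Only KEY_VALUE_SET blocks tagged as KEY hold the form labels.
        if block.get("BlockType") != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        words: List[str] = []
        for rel in block.get("Relationships", []):
            # CHILD points at the WORD blocks of the label; VALUE (the filled-in data) is ignored.
            if rel.get("Type") == "CHILD":
                for child_id in rel.get("Ids", []):
                    text = blocks.get(child_id, {}).get("Text")
                    if text:
                        words.append(text)
        if words:
            keys.append(" ".join(words))
    return keys


def keys_from_image(textract_client: Any, image_bytes: bytes) -> List[str]:
    """Run Textract form analysis on an image and return only the form labels."""
    response = textract_client.analyze_document(
        Document={"Bytes": image_bytes}, FeatureTypes=["FORMS"]
    )
    return form_keys(response)


def training_csv(rows: Iterable[Tuple[str, str]]) -> str:
    """Build a header-less CSV: label, then document text, one document per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for label, text in rows:
        # Collapse all whitespace (including newlines) so each document stays on one line.
        writer.writerow([label, " ".join(text.split())])
    return buf.getvalue()


if __name__ == "__main__":
    # Hand-written, trimmed-down response in the shape AnalyzeDocument returns.
    sample = {
        "Blocks": [
            {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
             "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                               {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
            {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
             "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
            {"Id": "w1", "BlockType": "WORD", "Text": "Invoice"},
            {"Id": "w2", "BlockType": "WORD", "Text": "Number:"},
            {"Id": "w3", "BlockType": "WORD", "Text": "12345"},
        ]
    }
    keys = form_keys(sample)
    print(keys)  # ['Invoice Number:']
    print(training_csv([("INVOICE", " ".join(keys))]), end="")  # INVOICE,Invoice Number:
```

Using only the keys strips out per-document values such as amounts and names, which vary between documents and say little about the document type. Fields that contain commas are quoted by the `csv` module.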