Transform an image of handwritten notes to text [on hold]

I have hundreds of images of handwritten notes. They were written by different people, but they are in sequence, so you know that, for example, person1 wrote img1.jpg -> img100.jpg. The style of handwriting varies a lot from person to person, but there are parts of the notes which are always fixed, and I imagine that could help an algorithm (it helps me).

I tried tesseract and it failed pretty badly at recognizing the text. Since each person has about 100 images, is there an algorithm I can train by feeding it a small number of examples, like 5 or fewer, so that it can learn from that? Or would that not be enough data? From searching around, it looks like I need to implement a CNN (e.g. this paper).

My knowledge of AI is limited, though. Is this something that I could still do using a library and some studying? If so, what should I do going forward?

This is called OCR, and there has been progress. Here is an example of how simple it is to parse an image file to text using tesseract:

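try:
    from PIL import Image
except ImportError:
    import Image  # fallback for very old standalone PIL installs
import pytesseract


def ocr_core(file):
    """Run Tesseract OCR on an image file and return the recognized text."""
    # Open the image with Pillow, then hand it to Tesseract
    text = pytesseract.image_to_string(Image.open(file))
    return text


print(ocr_core('sample.png'))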

BUT

I am not sure that it can recognize different types of handwriting. You can try it yourself to find out. If you want to try the Python example, you need to import pytesseract, but first you have to install tesseract on your OS and add it to your PATH.
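Before running the example, it can help to confirm that the tesseract binary is actually reachable through PATH. The following is a minimal sketch using only the standard library; the helper name `find_tesseract` is hypothetical and not part of pytesseract.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract binary, or None if it is not on PATH."""
    # shutil.which searches the directories listed in the PATH environment variable
    return shutil.which(executable)


path = find_tesseract()
if path is None:
    print("tesseract not found on PATH; install it or add its folder to PATH")
else:
    print("tesseract found at", path)
```

If the binary is installed but cannot be added to PATH, pytesseract also lets you point at it explicitly by setting `pytesseract.pytesseract.tesseract_cmd` to the full path of the executable.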

There are many OCR engines out there, and some perform better than others. However, this is a field that has improved a lot recently thanks to deep neural networks. I would consider using a cloud provider such as Azure, Google Cloud or Amazon. You upload the image and they return the metadata.

For instance: https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/
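To give an idea of the upload-and-get-results flow, here is a rough sketch against the Azure Computer Vision Read REST API. It assumes the v3.2 endpoint layout (`/vision/v3.2/read/analyze`, an `Operation-Location` response header, and a `readResults` list in the JSON result); newer API versions may differ, so check the current documentation. The function names, `endpoint`, and `key` are placeholders you would replace with your own resource values.

```python
import json
import time
import urllib.request


def extract_lines(result):
    """Collect recognized text lines from a Read API result document."""
    pages = result.get("analyzeResult", {}).get("readResults", [])
    return [line["text"] for page in pages for line in page.get("lines", [])]


def read_handwriting(image_path, endpoint, key, poll_seconds=1.0):
    """Submit one image and poll until the service finishes (assumed v3.2 Read API)."""
    with open(image_path, "rb") as f:
        body = f.read()

    # Step 1: upload the raw image bytes; the service answers with a URL to poll
    submit = urllib.request.Request(
        endpoint.rstrip("/") + "/vision/v3.2/read/analyze",
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(submit) as response:
        operation_url = response.headers["Operation-Location"]

    # Step 2: poll the operation URL until it reports success or failure
    while True:
        poll = urllib.request.Request(
            operation_url, headers={"Ocp-Apim-Subscription-Key": key}
        )
        with urllib.request.urlopen(poll) as response:
            result = json.load(response)
        if result.get("status") in ("succeeded", "failed"):
            # A failed operation has no analyzeResult, so this returns []
            return extract_lines(result)
        time.sleep(poll_seconds)
```

Calling `read_handwriting("img1.jpg", endpoint, key)` would return a list of text lines for that page. Since the images are in sequence per writer, you could loop over img1.jpg to img100.jpg and store the results per person.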

If you don't want to use cloud services for any reason, I would consider using TensorFlow... but some knowledge is required:

Tensorflow model for OCR
