
How do I get accurate text using Tesseract OCR in iOS?

I am working on an iPhone application in which I need to extract text from images. After some searching I found that Tesseract can do this. It works, but the results are not accurate. I used this and processed the image, but I am still not getting good results.

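// Create a Tesseract instance that loads the English data from the "tessdata" folder.
Tesseract *tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];
UIImage *selectedImage = [UIImage imageNamed:@"download.jpg"];
// This first setImage call is overridden by the second one below.
[tesseract setImage:selectedImage];

// Build a greyscale copy (target size is 100 points larger in each dimension), then apply a local threshold.
ImageWrapper *greyScale = Image::createImage(selectedImage, selectedImage.size.width + 100, selectedImage.size.height + 100);
ImageWrapper *edges = greyScale.image->autoLocalThreshold();
// Run OCR on the thresholded image and log the recognized text.
[tesseract setImage:edges.image->toUIImage()];
[tesseract recognize];
NSLog(@"%@", [tesseract recognizedText]);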

I used the image below for testing, but I am getting results like this:

.-|llIAT&T JG H109 PM ED ' '» "rr ~ ' ma» mania-J 'E, 'M, 4 ., -_ \\ ~ \\ Download Image 53.0 KB \\ _11.04 PM | Hey | am in buenos aires right 'now. Check out this mm phfllu 111:5 PM |' lam in Budapest on WiF. n is \\ maePMu 001d here. ; l 1 . , ' l, . 11.05 PM u, .——; _ | Nice picture. Let me send you an audio nuke. _11 08PM

How can I solve this issue? If anyone has worked on this, please guide me. Thanks in advance.

(image: the test screenshot)

I tried to recognise your image with ABBYY Cloud OCR SDK and decided to share the result with you. I think it is rather accurate:

(image: recognition result)

You can try the demo recognition here: http://cloud.ocrsdk.com/demo (it is a marketing tool and does not let you extract the data).

I work for ABBYY and am ready to help you. Just let me know in the comments.

I tried recognising my image with ABBYY Cloud OCR SDK.

Here is how it was solved: I extracted the text and exported it in XML format. This format contains the recognized text, together with its structure and parameters, described in XML. The par tag corresponds to one paragraph of recognized text. After getting the text from the XML you can work with it as you want.
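To make the "one par tag per paragraph" idea concrete, here is a minimal sketch of pulling paragraph text out of such an XML export. It is written in Python only for brevity; the same approach works with any XML parser on iOS. Apart from the par tag mentioned above, everything in the sample is an assumption for illustration: the surrounding element names (document, page, block, text, line, formatting), the sample text, and the helper names local_name and extract_paragraphs. A real export may nest the text differently or wrap each character in its own element, so treat this as a starting point rather than the service's documented layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample. Only the <par> tag is confirmed by
# the answer above; the other element names and the text are assumptions.
SAMPLE_XML = (
    "<document><page><block><text>"
    "<par><line><formatting>First chat message.</formatting></line></par>"
    "<par>"
    "<line><formatting>Second chat message,</formatting></line>"
    "<line><formatting>split over two lines.</formatting></line>"
    "</par>"
    "</text></block></page></document>"
)


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_paragraphs(xml_text):
    """Return the text of every <par> element, one string per paragraph."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for par in root.iter():
        if local_name(par.tag) != "par":
            continue
        # Collect the text of each <line> separately so that lines are
        # joined with a space instead of running together.
        lines = [
            "".join(line.itertext())
            for line in par.iter()
            if local_name(line.tag) == "line"
        ]
        if not lines:
            # No <line> children: fall back to all text under the paragraph.
            lines = ["".join(par.itertext())]
        text = " ".join(" ".join(lines).split())  # normalise whitespace
        if text:
            paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    for number, paragraph in enumerate(extract_paragraphs(SAMPLE_XML), start=1):
        print(number, paragraph)
```

Matching on the local tag name keeps the sketch working whether or not the export declares an XML namespace on its root element.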

I processed chat screenshots with the following settings:

"…/processImage?language=English&profile=documentConversion&exportFormat=xml"

and got the attached XML files. These images are processed correctly; each dialog block is detected as a separate paragraph.
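The settings quoted above are ordinary query parameters, so they can be assembled programmatically rather than typed by hand. The sketch below (Python, for brevity) builds only the part of the request that appears in the answer. The host and path prefix are elided there and are deliberately not reconstructed here, and the names SETTINGS and build_request_target are hypothetical.

```python
from urllib.parse import urlencode

# The three settings quoted in the answer. The host and path prefix in front
# of "processImage" are elided in the original and are left out on purpose.
SETTINGS = {
    "language": "English",
    "profile": "documentConversion",
    "exportFormat": "xml",
}


def build_request_target(settings):
    """Return 'processImage?...' with the settings encoded as a query string."""
    return "processImage?" + urlencode(settings)


if __name__ == "__main__":
    print(build_request_target(SETTINGS))
```

Using urlencode rather than string concatenation also takes care of escaping if a setting value ever contains characters such as commas or spaces.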

Hope the information is helpful. 希望这些信息对您有所帮助。

Thanks to the ABBYY OCR SDK team for providing the solution.
