Tesseract OCR: German special characters

Question

I am using Tesseract OCR in C++ to read German PNG images, and I am having problems with some special characters such as ß, ä, ö and ü.

Do I need to train Tesseract to read these correctly, or what needs to be done?

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();

Update

SetConsoleOutputCP(1252);//changed to german.
SetConsoleCP(1252);//changed to german
wcout << "ÄÖÜ?ß" << endl;

// Open input image with leptonica library
Pix *image = pixRead("D:\\Images\\Document.png");
api->Init("D:\\TesseractBeispiele\\Tessaractbeispiel\\Tessaractbeispiel\\tessdata", "deu");
api->SetImage(image);
api->SetVariable("save_blob_choices", "T");
api->SetRectangle(1000, 3000, 9000, 9000);
api->Recognize(NULL);

// Get OCR result (GetUTF8Text returns a new[]-allocated UTF-8 string)
char *text = api->GetUTF8Text();
wcout << text << endl;
delete[] text;

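After changing the code as shown under "Update", the hard-coded umlauts are displayed correctly, but the text recognized from the image is still wrong. What do I need to change?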

The Tesseract version is 3.0.2 and the Leptonica version is 1.68.

Answer 1

Tesseract can recognize Unicode characters. Your console is probably just not configured to display them.

See also the questions "What encoding/code page is cmd.exe using?" and "Unicode characters in Windows command line - how?"

Answer 2

I don't know how to detect German words from an image in a Windows environment, but I do know how to do it on Linux. The following code may give you some ideas.

/*
 * word_OCR.cpp
 *
 *  Created on: Jun 23, 2016
 *      Author: root
 */

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <iostream>

using namespace std;

int main(int argc, char **argv)
{
    if (argc < 2) {
        cerr << "Usage: " << argv[0] << " <image>\n";
        return 1;
    }

    Pix *image = pixRead(argv[1]);
    if (image == 0) {
        cerr << "Cannot load input file!\n";
        return 1;
    }

    tesseract::TessBaseAPI tess;
    // Instead of passing "eng", pass "deu" (German language data).
    if (tess.Init("/usr/share/tesseract/tessdata", "deu")) {
        cerr << "Could not initialize tesseract.\n";
        pixDestroy(&image);
        return 1;
    }

    tess.SetImage(image);
    tess.Recognize(0);

    tesseract::ResultIterator *ri = tess.GetIterator();
    tesseract::PageIteratorLevel level = tesseract::RIL_WORD;

    if (ri != 0) {
        do {
            // GetUTF8Text returns a new[]-allocated UTF-8 string, or NULL.
            char *word = ri->GetUTF8Text(level);
            if (word != 0) {
                // Raw UTF-8 bytes; a UTF-8 terminal shows ä, ö, ü, ß correctly.
                cout << word << endl;
                delete[] word;
            }
        } while (ri->Next(level));

        delete ri;  // a single object, so delete (not delete[])
    }

    tess.End();
    pixDestroy(&image);
    return 0;
}
One thing to take care of: pass an image with good resolution; only then does it work well.

2 answers

Answer 1: 1 vote, accepted, 2016-04-08 13:22:31

Answer 2: 0 votes, 2016-06-24 07:40:21
