
Split words with Tesseract tess-two on Android

I am trying to use Tesseract tess-two to read questions and answers from images on Android. At the moment I get a single String containing every word in the image. My problem is that I can't split the answers. Is it possible to split the answers with TessBaseAPI? A solution in Java/Android would also be fine ;)

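import android.graphics.Bitmap;
import android.util.Log;

import com.googlecode.tesseract.android.TessBaseAPI;

// TAG, context and TessDataManager are defined elsewhere in the app.
// TessDataManager appears to be an app-specific helper, not part of tess-two.
public String detectText(Bitmap bitmap) {
    Log.d(TAG, "Initialization of TessBaseApi");
    // Makes sure the trained data files are available on the device
    TessDataManager.initTessTrainedData(context);
    TessBaseAPI tessBaseAPI = new TessBaseAPI();
    String path = TessDataManager.getTesseractFolder();
    Log.d(TAG, "Tess folder: " + path);
    tessBaseAPI.setDebug(true);
    tessBaseAPI.init(path, "eng");
    // Only these characters may appear in the recognized text
    tessBaseAPI.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
            "abcdefghijklmnopqrstuvwxyzäüößÄÖÜ!?@#$%^&*+=-;()/");
    // Note: OEM_TESSERACT_CUBE_COMBINED is an OCR engine mode (OEM) constant,
    // not a page segmentation mode (PSM); see the answer below.
    tessBaseAPI.setPageSegMode(TessBaseAPI.OEM_TESSERACT_CUBE_COMBINED);

    Log.d(TAG, "Ended initialization of TessEngine");
    Log.d(TAG, "Running inspection on bitmap");
    try {
        tessBaseAPI.setImage(bitmap);
        // Returns all recognized text as one string
        String inspection = tessBaseAPI.getUTF8Text();
        Log.d(TAG, "Got data: " + inspection);
        return inspection;
    } finally {
        // Free the native Tesseract resources even if recognition throws
        tessBaseAPI.end();
    }
}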
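If the answers already come out on separate lines in the string returned by getUTF8Text(), plain Java string splitting can separate them. This is a minimal sketch, assuming that line breaks mark the boundaries between the question and the answers; OcrTextSplitter, splitLines and splitWords are hypothetical names, not part of tess-two.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of tess-two). Assumes the OCR output
// puts each question/answer on its own line.
final class OcrTextSplitter {

    private OcrTextSplitter() {
    }

    /** Splits OCR output into trimmed, non-empty lines. */
    static List<String> splitLines(String ocrText) {
        List<String> lines = new ArrayList<>();
        if (ocrText == null) {
            return lines;
        }
        // Handle both "\n" and "\r\n" line endings
        for (String line : ocrText.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    /** Splits a single line into words on runs of whitespace. */
    static List<String> splitWords(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }
}
```

For example, splitLines("What is 2+2?\n\n3\n4\n") returns [What is 2+2?, 3, 4], and splitWords can then break any of those lines into individual words.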

Here is an example of such an image:

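Answer: this is how it works. Set the page segmentation mode to sparse text instead of passing an OCR engine mode constant: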

tessBaseAPI.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SPARSE_TEXT);
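Once the question and the answers come back as separate fragments, the string still has to be turned into a question plus a list of answers. This is a minimal sketch, assuming the fragments in the getUTF8Text() output are separated by blank lines (check this against your own images); if there are no blank lines, it falls back to treating the first line as the question and every other line as an answer. QuizTextParser is a hypothetical helper, not part of tess-two.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of tess-two). Assumption: the question and
// each answer are separated by blank lines in the OCR output.
final class QuizTextParser {

    final String question;
    final List<String> answers;

    private QuizTextParser(String question, List<String> answers) {
        this.question = question;
        this.answers = Collections.unmodifiableList(answers);
    }

    static QuizTextParser parse(String ocrText) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        String text = (ocrText == null) ? "" : ocrText;
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                // A blank line closes the current block
                if (!current.isEmpty()) {
                    blocks.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(line);
            }
        }
        if (!current.isEmpty()) {
            blocks.add(current);
        }

        List<String> answers = new ArrayList<>();
        if (blocks.isEmpty()) {
            return new QuizTextParser("", answers);
        }
        if (blocks.size() == 1) {
            // No blank lines: first line is the question, other lines are answers
            List<String> lines = blocks.get(0);
            for (int i = 1; i < lines.size(); i++) {
                answers.add(lines.get(i));
            }
            return new QuizTextParser(lines.get(0), answers);
        }
        // First block is the question; every later block is one answer
        for (int i = 1; i < blocks.size(); i++) {
            answers.add(joinWithSpaces(blocks.get(i)));
        }
        return new QuizTextParser(joinWithSpaces(blocks.get(0)), answers);
    }

    // Joins the lines of a multi-line block into a single string
    private static String joinWithSpaces(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(part);
        }
        return sb.toString();
    }
}
```

Joining a block's lines with spaces keeps a question that wraps over several lines together as one string, while each following block becomes one answer.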
