
Tesseract gives no recognition results (Android Studio; Java)

I am building an Android app in Android Studio that uses Tesseract OCR. I wrote code that should recognize text in images taken with the phone camera. Problem: the Tesseract function getUTF8Text() returns no result at all (null, even though the picture contains text). The program does not report any errors.

I can think of two possible causes: 1. Maybe I did not integrate Tesseract into my project properly? (The compiler does not report any issues when I use Tesseract classes in the code.) 2. Maybe the problem is in the code (a bad traineddata path?).

Main class:

private TessOCR Tess; 

//after taking picture I call:
PictureCallback pictureCallback = new PictureCallback() {
    @Override
    public void onPictureTaken(byte[] data, Camera camera) {
        Bitmap bitmap = BitmapFactory.decodeByteArray(data, 0, data.length);
        String result = Tess.getOCRResult(bitmap);

        if (result != null) Log.i(TAG, result);
        else Log.i(TAG, "NO RESULT");
    }
};

The TessOCR class finds or copies the Tesseract traineddata file and runs the text recognition (the constructor only deals with locating the traineddata file):

public class TessOCR {
public static final String PACKAGE_NAME = "com.example.dainius.ocr";
public static final String DATA_PATH = Environment
        .getExternalStorageDirectory().toString() + "/AndroidOCR/";
public static final String lang = "eng";

private static final String TAG = "OCR";
private TessBaseAPI mTess;

public TessOCR(AssetManager assetManager) {

    mTess = new TessBaseAPI();

    String[] paths = new String[] { DATA_PATH, DATA_PATH + "tessdata/" };

    for (String path : paths) {
        File dir = new File(path);
        if (!dir.exists()) {
            if (!dir.mkdirs()) {
                Log.v(TAG, "ERROR: Creation of directory " + path + " on sdcard failed");
                return;
            } else {
                Log.v(TAG, "Created directory " + path + " on sdcard");
            }
        }

    }

    if (!(new File(DATA_PATH + "tessdata/" + lang + ".traineddata")).exists()) {
        try {
            InputStream in = assetManager.open("tessdata/" + lang + ".traineddata");
            OutputStream out = new FileOutputStream(DATA_PATH
                    + "tessdata/" + lang + ".traineddata");

            byte[] buf = new byte[1024];
            int len;
            while ((len = in.read(buf)) > 0) {
                out.write(buf, 0, len);
            }
            in.close();
            out.close();

            Log.v(TAG, "Copied " + lang + " traineddata");
        } catch (IOException e) {
            Log.e(TAG, "Was unable to copy " + lang + " traineddata " + e.toString());
        }
    }

    mTess.setDebug(true);
    mTess.init(DATA_PATH, lang);
}

public String getOCRResult(Bitmap bitmap) {

    mTess.setImage(bitmap);
    String result = mTess.getUTF8Text();

    return result;
}

public void onDestroy() {
    if (mTess != null)
        mTess.end();
}
}

If this problem is caused by a bad Tesseract integration, please post a proper tutorial on how to integrate it. Every tutorial on the internet is different, so it is hard to tell how to do it properly.

I've worked with Tesseract (tess4j). Have you tried using an image with very clear text that is completely monochrome/grayscale? I've found that when Tesseract struggles to read my images, the time is best spent manipulating the image to make it easier for Tesseract to read.

If you still aren't able to get it to produce output and it isn't showing any errors, I'd go here and restart the Tesseract setup with the tutorial and follow all of their tips. It shouldn't be too difficult; the DLLs are extracted and loaded automatically. Just make sure your tessdata folder is in the correct spot (root directory) and that you have all the JARs as compile-time libraries (I think you only need 4 of them, not all, but check the tutorial on tess4j.sourceforge.com).
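One way to rule out a misplaced tessdata folder is to check for the language file before initializing Tesseract. The following is a minimal sketch in plain Java, assuming the layout used in the question's code (a data directory that contains a tessdata subfolder with eng.traineddata inside). The class and method names are hypothetical, not part of Tesseract or tess4j:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TessdataCheck {

    /**
     * Returns true if dataPath/tessdata/LANG.traineddata exists as a regular,
     * non-empty file. The layout is an assumption taken from the question's code.
     */
    public static boolean hasTrainedData(Path dataPath, String lang) {
        Path file = dataPath.resolve("tessdata").resolve(lang + ".traineddata");
        try {
            return Files.isRegularFile(file) && Files.size(file) > 0;
        } catch (IOException e) {
            // Treat an unreadable file the same as a missing one.
            return false;
        }
    }
}
```

If this returns false, the initialization cannot be expected to succeed, so it is worth logging the exact path that was checked.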

Taken from their website: "Images intended for OCR should have at least 200 DPI in resolution, typically 300 DPI, 1 bpp (bit per pixel) monochrome or 8 bpp grayscale uncompressed TIFF or PNG format." To be honest, I haven't had much luck with Tesseract besides their PDF tools to scan easy-to-read high-resolution documents.
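The grayscale preprocessing suggested above could look roughly like this on desktop Java, which is the tess4j setting. This is only a sketch: the class and method names are hypothetical, and java.awt is not available on Android, where the same idea would need the android.graphics classes instead.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class GrayscalePrep {

    /** Returns an 8 bpp grayscale copy of the source image with the same dimensions. */
    public static BufferedImage toGrayscale(BufferedImage source) {
        BufferedImage gray = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            // Drawing into a TYPE_BYTE_GRAY image converts the colors to gray levels.
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return gray;
    }
}
```

The converted image can then be handed to the OCR call in place of the original. Whether this improves recognition depends on the picture, so it is worth comparing results with and without the conversion.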

I didn't get it to work the first time either, for what it's worth.

The cause of my problem was that I had not added the permission to write to external storage. If anyone tries this method of extracting a file from the assets folder (I got the method from this GitHub project), make sure you add the line granting permission to write to external storage to your manifest (the AndroidManifest.xml file):

<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
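Part of what made this hard to spot is that the copy code in the question only logs an IOException and then carries on to init. Below is a hedged sketch of a copy helper that fails loudly instead. It uses only java.io, so it should also work on Android, but the class and method names are hypothetical, and the caller is assumed to pass in the stream obtained from assetManager.open(...):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class TrainedDataCopier {

    /**
     * Copies the stream to the target file and returns the number of bytes written.
     * Throws IOException if the directory cannot be created, the file cannot be
     * written (for example because of a missing permission), or nothing was copied.
     */
    public static long copyOrFail(InputStream in, File target) throws IOException {
        File parent = target.getParentFile();
        if (parent != null && !parent.isDirectory() && !parent.mkdirs()) {
            throw new IOException("Could not create directory " + parent);
        }
        long total = 0;
        // try-with-resources closes both streams even when the copy fails.
        try (InputStream source = in; OutputStream out = new FileOutputStream(target)) {
            byte[] buf = new byte[8192];
            int len;
            while ((len = source.read(buf)) != -1) {
                out.write(buf, 0, len);
                total += len;
            }
        }
        if (total == 0) {
            throw new IOException("Copied 0 bytes to " + target);
        }
        return total;
    }
}
```

Letting the exception propagate out of the constructor, rather than catching and logging it there, would have turned the missing permission into a visible failure instead of a silent null result.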
