tess-two OCR not decoding correctly
I followed the tutorials to get Tesseract, specifically tess-two and eyes-two, installed as part of my Android app.
It runs, but the OCR text returned from baseApi.getUTF8Text() is complete gibberish.
// Decode the photo at 1/4 of its original width and height.
BitmapFactory.Options options = new BitmapFactory.Options();
options.inSampleSize = 4;
Bitmap bmp = BitmapFactory.decodeFile(path, options);
receipt.setImageBitmap(bmp);
try {
    // Read the EXIF orientation and rotate the bitmap upright.
    ExifInterface exif = new ExifInterface(path);
    int exifOrientation = exif.getAttributeInt(ExifInterface.TAG_ORIENTATION, ExifInterface.ORIENTATION_NORMAL);
    int rotate = 0;
    switch (exifOrientation) {
        case ExifInterface.ORIENTATION_ROTATE_90: rotate = 90; break;
        case ExifInterface.ORIENTATION_ROTATE_180: rotate = 180; break;
        case ExifInterface.ORIENTATION_ROTATE_270: rotate = 270; break;
    }
    if (rotate != 0) {
        int w = bmp.getWidth();
        int h = bmp.getHeight();
        Matrix matrix = new Matrix();
        matrix.preRotate(rotate);
        bmp = Bitmap.createBitmap(bmp, 0, 0, w, h, matrix, false);
    }
    // Tesseract expects an ARGB_8888 bitmap.
    bmp = bmp.copy(Bitmap.Config.ARGB_8888, true);
    TessBaseAPI baseApi = new TessBaseAPI();
    baseApi.init(DATA_PATH, "eng");
    baseApi.setImage(bmp);
    String OCRText = baseApi.getUTF8Text();
    baseApi.end();
    Log.i("OCR Text", "rotate " + rotate);
    Log.i("OCR Text", "OCR ");
    Log.i("OCR Text", OCRText);
    Log.i("OCR Text", "=======================================================================================");
} catch (IOException e) {
    // The ExifInterface constructor throws IOException if the file cannot be read.
    Log.e("OCR Text", "Could not read EXIF data", e);
}
Photographing a check, which has OCR characters, returns:
05-14 11:01:59.131: I/OCR Text(18199): rotate 90
05-14 11:01:59.131: I/OCR Text(18199): OCR
05-14 11:01:59.131: I/OCR Text(18199): 4— ‘ ‘
05-14 11:01:59.131: I/OCR Text(18199): \Dxfi ‘
05-14 11:01:59.131: I/OCR Text(18199): I W man"! no Accounv
05-14 11:01:59.131: I/OCR Text(18199): 1’
05-14 11:01:59.131: I/OCR Text(18199): my... «unblm m. mm.
05-14 11:01:59.131: I/OCR Text(18199): :~A
05-14 11:01:59.131: I/OCR Text(18199): «Ln.
05-14 11:01:59.131: I/OCR Text(18199): ‘ “w “IN. N I “H‘M‘
05-14 11:01:59.131: I/OCR Text(18199): mmnwnmw- .; k. '
05-14 11:01:59.131: I/OCR Text(18199): Wilt-run”. uni” nl
05-14 11:01:59.131: I/OCR Text(18199): mam. I
05-14 11:01:59.131: I/OCR Text(18199): =======================================================================================
Any advice on how to clean up and correct the OCR recognition? The device used is a Samsung Galaxy 7".
You could use something like:
OCRText = OCRText.replaceAll("[^a-zA-Z0-9]+", " ");
OCRText = OCRText.trim();
which is based on a Tesseract implementation I found here: SimpleAndroidOCRActivity.java
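To see what that cleanup does on its own, here is a minimal plain-Java sketch that runs without Android or Tesseract. The class name OcrTextCleaner and the method clean are hypothetical names chosen for illustration (they are not part of tess-two), and the null check is an added assumption; the regex itself is the one from the snippet above.

```java
// Hypothetical helper class for illustration; not part of tess-two.
final class OcrTextCleaner {

    // Replaces every run of characters other than a-z, A-Z, 0-9 with a
    // single space, then trims leading and trailing whitespace.
    static String clean(String ocrText) {
        if (ocrText == null) {
            return ""; // assumption: treat a missing OCR result as empty text
        }
        return ocrText.replaceAll("[^a-zA-Z0-9]+", " ").trim();
    }

    public static void main(String[] args) {
        // One of the lines from the log output in the question.
        String raw = "I W man\"! no Accounv";
        System.out.println(clean(raw));
    }
}
```

Note that this only strips punctuation and symbols from the result. It does not correct letters that Tesseract misrecognized, so it tidies the output rather than improving the recognition itself.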