Cleaning image to help tesseract on Android

I'm trying to extract digits from a sudoku board. After detecting the board and its corners and transforming it, I'm left with a nicely aligned image of only the board. Now I'm trying to recognize the digits using Tess-Two, the Android implementation of Tesseract. I split the image into 9 parts with

currentCell = undistortedThreshed.submat(rect);

where rect is the rectangle that surrounds that part of the image.

Now to the digit recognition.

Some digits, like 4, are recognized perfectly. Others, mostly 6, 7 and 8, are recognized as 0s or not at all.

I want to help Tesseract as much as I can by cleaning the currentCell image. At the moment it looks like this: [image: inverted 6] (I also tried without the inverted thresholding). I want to get rid of the white lines (the sudoku grid lines). I've tried something like this (taken from here):

Imgproc.Canny(currentCell, currentCell, 80, 90);
Mat lines = new Mat();
int threshold = 50;
int minLineSize = 5;
int lineGap = 20;

Imgproc.HoughLinesP(currentCell, lines, 1, Math.PI / 180,
        threshold, minLineSize, lineGap);
for (int x = 0; x < lines.cols() && x < 1; x++) {
    double[] vec = lines.get(0, x);
    double x1 = vec[0], y1 = vec[1], x2 = vec[2], y2 = vec[3];
    Point start = new Point(x1, y1);
    Point end = new Point(x2, y2);

    Core.line(currentCell, start, end, new Scalar(255), 10);

}

But it doesn't draw anything. I tried changing the line's width and color, but still nothing. I also tried drawing the line on the large image and on the unthresholded image; nothing works.

Any suggestions?

EDIT

For some reason, it can't seem to find any lines. This is what the image looks like after applying Canny to it: [image: 6 after Canny], but HoughLines doesn't detect any lines. I tried both HoughLines and HoughLinesP with different values, as shown in the OpenCV documentation, but nothing works. Those are pretty obvious lines. What am I doing wrong? Thanks!

I ended up doing something different.

I used findContours to get the biggest contour, which is the digit.

Got its bounding box by using boundingRect.

Extracted it using submat and voilà, I got only the digit.
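The three steps above can be sketched in plain Java, so the idea is runnable without OpenCV on the classpath. This is only an illustration under assumptions: the class LargestBlob and its method are hypothetical names, the image is a binary int[row][col] array, and "biggest" is taken to mean the most pixels in an 8-connected component, which is a stand-in for picking the largest contour from findContours. The returned {x, y, width, height} plays the role of the Rect that boundingRect gives you and that you would pass to submat.

```java
import java.util.ArrayDeque;

// Hypothetical helper for illustration; not part of OpenCV or Tess-Two.
final class LargestBlob {
    /**
     * Returns {x, y, width, height} of the largest 8-connected group of
     * non-zero pixels in a binary image indexed as img[row][col],
     * or null if the image contains no non-zero pixel.
     */
    static int[] boundingBox(int[][] img) {
        int rows = img.length;
        int cols = rows == 0 ? 0 : img[0].length;
        boolean[][] seen = new boolean[rows][cols];
        int bestSize = 0;
        int[] best = null;
        ArrayDeque<int[]> queue = new ArrayDeque<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (img[r][c] == 0 || seen[r][c]) continue;
                // Flood-fill one component, tracking its size and extent.
                int size = 0, minR = r, maxR = r, minC = c, maxC = c;
                seen[r][c] = true;
                queue.add(new int[] {r, c});
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    size++;
                    minR = Math.min(minR, p[0]);
                    maxR = Math.max(maxR, p[0]);
                    minC = Math.min(minC, p[1]);
                    maxC = Math.max(maxC, p[1]);
                    for (int dr = -1; dr <= 1; dr++) {
                        for (int dc = -1; dc <= 1; dc++) {
                            int nr = p[0] + dr, nc = p[1] + dc;
                            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                            if (img[nr][nc] == 0 || seen[nr][nc]) continue;
                            seen[nr][nc] = true;
                            queue.add(new int[] {nr, nc});
                        }
                    }
                }
                // Keep the component with the most pixels seen so far.
                if (size > bestSize) {
                    bestSize = size;
                    best = new int[] {minC, minR, maxC - minC + 1, maxR - minR + 1};
                }
            }
        }
        return best;
    }
}
```

One caveat with this kind of selection: if a leftover grid line in the cell has more pixels than the digit, the line wins, so it may be worth trimming a small margin off each cell first.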

Unfortunately, it seems to make no difference at all. Tesseract still can't recognize the digits correctly. Sometimes it gives no result; sometimes, after dilating the digits, it recognizes the 6 as a 0. But that's an issue for another question.

This is an idea right off the top of my head:

Keep the code that computes the Hough lines in the image, which gives you the lines corresponding to the grid.

Now, simply draw those lines on the original image, but set the color to BLACK.

Most of the white lines would now be covered by the newly drawn black lines. Since the Hough line positions do not exactly match the actual lines, a few small white dots might remain. Eliminating them via connected components (discarding the components that are too small) or even some morphological operations, taking care that the actual digit remains unaltered, could handle these imperfections.
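A rough sketch of that clean-up, written in plain Java so it runs without OpenCV. Everything here is an assumption made for illustration: GridCleaner, eraseLine and removeSmallBlobs are hypothetical names, the image is a binary int[row][col] array, and the segment endpoints are taken as already detected (in the same x1, y1, x2, y2 order that HoughLinesP reports). With OpenCV you would instead draw each segment with a black Scalar and filter the remaining blobs by size.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of OpenCV or Tess-Two.
final class GridCleaner {
    /** Paints the segment (x1,y1)-(x2,y2) black, widened by `half` pixels on every side. */
    static void eraseLine(int[][] img, int x1, int y1, int x2, int y2, int half) {
        int steps = Math.max(Math.abs(x2 - x1), Math.abs(y2 - y1));
        for (int i = 0; i <= steps; i++) {
            // Walk along the segment roughly one pixel at a time.
            int x = steps == 0 ? x1 : x1 + Math.round((float) (x2 - x1) * i / steps);
            int y = steps == 0 ? y1 : y1 + Math.round((float) (y2 - y1) * i / steps);
            for (int dy = -half; dy <= half; dy++) {
                for (int dx = -half; dx <= half; dx++) {
                    int r = y + dy, c = x + dx;
                    if (r >= 0 && r < img.length && c >= 0 && c < img[r].length) {
                        img[r][c] = 0;
                    }
                }
            }
        }
    }

    /** Sets to 0 every 8-connected white component with fewer than minSize pixels. */
    static void removeSmallBlobs(int[][] img, int minSize) {
        int rows = img.length;
        int cols = rows == 0 ? 0 : img[0].length;
        boolean[][] seen = new boolean[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (img[r][c] == 0 || seen[r][c]) continue;
                // Collect all pixels of this component with a flood fill.
                List<int[]> blob = new ArrayList<>();
                ArrayDeque<int[]> queue = new ArrayDeque<>();
                seen[r][c] = true;
                queue.add(new int[] {r, c});
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    blob.add(p);
                    for (int dr = -1; dr <= 1; dr++) {
                        for (int dc = -1; dc <= 1; dc++) {
                            int nr = p[0] + dr, nc = p[1] + dc;
                            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                            if (img[nr][nc] == 0 || seen[nr][nc]) continue;
                            seen[nr][nc] = true;
                            queue.add(new int[] {nr, nc});
                        }
                    }
                }
                // Specks left over from the grid lines are erased; larger blobs stay.
                if (blob.size() < minSize) {
                    for (int[] p : blob) img[p[0]][p[1]] = 0;
                }
            }
        }
    }
}
```

The minSize threshold needs tuning per image size: too large and thin strokes of the digit disappear, too small and line fragments survive.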

Do try it out and let me know. Hope this helps.
