Image processing - rotation and optical character recognition

Question

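Good morning everyone,

Today I would like to focus on the topic "image processing in C++".

So far I have been able to filter all the noise out of the picture and convert it to black and white.

But now I have two questions.

First question:
Below you can see a screenshot of the picture. What is the best way to work out how far the text needs to be rotated? Ideally the text would end up horizontal. Does anyone have a good link or an example?

[image]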

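Second question:
How should I continue? Do you think I should send the image to an "optical character recognizer" (a), or should I filter out each letter (b)?

If the answer is (a), what is the smallest OCR library? All the libraries I have found so far (such as gocr or tesseract) seem overpowered and difficult to integrate into an existing project.

If the answer is (b), what is the best way to save each letter as its own image? Should I search for a white pixel and then go from pixel to pixel, saving the coordinates in a 2D array? And what about the letter "i"? ;)

Thanks to everyone who helps me find my way!
Sorry for the strange English above. I am still a language noob :-)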

Answer 1

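The usual name for the problem in your first question is "skew correction".

[image]

You can Google it (there are many references). Here is a good article that shows how to get this result:

[image]

A simple way to start (though not as good as the approach mentioned above) is to perform a principal component analysis:

[image]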
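The principal component idea can be sketched roughly as follows. This is only an illustration, not the code behind the images above: it assumes the picture is already binary and stored row-major in a `std::vector<std::uint8_t>` (0 = black, nonzero = white), and the function name `principalAxisAngle` is made up for this example. It treats every white pixel as a 2D point and returns the direction in which those points spread the most.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Estimate the skew of the white (nonzero) pixels in a row-major binary image.
// Returns the angle of their principal axis in radians, in (-pi/2, pi/2],
// measured from the x axis in image coordinates (y grows downward).
double principalAxisAngle(const std::vector<std::uint8_t>& pixels,
                          int width, int height) {
    // First pass: centroid of the white pixels.
    double sumX = 0.0, sumY = 0.0;
    long count = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (pixels[y * width + x] != 0) {
                sumX += x;
                sumY += y;
                ++count;
            }
        }
    }
    assert(count > 0);  // the image must contain at least one white pixel
    const double meanX = sumX / count;
    const double meanY = sumY / count;

    // Second pass: entries of the 2x2 scatter matrix (unnormalised covariance).
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (pixels[y * width + x] != 0) {
                const double dx = x - meanX;
                const double dy = y - meanY;
                sxx += dx * dx;
                syy += dy * dy;
                sxy += dx * dy;
            }
        }
    }

    // Direction of the eigenvector that belongs to the larger eigenvalue.
    return 0.5 * std::atan2(2.0 * sxy, sxx - syy);
}
```

Rotating the image by the opposite of the returned angle should bring a single line of text close to horizontal. For short words or several lines of text the spread is less elongated, so the estimate is likely to be less reliable.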

Answer 2

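For your first question:

First, remove any "specks" of noisy white pixels that are not part of the letter sequence. Apply a gentle low-pass filter (pixel color = average of the surrounding pixels), then clamp the pixel values to pure black or pure white. This should get rid of the little "dot" beneath the "a" character in your image, along with any other specks.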
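A minimal sketch of that clean-up step, under a few assumptions that are not in the answer: the image is row-major 8-bit grayscale (0 = black, 255 = white), the low-pass filter is a 3x3 mean that includes the pixel itself, and the clamp threshold is 128. The name `despeckle` is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Average each pixel with its 3x3 neighbourhood, then clamp the result
// to pure black (0) or pure white (255).
std::vector<std::uint8_t> despeckle(const std::vector<std::uint8_t>& pixels,
                                    int width, int height,
                                    int threshold = 128) {
    assert(width > 0 && height > 0);
    assert(pixels.size() == static_cast<std::size_t>(width) * height);

    std::vector<std::uint8_t> out(pixels.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int sum = 0;
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx;
                    const int ny = y + dy;
                    // Skip neighbours that fall outside the image.
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    sum += pixels[ny * width + nx];
                    ++n;
                }
            }
            // Clamp the local mean to black or white.
            out[y * width + x] = (sum / n >= threshold) ? 255 : 0;
        }
    }
    return out;
}
```

With these settings an isolated white pixel averages to roughly 28 and is turned black, while the interior of a solid stroke stays white. Very thin strokes may be eroded as well, so the window size and threshold probably need tuning for a real image.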

Now search for the following pixels:

xMin = white pixel with the lowest  x value (white pixel closest to the left edge)
xMax = white pixel with the largest x value (white pixel closest to the right edge)
yMin = white pixel with the lowest  y value (white pixel closest to the top edge)
yMax = white pixel with the largest y value (white pixel closest to the bottom edge)

With these four pixel values, form a bounding box: Rect(xMin, yMin, xMax, yMax).
Compute the area of the bounding box and find its center.

Using the center of the bounding box as the pivot, rotate the contents of the box by N degrees. (You can pick N; 1 degree would be an OK value.)

Repeat the process of finding xMin, xMax, yMin, yMax and recompute the area.

Continue rotating by N degrees until you have rotated K degrees. Also rotate by -N degrees until you have rotated by -K degrees, where K is the maximum rotation (say, 30 degrees). At each step, recompute the area of the bounding box.

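The rotation that produces the bounding box with the smallest area is likely the one that aligns your letters parallel to the bottom edge (horizontally aligned).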
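The search described above can be sketched like this. The `Point` type and the function names are hypothetical, and the image is assumed to be row-major with nonzero meaning white. One simplification: the sketch rotates the pixel coordinates about the origin instead of the box center, because changing the pivot only shifts the rotated points and leaves the bounding-box area unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

struct Point { double x, y; };  // hypothetical helper type for this sketch

// Collect the coordinates of all white (nonzero) pixels.
std::vector<Point> whitePixels(const std::vector<std::uint8_t>& pixels,
                               int width, int height) {
    std::vector<Point> pts;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (pixels[y * width + x] != 0)
                pts.push_back({static_cast<double>(x), static_cast<double>(y)});
    return pts;
}

// Area of the axis-aligned bounding box after rotating the points by `degrees`.
// With y growing downward, a positive angle looks clockwise on screen.
double rotatedBoxArea(const std::vector<Point>& pts, double degrees) {
    const double rad = degrees * std::acos(-1.0) / 180.0;
    const double c = std::cos(rad);
    const double s = std::sin(rad);
    const double inf = std::numeric_limits<double>::infinity();
    double xMin = inf, xMax = -inf, yMin = inf, yMax = -inf;
    for (const Point& p : pts) {
        const double rx = c * p.x - s * p.y;
        const double ry = s * p.x + c * p.y;
        xMin = std::min(xMin, rx);
        xMax = std::max(xMax, rx);
        yMin = std::min(yMin, ry);
        yMax = std::max(yMax, ry);
    }
    return (xMax - xMin) * (yMax - yMin);
}

// Try every angle from -maxDegrees to +maxDegrees in steps of stepDegrees
// (N and K in the text) and return the one with the smallest bounding box.
double bestRotationDegrees(const std::vector<Point>& pts,
                           double stepDegrees = 1.0,
                           double maxDegrees = 30.0) {
    assert(!pts.empty() && stepDegrees > 0.0);
    double bestAngle = 0.0;
    double bestArea = rotatedBoxArea(pts, 0.0);
    const int steps = static_cast<int>(maxDegrees / stepDegrees);
    for (int i = -steps; i <= steps; ++i) {
        const double angle = i * stepDegrees;
        const double area = rotatedBoxArea(pts, angle);
        if (area < bestArea) {
            bestArea = area;
            bestAngle = angle;
        }
    }
    return bestAngle;
}
```

A call such as `bestRotationDegrees(whitePixels(img, w, h))` returns the rotation to apply to the image. Only the coordinates are rotated during the search, so the image itself needs to be resampled just once, at the chosen angle.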

Answer 3

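You could measure the height of each white pixel from the bottom and work out how much the text is leaning. It is a very simple approach, but it worked well for me when I tried it.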
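The answer does not say exactly how the heights are combined, so the following is only one possible reading: take the lowest white pixel in each column as a rough baseline, fit a straight line through those heights by least squares, and convert the slope to an angle. The function name and the row-major, nonzero-is-white image layout are assumptions of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Estimate the slant of the text in degrees from the lowest white pixel
// of each column. A positive result means the text rises toward the right.
double baselineSlantDegrees(const std::vector<std::uint8_t>& pixels,
                            int width, int height) {
    double n = 0.0, sumX = 0.0, sumH = 0.0, sumXX = 0.0, sumXH = 0.0;
    for (int x = 0; x < width; ++x) {
        // Scan upward from the bottom edge until a white pixel is found.
        for (int y = height - 1; y >= 0; --y) {
            if (pixels[y * width + x] != 0) {
                const double h = height - 1 - y;  // height above the bottom row
                n += 1.0;
                sumX += x;
                sumH += h;
                sumXX += static_cast<double>(x) * x;
                sumXH += x * h;
                break;
            }
        }
    }
    // Least-squares slope of height against column index.
    const double denom = n * sumXX - sumX * sumX;
    assert(denom != 0.0);  // needs white pixels in at least two columns
    const double slope = (n * sumXH - sumX * sumH) / denom;
    return std::atan(slope) * 180.0 / std::acos(-1.0);
}
```

Descenders such as "g" or "p" and any leftover specks pull the fitted line away from the true baseline, so this works best on an image that has already been cleaned.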

Answer 1
4, accepted, 2011-04-10 12:54:27

Answer 2
1, 2011-04-10 11:10:27

Answer 3
0, 2011-04-10 11:11:36
