Pytesseract doesn't recognize decimal points
I'm trying to read the text in this image, which also contains decimal points and decimal numbers, in this way:
import cv2
import pytesseract
img = cv2.imread(path_to_image)
print(pytesseract.image_to_string(img))
and what I get is:
73-82
Primo: 50 —
I've also tried specifying the Italian language, but the result is pretty similar:
73-82 _
Primo: 50
Searching through other questions on Stack Overflow, I found that the reading of decimal numbers can be improved by using a whitelist, in this case tessedit_char_whitelist='0123456789.', but I also want to read the words in the image. Any idea how to improve the reading of decimal numbers?
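For reference, a whitelist does not have to be digits-only. The following is a minimal sketch of a config string that keeps letters, digits and a few punctuation marks, so words can still be read. The helper names, the `extra_chars` default and the `--psm 3` value are assumptions made for illustration, and whether a whitelist alone recovers the decimal points depends on the image and the Tesseract version.

```python
import string


def build_whitelist_config(extra_chars=".-:", psm=3):
    """Return a Tesseract config string that whitelists letters, digits and extra_chars."""
    whitelist = string.ascii_letters + string.digits + extra_chars
    return f"-c tessedit_char_whitelist={whitelist} --psm {psm}"


def read_text(image, lang=None):
    """Run OCR with the whitelist above (needs pytesseract and a Tesseract install)."""
    import pytesseract  # imported lazily so the config helper works without it

    return pytesseract.image_to_string(image, lang=lang, config=build_whitelist_config())
```

With the image loaded as in the snippet above, `read_text(img, lang="ita")` would apply the whitelist together with the Italian language data, provided that language pack is installed.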
I would suggest passing each row of text to Tesseract as a separate image.
For some reason, it seems to solve the decimal point issue...
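The solution works in stages. First, convert the image from grayscale to black and white using cv2.threshold. Next, apply cv2.dilate, a morphological operation, with a very long horizontal kernel (this merges blocks across the horizontal direction). Finally, find the contours of the merged blocks, sort their bounding boxes from top to bottom, and pass each cropped row to pytesseract.
Here is the code: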
import numpy as np
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe' # I am using Windows
path_to_image = 'image.png'
img = cv2.imread(path_to_image, cv2.IMREAD_GRAYSCALE) # Read input image as Grayscale
# Convert to binary using automatic threshold (use cv2.THRESH_OTSU)
ret, thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# Dilate thresh for uniting text areas into blocks of rows.
dilated_thresh = cv2.dilate(thresh, np.ones((3,100)))
# Find contours on dilated_thresh
cnts = cv2.findContours(dilated_thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2] # Use index [-2] to be compatible to OpenCV 3 and 4
# Build a list of bounding boxes
bounding_boxes = [cv2.boundingRect(c) for c in cnts]
# Sort bounding boxes from "top to bottom"
bounding_boxes = sorted(bounding_boxes, key=lambda b: b[1])
# Iterate bounding boxes
for b in bounding_boxes:
    x, y, w, h = b
    if (h > 10) and (w > 10):
        # Crop a slice with a 10-pixel margin, and invert black and white (tesseract prefers black text).
        row_img = 255 - thresh[max(y-10, 0):min(y+h+10, thresh.shape[0]), max(x-10, 0):min(x+w+10, thresh.shape[1])]
        text = pytesseract.image_to_string(row_img, config="-c tessedit"
                                           "_char_whitelist=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890-:."
                                           " --psm 3")
        print(text)
I know it's not the most general solution, but it manages to solve the sample you posted.
Please treat the answer as a conceptual solution; finding a robust solution might be very challenging.
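The box filtering, sorting and padded cropping in the loop above can be pulled out into small pure-Python helpers, which makes the clamping at the image borders easier to check on its own. This is only a sketch: the function names and the `pad` and `min_size` parameters are hypothetical, with defaults chosen to mirror the constants in the code above.

```python
def padded_box(x, y, w, h, img_h, img_w, pad=10):
    """Grow an (x, y, w, h) box by pad pixels per side, clamped to the image bounds."""
    top = max(y - pad, 0)
    bottom = min(y + h + pad, img_h)
    left = max(x - pad, 0)
    right = min(x + w + pad, img_w)
    return top, bottom, left, right


def rows_top_to_bottom(boxes, min_size=10):
    """Drop boxes that are not larger than min_size in both dimensions, then sort by top edge."""
    kept = [b for b in boxes if b[2] > min_size and b[3] > min_size]
    return sorted(kept, key=lambda b: b[1])
```

Inside the loop, the cropped slice would then be `255 - thresh[top:bottom, left:right]`, using the four values returned by `padded_box(x, y, w, h, *thresh.shape)`.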
Results:
[Image: thresholded image after dilation]
Output text:
7.3-8.2
Primo:50
You can recognize the text easily by down-sampling the image.
If you down-sample by a factor of 0.5, the result will be:
[Image: down-sampled image]
Now, if you run OCR on it, you get:
7.3 - 8.2
Primo: 50
I got this result using pytesseract version 0.3.7 (the current version).
Code:
# Load the libraries
import cv2
import pytesseract
# Load the image
img = cv2.imread("s9edQ.png")
# Convert to the gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Down-sample
gry = cv2.resize(gry, (0, 0), fx=0.5, fy=0.5)
# OCR
txt = pytesseract.image_to_string(gry)
print(txt)
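The factor 0.5 happened to work for this image. For other images, one option is to try a few factors and compare the outputs. The sketch below does that and ranks the results with a crude heuristic (counting tokens that look like decimal numbers). The function names, the list of scales, the heuristic and the choice of cv2.INTER_AREA interpolation are all assumptions for illustration, not part of the code above.

```python
import re


def count_decimals(text):
    """Count tokens that look like decimal numbers, e.g. '7.3'."""
    return len(re.findall(r"\d+\.\d+", text))


def best_scale(texts_by_scale):
    """Pick the scale whose OCR text contains the most decimal-looking tokens."""
    return max(texts_by_scale, key=lambda s: count_decimals(texts_by_scale[s]))


def ocr_at_scales(image_path, scales=(1.0, 0.75, 0.5)):
    """Return {scale: OCR text} for the grayscale image resized by each factor."""
    import cv2  # imported lazily; both packages need to be installed
    import pytesseract

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    results = {}
    for scale in scales:
        # INTER_AREA is a common choice when shrinking (an assumption here).
        resized = cv2.resize(gray, (0, 0), fx=scale, fy=scale,
                             interpolation=cv2.INTER_AREA)
        results[scale] = pytesseract.image_to_string(resized)
    return results
```

For example, `best_scale(ocr_at_scales("s9edQ.png"))` would return the factor whose output contains the most decimal numbers. The heuristic only makes sense when the image is known to contain decimals.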
Explanation:
The input image contains a small artifact; you can see it on the right side of the image. On the other hand, the current image is perfect for OCR. You need to use a preprocessing method when the data in the image is not visible or is corrupted. Please read the following: