Trying to read text from image using pytesseract but getting blank

Question

I've taken a few pictures and am using OpenCV to crop them so that only the relevant text remains. This is the cropped photo: (image: cropped image)

I feed this image to pytesseract's image_to_string function, but when I print the output, this is what I get:

text from cropped image from code is '
♀ '

How can I get the exact reading? I also tried using

text2 = pytesseract.image_to_string(cropped_image) ,config='--psm 6')

but this gives a garbage value.

Answer 1

Could you please try a different psm config? Also note that you shouldn't close the parenthesis right after cropped_image as you did; the config argument has to be passed inside the call.

text2 = pytesseract.image_to_string(cropped_image, config='--psm 3')

You could also try adding lang='eng', just as an extra test, like below:

text2 = pytesseract.image_to_string(cropped_image, lang='eng', config='--psm 3')
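Since the best psm value depends on the layout of the cropped image, it may be quicker to compare several modes in one pass. Below is a minimal sketch of that idea. The helper names try_psm_modes and non_empty are hypothetical, and the OCR function is passed in as a parameter so the loop itself does not depend on pytesseract. The strip() call is there because the '♀' in the question's output is most likely a form feed character ("\x0c") that Tesseract appends as a page separator, so an apparently non-empty result can still be blank.

```python
def try_psm_modes(image, ocr, modes=(3, 6, 7, 8, 11, 13)):
    """Run `ocr` once per page segmentation mode and collect the stripped text.

    `ocr` is any callable that accepts (image, config=...) and returns a
    string, for example pytesseract.image_to_string.
    """
    results = {}
    for psm in modes:
        raw = ocr(image, config=f"--psm {psm}")
        # strip() removes whitespace, including a trailing form feed ("\x0c")
        results[psm] = raw.strip()
    return results


def non_empty(results):
    """Keep only the modes that produced some text."""
    return {psm: text for psm, text in results.items() if text}


# Possible usage (assumes pytesseract is installed and cropped_image exists):
# import pytesseract
# results = try_psm_modes(cropped_image, pytesseract.image_to_string)
# for psm, text in non_empty(results).items():
#     print(psm, repr(text))
```

Printing with repr() makes invisible characters such as "\n" or "\x0c" easy to spot when a mode returns nothing useful.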

Answer 2

I was able to get a better result with a little preprocessing.

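import cv2
import numpy as np
import pytesseract
from google.colab.patches import cv2_imshow  # assumes Google Colab for display

# `image` is the cropped BGR image (e.g. loaded with cv2.imread)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2_imshow(gray)

# Binarize against the local mean (block size 17, constant 6)
th2 = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY, 17, 6)
cv2_imshow(th2)

# Closing removes small dark specks; erosion then thickens the dark strokes
kernel = np.ones((5, 5), np.uint8)
closing = cv2.morphologyEx(th2, cv2.MORPH_CLOSE, kernel, iterations=1)
cv2_imshow(closing)

erosion = cv2.erode(closing, kernel, iterations=1)
cv2_imshow(erosion)

custom = '--psm 6'
txt = pytesseract.image_to_string(erosion, config=custom, lang='eng')
print(txt)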

I cropped your image to remove the unnecessary black borders and tried adaptive thresholding followed by some morphological operations. Here is the result

You can tune the adaptive thresholding parameters and the morphological transformations to improve accuracy. Results should improve further if you can remove the green color noise from the image (for example, by subtracting the background) or apply gamma correction so that only the text remains visible. Pre-processing is the key to accurate results.
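To illustrate the gamma correction idea mentioned above, here is a minimal sketch that builds a 256-entry lookup table mapping each 8-bit intensity v to 255 * (v / 255) ** gamma. The function name gamma_lut and the example gamma value are assumptions for illustration, not something taken from the answer. With this convention, gamma > 1 darkens mid-tones and gamma < 1 brightens them.

```python
def gamma_lut(gamma):
    """Return a 256-entry lookup table for gamma correction of 8-bit pixels."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Normalize to [0, 1], apply the power curve, and scale back to [0, 255]
    return [min(255, round(255 * (v / 255) ** gamma)) for v in range(256)]


# Example: gamma = 2.0 (an arbitrary choice) pushes mid-gray values down
table = gamma_lut(2.0)
print(table[0], table[128], table[255])
```

With OpenCV, such a table could be applied to a grayscale image as cv2.LUT(gray, np.array(gamma_lut(2.0), dtype=np.uint8)). The right gamma value depends on the photo and has to be found by experiment.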

Answer 3

Tarun Chakitha is right: you'll need some pre-processing, thresholding, and morphological transformations to get reliable results. The following code produces Pac=2666. 1W

import cv2
import matplotlib.pyplot as plt
import numpy as np
import pytesseract

# Obtain binary image (crop to the text region before thresholding)
img_bgr = cv2.imread("3CxLj.jpg")
img_gray = cv2.cvtColor(img_bgr[90:200, 0:495], cv2.COLOR_BGR2GRAY)
img_bin = cv2.adaptiveThreshold(
    img_gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 15
)
fig, axs = plt.subplots(3)
axs[0].imshow(img_gray, cmap="gray")
axs[1].imshow(img_bin, cmap="gray")

# Merge dots into characters using erosion
kernel = np.ones((5, 5), np.uint8)
img_eroded = cv2.erode(img_bin, kernel, iterations=1)
axs[2].imshow(img_eroded, cmap="gray")
fig.show()

# Obtain string using psm 8 (treat the image as a single word)
ocr_string = pytesseract.image_to_string(img_eroded, config="--psm 8")
print(ocr_string)
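For readers tuning the last two arguments of cv2.adaptiveThreshold (17, 6 in Answer 2 and 21, 15 here), the following pure-Python sketch shows roughly what mean adaptive thresholding with THRESH_BINARY computes: each pixel is compared with the mean of its block_size x block_size neighborhood minus a constant c. It is an illustration only, not OpenCV's implementation; the function name is hypothetical, and the border handling (replicating edge pixels) is stated here as an assumption.

```python
def adaptive_mean_threshold(gray, block_size, c, max_value=255):
    """Binarize a 2D list of ints: pixel > (local mean - c) gives max_value, else 0."""
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be odd and >= 3")
    h, w = len(gray), len(gray[0])
    r = block_size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # Clamp coordinates so that edge pixels are replicated
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += gray[yy][xx]
            local_mean = total / (block_size * block_size)
            out[y][x] = max_value if gray[y][x] > local_mean - c else 0
    return out


# A bright 5x5 patch with one dark pixel in the middle
patch = [[200] * 5 for _ in range(5)]
patch[2][2] = 50
for row in adaptive_mean_threshold(patch, block_size=3, c=6):
    print(row)
```

A larger block size averages over a wider area, which helps with uneven lighting, while a larger c makes the threshold stricter so that only pixels clearly darker than their surroundings turn black.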

3 answers

solution1
1 2021-04-25 18:09:50

solution2
0 2021-04-25 21:36:18

solution3
0 ACCEPTED 2021-04-25 22:49:00
