
Trying to read text from an image using pytesseract but getting blank output

I've taken a few pictures and am using OpenCV to crop them so that only the relevant text remains. This is the cropped photo: [image: cropped photo]

I pass this image to pytesseract's image_to_string function, but when I print the output, this is what I get:

text from cropped image from code is '
♀ '

How can I get the exact reading? I tried using

text2 = pytesseract.image_to_string(cropped_image) ,config='--psm 6') 

but this gives garbage output.

Could you try a different psm config? Also note that you should not close the parenthesis right after cropped_image as you did; the stray closing parenthesis makes that line invalid Python.

text2 = pytesseract.image_to_string(cropped_image, config='--psm 3')

You could also try setting the language explicitly with lang='eng', just for extra testing:

text2 = pytesseract.image_to_string(cropped_image, lang='eng', config='--psm 3')
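Rather than changing the --psm value by hand, a small loop can run image_to_string once per page segmentation mode and collect the results side by side. This is a sketch: the helper name try_psm_modes and the default list of modes are my own choices, and the optional ocr argument exists only so the loop can be exercised without a Tesseract install.

```python
from typing import Callable, Dict, Iterable, Optional


def try_psm_modes(
    image,
    modes: Iterable[int] = (3, 6, 7, 8, 11),
    lang: str = "eng",
    ocr: Optional[Callable[..., str]] = None,
) -> Dict[int, str]:
    """Run OCR once per page segmentation mode and return {psm: text}.

    try_psm_modes is a hypothetical helper, not part of pytesseract.
    """
    if ocr is None:
        # Imported lazily so the helper can be tested with a stand-in OCR function
        import pytesseract
        ocr = pytesseract.image_to_string
    results: Dict[int, str] = {}
    for psm in modes:
        text = ocr(image, lang=lang, config=f"--psm {psm}")
        # Tesseract output typically ends with a form feed; strip() removes it
        # along with other surrounding whitespace, so an empty read becomes ""
        results[psm] = text.strip()
    return results
```

With the question's variable, something like `for psm, text in try_psm_modes(cropped_image).items(): print(psm, repr(text))` prints one line per mode; 6 assumes a single block of text, 7 a single line, 8 a single word, and 11 sparse text.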

I was able to get a better result with a little preprocessing.

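import cv2
import numpy as np
import pytesseract
from google.colab.patches import cv2_imshow  # Colab display helper; elsewhere use cv2.imshow + cv2.waitKey

image = cv2.imread("cropped.jpg")  # placeholder path for the cropped photo
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2_imshow(gray)

# Adaptive mean threshold: 17x17 neighbourhood, constant C = 6
th2 = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY, 17, 6)
cv2_imshow(th2)

# Closing (dilation, then erosion) removes dark specks smaller than the kernel
kernel = np.ones((5, 5), np.uint8)
closing = cv2.morphologyEx(th2, cv2.MORPH_CLOSE, kernel, iterations=1)
cv2_imshow(closing)

# Erosion shrinks the white areas, thickening the dark strokes so separate dots join up
erosion = cv2.erode(closing, kernel, iterations=1)
cv2_imshow(erosion)

# --psm 6: assume a single uniform block of text
custom = '--psm 6'
txt = pytesseract.image_to_string(erosion, config=custom, lang='eng')
print(txt)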

I cropped your image to remove the unnecessary black borders, then applied adaptive thresholding followed by some morphological operations. Here is the result: [image: OCR result]
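For readers unfamiliar with cv2.ADAPTIVE_THRESH_MEAN_C: each pixel is compared against the mean of its blockSize × blockSize neighbourhood minus the constant C, so uneven lighting across the photo matters less than with one global threshold. The NumPy sketch below re-implements that rule for illustration only; it uses an unrounded mean and edge-replicated borders, so its output may differ from OpenCV's by a few pixels. The function name adaptive_mean_threshold is hypothetical.

```python
import numpy as np


def adaptive_mean_threshold(gray: np.ndarray, block_size: int, c: float,
                            max_value: int = 255) -> np.ndarray:
    """Binary adaptive mean threshold (illustrative re-implementation).

    A pixel becomes max_value when it is brighter than
    (mean of its block_size x block_size neighbourhood) - c, else 0.
    """
    if block_size % 2 == 0 or block_size < 3:
        raise ValueError("block_size must be an odd number >= 3")
    r = block_size // 2
    k = block_size
    h, w = gray.shape
    # Replicate edge pixels so border pixels also get a full neighbourhood
    padded = np.pad(gray.astype(np.float64), r, mode="edge")
    # Summed-area table with a leading row and column of zeros
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    # Window sum for every output pixel via four table lookups
    window_sum = (sat[k:k + h, k:k + w] - sat[0:h, k:k + w]
                  - sat[k:k + h, 0:w] + sat[0:h, 0:w])
    local_mean = window_sum / (k * k)
    return np.where(gray > local_mean - c, max_value, 0).astype(np.uint8)
```

Raising c lowers the local threshold, so more pixels turn white; a larger block_size averages over a wider area, which helps with gradual lighting changes. These correspond to the last two numeric arguments passed to cv2.adaptiveThreshold (17 and 6 in the code above).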

You can tune the adaptive thresholding and morphological parameters to get accurate results. Accuracy should improve further if you can remove the green color noise from the image (i.e. subtract the background) or apply gamma correction so that only the text stays visible. Preprocessing is the key to getting accurate results.
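Gamma correction remaps every intensity through out = 255 · (in / 255)^(1/γ), and with a 256-entry lookup table it costs one array index per pixel. The sketch below assumes an 8-bit grayscale NumPy array; apply_gamma is a hypothetical name, and under this convention γ < 1 darkens mid-tones while γ > 1 brightens them. Whether it helps on this particular photo is something to try, not a given.

```python
import numpy as np


def apply_gamma(gray: np.ndarray, gamma: float) -> np.ndarray:
    """Gamma-correct an 8-bit image with a lookup table (hypothetical helper).

    out = 255 * (in / 255) ** (1 / gamma); gamma < 1 darkens mid-tones,
    gamma > 1 brightens them.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    levels = np.arange(256, dtype=np.float64) / 255.0
    table = np.clip(np.round(255.0 * levels ** (1.0 / gamma)), 0, 255).astype(np.uint8)
    # Integer indexing applies the table to every pixel (same idea as cv2.LUT)
    return table[gray]
```

One place to try it would be on the grayscale image before thresholding, for example gray = apply_gamma(gray, 0.5), and then compare the OCR output with and without it.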

Tarun Chakitha is right: you need some preprocessing, thresholding, and morphological transformations to get reliable results. The following code produces Pac=2666. 1W

import cv2
import matplotlib.pyplot as plt
import numpy as np
import pytesseract

# Obtain binary image
img_bgr = cv2.imread("3CxLj.jpg")
# Crop rows 90-199 and columns 0-494 (offsets specific to this photo)
img_gray = cv2.cvtColor(img_bgr[90:200, 0:495], cv2.COLOR_BGR2GRAY)
# Adaptive mean threshold: 21x21 neighbourhood, constant C = 15
img_bin = cv2.adaptiveThreshold(
    img_gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 15
)
fig, axs = plt.subplots(3)
axs[0].imshow(img_gray, cmap="gray")
axs[1].imshow(img_bin, cmap="gray")

# Merge dots into characters using erosion
kernel = np.ones((5, 5), np.uint8)
img_eroded = cv2.erode(img_bin, kernel, iterations=1)
axs[2].imshow(img_eroded, cmap="gray")
plt.show()

# Obtain string using psm 8 (treat the image as a single word)
ocr_string = pytesseract.image_to_string(img_eroded, config="--psm 8")
print(ocr_string)

[Figure: grayscale, binary, and eroded images]
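The "merge dots into characters" step works because, with dark text on a white background, erosion replaces each pixel by the minimum over the kernel window: white gaps narrower than the kernel close up and neighbouring dark dots join into one stroke. The pure-NumPy sketch below (a hypothetical erode_min helper with an odd square kernel and edge-replicated borders) illustrates this on a tiny synthetic dot-matrix row; it is not OpenCV's implementation.

```python
import numpy as np


def erode_min(binary: np.ndarray, ksize: int = 5, iterations: int = 1) -> np.ndarray:
    """Erosion with a ksize x ksize square kernel, i.e. a min filter.

    Illustrative re-implementation (hypothetical helper, not OpenCV's code).
    """
    if ksize % 2 == 0 or ksize < 1:
        raise ValueError("ksize must be a positive odd number")
    r = ksize // 2
    h, w = binary.shape
    out = binary.copy()
    for _ in range(iterations):
        # Edge replication means out-of-image pixels never lower the minimum
        padded = np.pad(out, r, mode="edge")
        result = out.copy()
        # Take the minimum over every offset inside the square window
        for dy in range(ksize):
            for dx in range(ksize):
                result = np.minimum(result, padded[dy:dy + h, dx:dx + w])
        out = result
    return out


if __name__ == "__main__":
    # Three isolated dark "dots" on one row of a white image
    dots = np.full((5, 11), 255, dtype=np.uint8)
    dots[2, [2, 5, 8]] = 0
    print(erode_min(dots, ksize=3)[2])  # the dots are now one connected dark run
```

On a binary image, cv2.erode(img_bin, np.ones((5, 5), np.uint8)) should give the same pixels as erode_min(img_bin, 5), since OpenCV's default border handling also keeps out-of-image pixels out of the minimum. A larger kernel or more iterations bridges wider gaps, at the cost of merging neighbouring characters.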
