
pytesseract.image_to_string doesn't seem to be able to extract text from the image

I am trying to extract text from an image. The following code works on other images I have tried, but not on this one. Is there an issue with the code?

The image I am trying to extract text from: [original image]

Here is the code:

import cv2
import pytesseract
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

try:
    from PIL import Image
except ImportError:
    import Image

# Load the image as grayscale, upscale it 2x, and blur it to reduce noise
img = cv2.imread("sample01.png", cv2.IMREAD_GRAYSCALE)
print('Dimension of image: {}'.format(img.ndim))
img = cv2.resize(img, None, fx=2, fy=2)
blur = cv2.GaussianBlur(img, (5, 5), 0)

# Apply adaptiveThreshold (Mean)
th2 = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY, 11, 2)
cv2.imwrite('resize_adaptive_threshmean.png', th2)

# Apply Tesseract to detect words
print(pytesseract.image_to_string(Image.open('resize_adaptive_threshmean.png')))   
print("=========================================================")

Is there anything wrong with the code?

Well, you can use adaptive thresholding:

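import cv2
import pytesseract

# Load the image and convert it to grayscale
img = cv2.imread("ACtBA.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Mean adaptive threshold with maxValue=100, blockSize=15, C=16
flt = cv2.adaptiveThreshold(gry,
                            100, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY, 15, 16)

# Run Tesseract directly on the thresholded array
txt = pytesseract.image_to_string(flt)
print(txt)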

The image will be:

[image: thresholded result]

Result:

Parking: You may park anywhere on the campus where there are no signs prohibiting par-
king. Keep in mind the carpool hours and park accordingly so you do not get blocked in the
afternoon

Under Schoo! Age Children.While we love the younger children, it can be disruptive and
inappropriate to have them on campus during school hours. There may be special times
that they may be invited or can accompany a parent volunteer, but otherwise we ask that
you adhere to our —_ policy for the benefit of the students and staff.

I tested different parameters, and I think the most suitable ones are:

maxValue = 100  # value assigned to pixels that pass the threshold (not a cutoff)

blockSize = 15  # size of the neighbourhood area; must be an odd integer

C = 16  # constant subtracted from the mean or weighted mean of the neighbourhood
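To make the roles of these three parameters concrete, here is a minimal pure-Python sketch of mean adaptive thresholding in binary mode. It is an illustration only, not OpenCV's implementation: the function name `adaptive_mean_threshold` is hypothetical, the border handling (clamping coordinates to the edge) is an assumption, and the results are not guaranteed to match `cv2.adaptiveThreshold` pixel for pixel because of rounding.

```python
def adaptive_mean_threshold(gray, max_value, block_size, c):
    """Illustrative mean adaptive threshold (binary mode).

    gray: list of rows, each a list of integer intensities in 0..255.
    A pixel becomes max_value if it is strictly brighter than
    (mean of its block_size x block_size neighbourhood) - c, else 0.
    """
    if block_size < 3 or block_size % 2 != 1:
        raise ValueError("block_size must be an odd integer >= 3")
    height, width = len(gray), len(gray[0])
    half = block_size // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    # Assumption: clamp coordinates at the border (edge replication).
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += gray[yy][xx]
            local_threshold = total / (block_size * block_size) - c
            row.append(max_value if gray[y][x] > local_threshold else 0)
        out.append(row)
    return out


# Toy page: bright background (200) with a single dark "ink" pixel (50).
page = [[200] * 5 for _ in range(5)]
page[2][2] = 50
result = adaptive_mean_threshold(page, max_value=100, block_size=3, c=16)
```

In this toy example every entry of `result` is 100 except the centre pixel, which is 0. So maxValue only sets the output intensity of the pixels that pass (100 gives grey rather than white), while blockSize and C decide which pixels pass. With `c=0` a perfectly uniform page would come out all 0, because no pixel is strictly brighter than its own neighbourhood mean; a positive C is what keeps flat background areas bright.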

import cv2
import pytesseract
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

try:
    from PIL import Image
except ImportError:
    import Image

data = pytesseract.image_to_string(Image.open("sample01.png"))
print(data)

