Why can't I get a string with PIL and pytesseract?
This is a simple Optical Character Recognition (OCR) program in Python 3 that extracts a string from an image. I have uploaded the target GIF file here; please download it and save it as /tmp/target.gif.
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

print(pytesseract.image_to_string(Image.open('/tmp/target.gif')))
I have pasted all the error output here. Please help me fix it so that I can get the characters from the image.
/usr/lib/python3/dist-packages/PIL/Image.py:925: UserWarning: Couldn't allocate palette entry for transparency
  "for transparency")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pytesseract/pytesseract.py", line 309, in image_to_string
    }[output_type]()
  File "/usr/local/lib/python3.5/dist-packages/pytesseract/pytesseract.py", line 308, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "/usr/local/lib/python3.5/dist-packages/pytesseract/pytesseract.py", line 208, in run_and_get_output
    temp_name, input_filename = save_image(image)
  File "/usr/local/lib/python3.5/dist-packages/pytesseract/pytesseract.py", line 136, in save_image
    image.save(input_file_name, format=img_extension, **image.info)
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 1728, in save
    save_handler(self, fp, filename)
  File "/usr/lib/python3/dist-packages/PIL/GifImagePlugin.py", line 407, in _save
    _get_local_header(fp, im, (0, 0), flags)
  File "/usr/lib/python3/dist-packages/PIL/GifImagePlugin.py", line 441, in _get_local_header
    transparency = int(transparency)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'tuple'
I converted it with the convert command in bash.
convert "/tmp/target.gif" "/tmp/target.jpg"
Here are /tmp/target.gif and /tmp/target.jpg.
Then I executed the above Python code again.
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

print(pytesseract.image_to_string(Image.open('/tmp/target.jpg')))
pytesseract.image_to_string(Image.open('/tmp/target.jpg')) returns nothing; I get a blank string.
For Trenton_M's code:
>>> img1 = remove_noise_and_smooth(r'/tmp/target.jpg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in remove_noise_and_smooth
AttributeError: 'NoneType' object has no attribute 'astype'
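cv2.imread does not raise when a file is missing or cannot be decoded; it returns None, and the next operation on the result then fails with an AttributeError like the one above. A minimal sketch of a guard, assuming that is what happened here (the helper name load_grayscale is hypothetical):

```python
import os


def load_grayscale(path):
    """Read an image as grayscale, failing loudly if it cannot be read."""
    if not os.path.isfile(path):
        raise FileNotFoundError("no such image file: {}".format(path))

    import cv2  # OpenCV, imported lazily so the path check runs first

    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        # The file exists but OpenCV could not decode it.
        raise ValueError("OpenCV could not decode: {}".format(path))
    return img
```

Calling load_grayscale('/tmp/target.jpg') in place of a bare cv2.imread would turn the confusing AttributeError into an error that names the file.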
For Thalish Sajeed's code:
I have omitted the error info caused by print(pytesseract.image_to_string(Image.open(filename))).
Type "help", "copyright", "credits" or "license" for more information.
>>> from PIL import Image
>>> import pytesseract
>>> import matplotlib.pyplot as plt
>>> import cv2
>>> import numpy as np
>>>
>>>
>>> def display_image(filename, length_box=60, width_box=30):
...     if type(filename) == np.ndarray:
...         image = filename
...     else:
...         image = cv2.imread(filename)
...     plt.figure(figsize=(length_box, width_box))
...     plt.imshow(image, cmap="gray")
...
>>>
>>> filename = r"/tmp/target.jpg"
>>> display_image(filename)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in display_image
  File "/usr/local/lib/python3.5/dist-packages/matplotlib/pyplot.py", line 2699, in imshow
    None else {}), **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/matplotlib/__init__.py", line 1810, in inner
    return func(ax, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/matplotlib/axes/_axes.py", line 5494, in imshow
    im.set_data(X)
  File "/usr/local/lib/python3.5/dist-packages/matplotlib/image.py", line 634, in set_data
    raise TypeError("Image data cannot be converted to float")
TypeError: Image data cannot be converted to float
>>>
@Thalish Sajeed, why did I get 9244K instead of 0244k with your code? Here is my tested sample file.
@Trenton_M, I corrected a small typo and an omission in your code, and deleted the plt.show() line as you suggested.
>>> import cv2,pytesseract
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>>
>>>
>>> def image_smoothening(img):
...     ret1, th1 = cv2.threshold(img, 88, 255, cv2.THRESH_BINARY)
...     ret2, th2 = cv2.threshold(th1, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
...     blur = cv2.GaussianBlur(th2, (5, 5), 0)
...     ret3, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
...     return th3
...
>>>
>>> def remove_noise_and_smooth(file_name):
...     img = cv2.imread(file_name, 0)
...     filtered = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 9, 41)
...     kernel = np.ones((1, 1), np.uint8)
...     opening = cv2.morphologyEx(filtered, cv2.MORPH_OPEN, kernel)
...     closing = cv2.morphologyEx(opening, cv2.MORPH_CLOSE, kernel)
...     img = image_smoothening(img)
...     or_image = cv2.bitwise_or(img, closing)
...     return or_image
...
>>>
>>> cv2_thresh_list = [cv2.THRESH_BINARY, cv2.THRESH_TRUNC, cv2.THRESH_TOZERO]
>>> fn = r'/tmp/target.jpg'
>>> img1 = remove_noise_and_smooth(fn)
>>> img2 = cv2.imread(fn, 0)
>>> for i, img in enumerate([img1, img2]):
...     img_type = {0: 'Preprocessed Images\n',
...                 1: '\nUnprocessed Images\n'}
...     print(img_type[i])
...     for item in cv2_thresh_list:
...         print('Thresh: {}'.format(str(item)))
...         _, thresh = cv2.threshold(img, 127, 255, item)
...         plt.imshow(thresh, 'gray')
...         f_name = '{0}.jpg'.format(str(item))
...         plt.savefig(f_name)
...         print('OCR Result: {}\n'.format(pytesseract.image_to_string(f_name)))
...
In my console, all the output is as follows:
Preprocessed Images
Thresh: 0
<matplotlib.image.AxesImage object at 0x7fbc2519a6d8>
OCR Result: 10
15
20
Edfifi
10
2 o 30 40 so
so
Thresh: 2
<matplotlib.image.AxesImage object at 0x7fbc255e7eb8>
OCR Result: 10
15
20
Edfifi
10
2 o 30 40 so
so
Thresh: 3
<matplotlib.image.AxesImage object at 0x7fbc25452fd0>
OCR Result: 10
15
20
Edfifi
10
2 o 30 40 so
so
Unprocessed Images
Thresh: 0
<matplotlib.image.AxesImage object at 0x7fbc25464c88>
OCR Result: 10
15
20
Thresh: 2
<matplotlib.image.AxesImage object at 0x7fbc254520f0>
OCR Result: 10
15
2o
2o
30 40 50
Thresh: 3
<matplotlib.image.AxesImage object at 0x7fbc1e1968d0>
OCR Result: 10
15
20
Where is the string 0244R?
Let's start with the JPG image, because pytesseract has issues operating on GIF image formats (reference).
import cv2
import pytesseract

filename = "/tmp/target.jpg"
image = cv2.imread(filename)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Pixels brighter than 55 become white (255); all others become black (0)
ret, threshold = cv2.threshold(gray, 55, 255, cv2.THRESH_BINARY)
print(pytesseract.image_to_string(threshold))
Let's try to break down the issues here.
Your image is too noisy for the Tesseract engine to identify the letters. We use some simple image processing techniques, such as grayscaling and thresholding, to remove some of the noise from the image.
Then, when we send it to the OCR engine, we see that the letters are captured more accurately.
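To make the thresholding step concrete, here is a minimal pure-Python sketch of what binary thresholding does to a grayscale image: every pixel strictly brighter than the cutoff becomes white and everything else becomes black. The helper name binarize and the tiny sample grid are made up for illustration; in practice cv2.threshold with cv2.THRESH_BINARY does this for a whole array at once.

```python
def binarize(gray, thresh=55, maxval=255):
    """Return a copy of `gray` (rows of 0-255 ints) with binary thresholding applied."""
    return [[maxval if px > thresh else 0 for px in row] for row in gray]


# A made-up 2x4 patch of grayscale values.
sample = [
    [12, 200, 40, 180],
    [55, 56, 30, 255],
]
print(binarize(sample))  # [[0, 255, 0, 255], [0, 255, 0, 255]]
```

Note that a pixel exactly equal to the cutoff (55 here) is mapped to black, which is why small changes to the cutoff can add or remove speckles near the text.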
You can find the notebook where I tested this by following this GitHub link.
Edit: I have updated the notebook with some additional image cleaning techniques. The source image is too noisy for Tesseract to work on directly, out of the box. You need to use image cleaning techniques.
You can vary the thresholding parameters or swap out the Gaussian blur for some other technique until you get the desired results.
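One way to vary the threshold systematically is to run OCR once per cutoff and compare the results. The sketch below is only an illustration: the names sweep_thresholds and run_on_file and the default range of cutoffs are assumptions, not part of the notebook. The binarizing and OCR steps are passed in as functions so the loop itself can be tried without OpenCV or Tesseract installed.

```python
def sweep_thresholds(gray, thresholds, binarize, ocr):
    """Map each threshold to the stripped OCR text obtained after binarizing with it."""
    results = {}
    for t in thresholds:
        results[t] = ocr(binarize(gray, t)).strip()
    return results


def run_on_file(path, thresholds=range(35, 100, 10)):
    """Wire the sweep to OpenCV and pytesseract (both must be installed)."""
    import cv2
    import pytesseract

    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise ValueError("could not read image: {}".format(path))
    return sweep_thresholds(
        gray,
        thresholds,
        # cv2.threshold returns (retval, image); keep only the image
        binarize=lambda img, t: cv2.threshold(img, t, 255, cv2.THRESH_BINARY)[1],
        ocr=pytesseract.image_to_string,
    )
```

Calling run_on_file("/tmp/target.jpg") would return a dictionary from cutoff to recognized text, which makes it easy to see which cutoffs, if any, produce the expected string.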
If you are looking to run OCR on noisy images, please check out commercial OCR providers such as google-cloud-vision. They provide 1000 free OCR calls per month.
First: make certain you've installed the Tesseract program (not just the Python package).
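A quick way to check for the executable from Python is to look it up on PATH with the standard library. This is a minimal sketch; the helper name find_tesseract is hypothetical, and it assumes the binary is called tesseract (on Windows, shutil.which also tries the usual executable extensions).

```python
import shutil


def find_tesseract(command="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(command)


path = find_tesseract()
if path is None:
    print("Tesseract executable not found on PATH; install it or set tesseract_cmd")
else:
    print("Tesseract found at", path)
```

If the lookup returns None but Tesseract is installed elsewhere, pytesseract.pytesseract.tesseract_cmd can be set to the full path of the executable.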
Jupyter Notebook of Solution: only the image passed through remove_noise_and_smooth is successfully translated with OCR.
When attempting to convert image.gif, TypeError: int() argument must be a string, a bytes-like object or a number, not 'tuple' is generated.
If image.gif is renamed to image.jpg, the TypeError is generated.
If image.gif is opened and saved as image.jpg, the output is blank, which means the text wasn't recognized.
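The traceback in the question shows pytesseract calling image.save(..., **image.info) and the GIF writer failing on a tuple-valued transparency entry. One possible workaround, sketched under that assumption, is to flatten the GIF to RGB, discard its metadata, and write a PNG before running OCR. The helper name gif_to_png and the output path are hypothetical, and only the first frame of the GIF is used. This only avoids the TypeError; it does not by itself make noisy text recognizable.

```python
def gif_to_png(src_path, dst_path):
    """Write the first frame of a GIF as an RGB PNG with no leftover GIF metadata."""
    from PIL import Image  # Pillow, imported lazily

    with Image.open(src_path) as im:
        rgb = im.convert("RGB")  # flatten the palette to plain RGB pixels
    rgb.info.clear()             # drop 'transparency' and other GIF-specific entries
    rgb.save(dst_path, format="PNG")
    return dst_path


# Usage, assuming Pillow and pytesseract are installed:
# from PIL import Image
# import pytesseract
# png = gif_to_png("/tmp/target.gif", "/tmp/target.png")
# print(pytesseract.image_to_string(Image.open(png)))
```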
from PIL import Image
import pytesseract
# If you don't have tesseract executable in your PATH, include the following:
# your path may be different than mine
pytesseract.pytesseract.tesseract_cmd = "C:/Program Files (x86)/Tesseract-OCR/tesseract.exe"
imgo = Image.open('0244R_clean.jpg')
print(pytesseract.image_to_string(imgo))
Improve Accuracy of OCR using Image Preprocessing
import cv2
import numpy as np
import matplotlib.pyplot as plt
import pytesseract


def image_smoothening(img):
    ret1, th1 = cv2.threshold(img, 88, 255, cv2.THRESH_BINARY)
    ret2, th2 = cv2.threshold(th1, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    blur = cv2.GaussianBlur(th2, (5, 5), 0)
    ret3, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return th3


def remove_noise_and_smooth(file_name):
    img = cv2.imread(file_name, 0)  # 0 = read as grayscale
    filtered = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 9, 41)
    kernel = np.ones((1, 1), np.uint8)
    opening = cv2.morphologyEx(filtered, cv2.MORPH_OPEN, kernel)
    closing = cv2.morphologyEx(opening, cv2.MORPH_CLOSE, kernel)
    img = image_smoothening(img)
    or_image = cv2.bitwise_or(img, closing)
    return or_image


cv2_thresh_list = [cv2.THRESH_BINARY, cv2.THRESH_TRUNC, cv2.THRESH_TOZERO]
fn = r'/tmp/target.jpg'
img1 = remove_noise_and_smooth(fn)
img2 = cv2.imread(fn, 0)
for i, img in enumerate([img1, img2]):
    img_type = {0: 'Preprocessed Images\n',
                1: '\nUnprocessed Images\n'}
    print(img_type[i])
    for item in cv2_thresh_list:
        print('Thresh: {}'.format(str(item)))
        _, thresh = cv2.threshold(img, 127, 255, item)
        plt.imshow(thresh, 'gray')
        f_name = '{}_{}.jpg'.format(i, str(item))
        plt.savefig(f_name)
        print('OCR Result: {}\n'.format(pytesseract.image_to_string(f_name)))
img1 will generate the following new images:
img2 will generate these new images:
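plt.savefig writes the whole Matplotlib figure, including the axes and their tick labels, so OCR on the saved file can pick up the tick numbers; the "10 15 20" lines in the output reported in the question may well come from there. A minimal sketch of an alternative, assuming the goal is to OCR only the thresholded pixels, writes the array itself with cv2.imwrite. The helper name save_binary and the output path are hypothetical.

```python
def save_binary(img, thresh_type, out_path, thresh=127):
    """Threshold `img` and write only its pixels (no figure, axes, or ticks) to `out_path`."""
    import cv2  # OpenCV, imported lazily

    _, binary = cv2.threshold(img, thresh, 255, thresh_type)
    if not cv2.imwrite(out_path, binary):
        raise IOError("could not write {}".format(out_path))
    return binary


# Usage, assuming OpenCV and pytesseract are installed:
# import cv2, pytesseract
# img = cv2.imread('/tmp/target.jpg', 0)
# binary = save_binary(img, cv2.THRESH_BINARY, '/tmp/0_binary.png')
# print(pytesseract.image_to_string(binary))
```

The saved file then has the same dimensions as the input image, and the returned array can be passed to pytesseract.image_to_string directly, as the thresholding answer above does.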