
pytesseract.image_to_string() always raises an error

I am working on a project to solve captchas using Python, and I am using the pytesseract module for it. The script works well and also creates a new, modified image file, but it always raises an error on the line text = pytesseract.image_to_string(Image.open(filename)), which extracts the text from the newly created temporary image file. This is the script I'm using; it creates the temporary image that the text is extracted from:

# import the necessary packages
from PIL import Image
import pytesseract
import argparse
import cv2
import os

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
                help="path to input image to be OCR'd")
ap.add_argument("-p", "--preprocess", type=str, default="thresh",
                help="type of preprocessing to be done")
args = vars(ap.parse_args())

# load the example image and convert it to grayscale
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# check to see if we should apply thresholding to preprocess the
# image
if args["preprocess"] == "thresh":
    gray = cv2.threshold(gray, 0, 255,
                         cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

# make a check to see if median blurring should be done to remove
# noise
elif args["preprocess"] == "blur":
    gray = cv2.medianBlur(gray, 3)

# write the grayscale image to disk as a temporary file so we can
# apply OCR to it
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename, gray)

# load the image as a PIL/Pillow image, apply OCR, and then delete
# the temporary file
text = pytesseract.image_to_string(Image.open(filename))
os.remove(filename)
print(text)

# show the output images
cv2.imshow("Image", image)
cv2.imshow("Output", gray)
cv2.waitKey(0)

C:\Users\LENOVO\Desktop\ocr>python test.py -i image.jpg
Traceback (most recent call last):
  File "test.py", line 44, in <module>
    text = pytesseract.image_to_string(Image.open(filename))
  File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 193, in image_to_string
    return run_and_get_output(image, 'txt', lang, config, nice)
  File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 140, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Python27\lib\site-packages\pytesseract\pytesseract.py", line 111, in run_tesseract
    proc = subprocess.Popen(command, stderr=subprocess.PIPE)
  File "C:\Python27\lib\subprocess.py", line 390, in __init__
    errread, errwrite)
  File "C:\Python27\lib\subprocess.py", line 640, in _execute_child
    startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
That's my problem. I have searched Google a lot but can't find a proper solution. Thank you.

Are you sure you installed the Tesseract software? I was getting exactly the same error, but once I installed Google Tesseract OCR from this link, your script worked fine for me unchanged and produced output. I spent a while trying to solve this purely in Python before I realized that this Python library is really just a wrapper: pytesseract runs the tesseract executable in a subprocess (the subprocess.Popen frame in your traceback), and "[Error 2] The system cannot find the file specified" means Windows could not find that executable.
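To see why a missing program produces this particular error, here is a minimal sketch (Python 3, standard library only) that starts a program with subprocess.Popen, roughly the way the traceback shows pytesseract starting tesseract. The helper name can_launch and the program name no-such-ocr-binary are hypothetical, chosen only for illustration, and passing --version assumes the program accepts that flag and exits.

```python
import subprocess
import sys


def can_launch(program):
    """Try to start `program` via subprocess.Popen (hypothetical helper).

    Returns False when the OS cannot start the executable, which is the
    situation behind "[Error 2] The system cannot find the file specified".
    """
    try:
        proc = subprocess.Popen([program, "--version"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
    except OSError:
        # Python 3 raises FileNotFoundError here, a subclass of OSError;
        # Python 2.7 on Windows raised WindowsError with error 2 instead.
        return False
    # The program exists and started; wait for it and discard its output.
    proc.communicate()
    return True


if __name__ == "__main__":
    # The running Python interpreter is guaranteed to exist.
    print(can_launch(sys.executable))
    # Hypothetical name that should not exist on a normal system.
    print(can_launch("no-such-ocr-binary"))
```

The failure happens inside Popen itself, before any image is read, which is why no change to the OCR script can fix it; only installing the program (or pointing pytesseract at it) can.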

You can read the documentation for the Python library, or visit the tesseract GitHub page for more information.

Prerequisites:

  • Python-tesseract requires Python 2.5+ or Python 3.x.
  • You will need the Python Imaging Library (PIL) (or the Pillow fork). Under Debian/Ubuntu, this is the package python-imaging or python3-imaging.
  • Install Google Tesseract OCR: https://github.com/tesseract-ocr/tesseract
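After installing Tesseract, a quick way to confirm that its executable can actually be found is sketched below. It assumes Python 3.3+ (shutil.which does not exist in Python 2.7). The function name find_tesseract and the Windows fallback path are assumptions: that path is a common default install location, not something stated in this thread, so adjust it to wherever Tesseract was installed.

```python
import os
import shutil

# Assumed fallback location (a common Windows default); not from the thread.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(extra_candidates=(DEFAULT_WINDOWS_PATH,)):
    """Return a path to the tesseract executable, or None if none is found.

    find_tesseract is a hypothetical helper for illustration only.
    """
    # By default pytesseract just runs "tesseract", so the OS resolves it
    # through PATH; check there first.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to explicit install locations, if any exist.
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract executable not found; install Tesseract OCR first")
    else:
        print("found tesseract at", path)
```

If the executable exists but is not on PATH, pytesseract can be pointed at it by assigning the returned path to pytesseract.pytesseract.tesseract_cmd before calling image_to_string(); adding the install directory to PATH works as well.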
