I am new to pytesseract and OCR. From what I found online, these are the tools used to extract text from images, but I have no prior experience with them. Right now I am getting this error: tesseract is not installed or it's not in your PATH. See README file for more information.
I don't know how to resolve this. I tried various solutions that I found on the internet, but unfortunately none of them worked.
The error traceback:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
~/.local/lib/python3.9/site-packages/pytesseract/pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
254 try:
--> 255 proc = subprocess.Popen(cmd_args, **subprocess_args())
256 except OSError as e:
/opt/conda/lib/python3.9/subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask)
950
--> 951 self._execute_child(args, executable, preexec_fn, close_fds,
952 pass_fds, cwd, env,
/opt/conda/lib/python3.9/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, gid, gids, uid, umask, start_new_session)
1822 err_msg = os.strerror(errno_num)
-> 1823 raise child_exception_type(errno_num, err_msg, err_filename)
1824 raise child_exception_type(err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'tesseract'
During handling of the above exception, another exception occurred:
TesseractNotFoundError Traceback (most recent call last)
<ipython-input-7-96e86f1cd397> in <module>
1 img = cv2.imread("Zähler NSHV KTL-Durchlaufanlage-1.jpg")
----> 2 data = pytesseract.image_to_string(img)
3 print(data)
4 # plt.imshow(img)
~/.local/lib/python3.9/site-packages/pytesseract/pytesseract.py in image_to_string(image, lang, config, nice, output_type, timeout)
407 args = [image, 'txt', lang, config, nice, timeout]
408
--> 409 return {
410 Output.BYTES: lambda: run_and_get_output(*(args + [True])),
411 Output.DICT: lambda: {'text': run_and_get_output(*args)},
~/.local/lib/python3.9/site-packages/pytesseract/pytesseract.py in <lambda>()
410 Output.BYTES: lambda: run_and_get_output(*(args + [True])),
411 Output.DICT: lambda: {'text': run_and_get_output(*args)},
--> 412 Output.STRING: lambda: run_and_get_output(*args),
413 }[output_type]()
414
~/.local/lib/python3.9/site-packages/pytesseract/pytesseract.py in run_and_get_output(image, extension, lang, config, nice, timeout, return_bytes)
285 }
286
--> 287 run_tesseract(**kwargs)
288 filename = kwargs['output_filename_base'] + extsep + extension
289 with open(filename, 'rb') as output_file:
~/.local/lib/python3.9/site-packages/pytesseract/pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
257 if e.errno != ENOENT:
258 raise e
--> 259 raise TesseractNotFoundError()
260
261 with timeout_manager(proc, timeout) as error_string:
TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
Corresponding code:
!pip install tesseract
import pytesseract
import cv2
from PIL import Image
import matplotlib.pyplot as plt
img = cv2.imread("meter.jpg")
data = pytesseract.image_to_string(img)
print(data)
# plt.imshow(img)
I should mention that I am using JupyterHub, through an account on my university's JupyterHub server. I also read online that the problem can be resolved from 'cmd' (a command prompt). If so, please explain briefly how to do that. Or do I have to contact the university admin to solve this problem? Any help is appreciated!
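A possible cause of this error is that you installed pytesseract with pip without installing the Tesseract binary itself. pytesseract is only a wrapper that launches the tesseract executable, which is why the traceback ends in FileNotFoundError: 'tesseract'. If that is the case, you can install the binary as follows: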
On Linux (Debian/Ubuntu, which use apt):
sudo apt update
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
On Windows: download and run the Tesseract installer, then set the path to the binary in your code:
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
On Mac:
brew install tesseract
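After installing, it can help to confirm that the binary is actually visible to the Python process that runs your notebook. The sketch below uses only the standard library; the helper names find_tesseract and tesseract_version are illustrative and not part of pytesseract.

```python
import shutil
import subprocess


def find_tesseract(name="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


def tesseract_version(path):
    """Return the first line printed by `<path> --version` (empty if nothing is printed)."""
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some older builds write the version banner to stderr instead of stdout.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else ""


path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH; pytesseract will raise TesseractNotFoundError")
else:
    print("found:", path)
    print("version:", tesseract_version(path))
```

If this reports that the binary is missing on a shared server such as JupyterHub, where sudo is usually not available to ordinary users, the binary most likely has to be installed by whoever administers the machine.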
On Windows, if Tesseract was installed for the current user only, the binary will be in the user folder instead, for example: C:\Users\<User.Name>\AppData\Local\Tesseract-OCR\tesseract.exe
Using that path in the code works fine:
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\John.Doe\AppData\Local\Tesseract-OCR\tesseract.exe'
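Rather than hard-coding one location, a script could try PATH first and then fall back to the install locations mentioned in these answers. This is a minimal sketch under that assumption; resolve_tesseract and CANDIDATES are hypothetical names, and the per-user path is built from the current user's home directory.

```python
import os
import shutil
from pathlib import Path

# Candidate locations taken from the answers above; adjust to your own install.
CANDIDATES = [
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    str(Path.home() / "AppData" / "Local" / "Tesseract-OCR" / "tesseract.exe"),
]


def resolve_tesseract(candidates, name="tesseract"):
    """Return the executable found on PATH, else the first existing candidate, else None."""
    on_path = shutil.which(name)
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return str(candidate)
    return None


cmd = resolve_tesseract(CANDIDATES)
if cmd is None:
    print("No Tesseract binary found; install it first")
else:
    try:
        import pytesseract
        # Point pytesseract at the binary that was found.
        pytesseract.pytesseract.tesseract_cmd = cmd
    except ImportError:
        print("pytesseract is not installed in this environment")
```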