
Pytesseract image_to_string no longer working?

So far the program reads in images using OpenCV and modifies them so the text can be read more accurately. I'm now at the part where I want to add each image's text (a string) to a list, and this is where something goes wrong.

I used image_to_string successfully in a previous script, but that script no longer works either. I'm on my work laptop, using PyCharm with a conda environment. Here's my code; the problem occurs in the very last for loop:

EDIT: I've tried running another script that uses image_to_string and previously worked, and it now fails too. This suggests the code isn't at fault and the problem lies in how PyCharm and Anaconda are set up. The conda environment is active and I believe I have linked tesseract.exe correctly, so I'm not sure what else to try.

The error:

C:\ProgramData\Anaconda3\envs\Screenshot_Reader\python.exe "C:/Users/****/PycharmProjects/Screenshot/venv/Scripts/Import 2.py"
You chose: C:/Python/Testing/screenshots

file_path_variable = C:/Python/Testing/screenshots
Traceback (most recent call last):
  File "C:/Users/****/PycharmProjects/Screenshot/venv/Scripts/Import 2.py", line 68, in <module>
    text = pytesseract.image_to_string(img)
  File "C:\ProgramData\Anaconda3\envs\Screenshot_Reader\lib\site-packages\pytesseract\pytesseract.py", line 423, in image_to_string
    return {
  File "C:\ProgramData\Anaconda3\envs\Screenshot_Reader\lib\site-packages\pytesseract\pytesseract.py", line 426, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\ProgramData\Anaconda3\envs\Screenshot_Reader\lib\site-packages\pytesseract\pytesseract.py", line 288, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\ProgramData\Anaconda3\envs\Screenshot_Reader\lib\site-packages\pytesseract\pytesseract.py", line 264, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file ./eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'eng' Tesseract couldn't load any languages! Could not initialize tesseract.')

Process finished with exit code 1

import tkinter
from tkinter import filedialog
import os
import pytesseract
import cv2

pytesseract.pytesseract.tesseract_cmd = r'C:\ProgramData\Anaconda3\envs\Screenshot_Reader\Library\bin\tesseract.exe'
root = tkinter.Tk()
root.withdraw()  # hide the tkinter root window

def search_for_file_path():
    currdir = os.getcwd()
    tempdir = filedialog.askdirectory(parent=root, initialdir=currdir, title='Please select a directory')
    if len(tempdir) > 0:
        print("You chose: %s" % tempdir)
    return tempdir


file_path_variable = search_for_file_path()
print("\nfile_path_variable = ", file_path_variable)


def load_images_from_folder(folder):
    images = []
    for filename in os.listdir(folder):
        img = cv2.imread(os.path.join(folder,filename))
        if img is not None:
            images.append(img)
    return images


list_of_images = load_images_from_folder(file_path_variable)

# test to see if the images are in the list - it works!
# for pic in list_of_images:
#     cv2.imshow('imshow', pic)
#     cv2.waitKey(0)

def image_processing(pics_unprocessed):
    processed_images = []

    for img in pics_unprocessed:  # processing images so text can be more accurately read
        # upscaling
        img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
        # blurring
        img = cv2.medianBlur(img, 3)
        processed_images.append(img)

    return processed_images


after_processing = image_processing(list_of_images)

# test to see the processed images in the list - it works!
# for img in after_processing:
#     cv2.imshow('imshow', img)
#     cv2.waitKey(0)


list_of_text = []
for img in after_processing:  # converts to text and adds each string to list
    text = pytesseract.image_to_string(img) # line 68, where error occurs
    print(text)
    list_of_text.append(text)

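Found the solution... I was being silly: all I needed to do was launch PyCharm through Anaconda instead of running it on its own. Presumably this is what activates the conda environment properly, so that Tesseract can find its tessdata directory.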
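
For anyone who can't change how PyCharm is launched, a possible workaround is to set TESSDATA_PREFIX from inside the script before the first call to image_to_string, since the error message asks for that variable to point at the "tessdata" directory. The following is only a rough sketch: the helper names find_tessdata_dir and ensure_tessdata_prefix are made up for illustration, and the candidate folder locations are guesses about where an installation might keep eng.traineddata, so check them against your own environment.

```python
import os
from pathlib import Path


def find_tessdata_dir(tesseract_cmd, lang="eng"):
    """Return the first candidate directory holding <lang>.traineddata, or None.

    The candidate locations below are assumptions, not a documented layout.
    """
    exe_dir = Path(tesseract_cmd).parent
    candidates = [
        exe_dir / "tessdata",                   # next to the executable
        exe_dir.parent / "share" / "tessdata",  # sibling "share" folder
        exe_dir.parent / "tessdata",            # one level above the executable
    ]
    for candidate in candidates:
        if (candidate / f"{lang}.traineddata").is_file():
            return candidate
    return None


def ensure_tessdata_prefix(tesseract_cmd, lang="eng"):
    """Set TESSDATA_PREFIX if it is unset and a tessdata directory is found.

    Returns the value in effect afterwards, or None if nothing was found.
    """
    existing = os.environ.get("TESSDATA_PREFIX")
    if existing:
        return existing  # respect a value that is already configured
    tessdata = find_tessdata_dir(tesseract_cmd, lang)
    if tessdata is None:
        return None
    os.environ["TESSDATA_PREFIX"] = str(tessdata)
    return os.environ["TESSDATA_PREFIX"]


# Usage in the script above, before the final loop (not run here):
#   cmd = pytesseract.pytesseract.tesseract_cmd
#   if ensure_tessdata_prefix(cmd) is None:
#       raise RuntimeError("Could not locate eng.traineddata; set TESSDATA_PREFIX manually")
```

Because pytesseract runs tesseract.exe as a child process, a variable set in os.environ this way should be inherited by it. If the helper returns None, the traineddata file is somewhere else and the path has to be set by hand.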
