Error when performing OCR with pytesseract

Question

I want to use pytesseract. Here is my code.

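import pytesseract
from pdf2image import convert_from_path

PDF_file = 'file.pdf'
text = ''
# Render the PDF pages to images at 500 DPI (pdf2image needs Poppler for this).
pages = convert_from_path(PDF_file, 500)
# OCR the first page; image_to_string already returns a str.
pageText = pytesseract.image_to_string(pages[0])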

As a result, I got this error:

Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 409, in pdfinfo_from_path
    proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\user\Desktop\projects\pdfparser\pdftest.py", line 13, in <module>
    pages = convert_from_path(PDF_file, 500)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 89, in convert_from_path
    page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
  File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 430, in pdfinfo_from_path
    raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

Answer 1

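As many of the comments have already pointed out, the error message

pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

tells you exactly what is wrong: Poppler is not installed, or at least is not on your PATH. See the pdf2image README for help with that.

You see, pdf2image is just a wrapper around the pdftoppm command-line utility, which is part of Poppler. Most Linux distributions ship it by default, so you usually don't have to think about it there, but Windows does not.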
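
To confirm this diagnosis on your own machine, here is a minimal sketch (not part of the original answer) that checks whether the Poppler tools are visible on PATH and, if they are not, hands the install folder to pdf2image through the poppler_path argument that shows up in the traceback. The helper names find_poppler_tools and first_page_text are hypothetical, and so is any folder you pass in.

```python
import shutil


def find_poppler_tools(tools=("pdfinfo", "pdftoppm")):
    """Return {tool name: resolved path, or None if it is not on PATH}."""
    return {name: shutil.which(name) for name in tools}


def first_page_text(pdf_file, poppler_dir=None, dpi=500):
    """OCR the first page of pdf_file.

    poppler_dir is the folder holding the Poppler binaries; leave it as
    None when they are already on PATH.
    """
    # Imported here so the PATH check above works even before
    # pdf2image and pytesseract are installed.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_file, dpi, poppler_path=poppler_dir)
    return pytesseract.image_to_string(pages[0])
```

If find_poppler_tools() reports None for pdfinfo or pdftoppm, you are in the situation from the question. Either add the Poppler bin folder to PATH, or call first_page_text('file.pdf', poppler_dir=...) with the folder where you unpacked Poppler.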

Solution 1
1 accepted 2020-02-27 16:26:36