FileNotFoundError while using the function convert_from_path() of the package pdf2image
I am trying to convert a PDF file into a PNG file using Python's pdf2image library. I use the following code to convert my PDF file:
from pdf2image import convert_from_path, convert_from_bytes
pdf_file_path = './samples/my_pdf.pdf'
images = convert_from_path(pdf_file_path)
I want to do this so that I can later extract the text of the PDF as a string using pytesseract.
I keep getting the following FileNotFoundError even though the file is at the right path. Could anyone help me figure out what I am doing wrong?
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-9-0b7f9e29e79a> in <module>()
1 from pdf2image import convert_from_path, convert_from_bytes
2 pdf_file_path = './samples/my_pdf.pdf'
----> 3 images = convert_from_path(pdf_file_path)
C:\Users\hamza.ameur\AppData\Local\Continuum\anaconda3\lib\site-packages\pdf2image\pdf2image.py in convert_from_path(pdf_path, dpi, output_folder, first_page, last_page, fmt)
22 uid, args, parse_buffer_func = __build_command(['pdftoppm', '-r', str(dpi), pdf_path], output_folder, first_page, last_page, fmt)
23
---> 24 proc = Popen(args, stdout=PIPE, stderr=PIPE)
25
26 data, err = proc.communicate()
C:\Users\hamza.ameur\AppData\Local\Continuum\anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
707 c2pread, c2pwrite,
708 errread, errwrite,
--> 709 restore_signals, start_new_session)
710 except:
711 # Cleanup if the child failed starting.
C:\Users\hamza.ameur\AppData\Local\Continuum\anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
995 env,
996 os.fspath(cwd) if cwd is not None else None,
--> 997 startupinfo)
998 finally:
999 # Child is launched. Close the parent's copy of those pipe
FileNotFoundError: [WinError 2] The system cannot find the file specified
Sorry for the late reply.
After digging into the source code of pdf2image, I found that the error is caused by pdfinfo, a *nix command-line tool that the pdf2image package calls internally. (The traceback in the question fails on pdftoppm, which is launched the same way.) As a result, when you use this package on Windows and the pdfinfo command is missing, it raises the above error.
Code from pdf2image:
#inside __page_count() function
...
else:
proc = Popen(["pdfinfo", pdf_path], stdout=PIPE, stderr=PIPE)
...
From the code above, you can see that it runs pdfinfo as a subprocess to get the page count of the PDF file.
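To see why a missing executable surfaces as FileNotFoundError rather than as an error about the PDF, here is a minimal sketch. The helper name can_launch is hypothetical and is not part of pdf2image; it only mirrors the Popen call shown above.

```python
import sys
from subprocess import PIPE, Popen


def can_launch(command):
    """Return True if the executable in `command` can be started.

    Hypothetical helper: Popen raises FileNotFoundError when the
    executable itself cannot be found, before any argument (such as
    the PDF path) is ever looked at.
    """
    try:
        proc = Popen(command, stdout=PIPE, stderr=PIPE)
    except FileNotFoundError:
        return False
    proc.communicate()  # wait for the child to finish and drain its pipes
    return True


if __name__ == "__main__":
    # The current Python interpreter certainly exists, so this starts fine.
    print(can_launch([sys.executable, "-c", "pass"]))
    # Whether this one starts depends on Poppler being installed and on PATH.
    print(can_launch(["pdfinfo", "-v"]))
```

If the second call reports False, the problem is the missing pdfinfo executable and not the location of the PDF file.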
Download the Windows version of the Poppler tools from: http://blog.alivate.com.au/poppler-windows/
Unzip it and add the location of bin (like C:\\somepath\\poppler-0.67.0_x86\\poppler-0.67.0\\bin) to your PATH environment variable.
Restart your CMD and Python virtualenv if they are already open.
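After changing PATH, it may help to confirm from Python that the tools are actually visible. This is a rough sketch using only the standard library; the names POPPLER_TOOLS and missing_poppler_tools are hypothetical and do not come from pdf2image.

```python
import shutil

# The two Poppler executables that appear in the pdf2image code and traceback above.
POPPLER_TOOLS = ("pdftoppm", "pdfinfo")


def missing_poppler_tools(search_path=None):
    """Return the Poppler tools that cannot be found.

    `search_path` is an os.pathsep-separated string of directories;
    None means "use the PATH environment variable of this process".
    """
    return [tool for tool in POPPLER_TOOLS
            if shutil.which(tool, path=search_path) is None]


if __name__ == "__main__":
    missing = missing_poppler_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Poppler tools found on PATH.")
```

Run it from the same environment (terminal, virtualenv, or notebook kernel) that runs the conversion, since each process sees its own PATH. Newer pdf2image releases also accept a poppler_path argument in convert_from_path() as an alternative to editing PATH, but the version in the traceback above does not list it, so check your installed version first.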
Try using the full path.
Ex:
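import os
from pdf2image import convert_from_path
# Build an absolute path relative to the directory containing this script
basePath = os.path.dirname(os.path.realpath(__file__))
pdf_file_path = os.path.join(basePath, "samples/my_pdf.pdf")
images = convert_from_path(pdf_file_path)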
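Note that __file__ is not defined in an interactive IPython or notebook session like the one in the question. As a sketch under that assumption, the path can be resolved against the current working directory instead and checked before converting; resolve_pdf is a hypothetical helper, not a pdf2image function.

```python
from pathlib import Path


def resolve_pdf(relative_path, base_dir=None):
    """Return the absolute path of a PDF, or raise if it does not exist.

    `base_dir` defaults to the current working directory, which is
    available in notebooks where __file__ is not.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    pdf_path = (base / relative_path).resolve()
    if not pdf_path.is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_path}")
    return pdf_path


# Usage (assumes samples/my_pdf.pdf exists under the working directory):
#     images = convert_from_path(str(resolve_pdf("samples/my_pdf.pdf")))
```

If resolve_pdf succeeds and convert_from_path() still raises FileNotFoundError, the file that cannot be found is most likely an executable that pdf2image is trying to launch, not the PDF.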
If you are using Google Colab:
Run a cell with the following command first:
!apt-get install poppler-utils
Here's a complete example notebook that installs the dependencies, downloads an example PDF, and then uses pdf2image to convert it to an image for display.
https://colab.research.google.com/drive/10doc9xwhFDpDGNferehBzkQ6M0Un-tYq
I just had this issue while running Python 2.
After looking again, I saw that the PyPI page specifically states that the code is not Python 2 compatible.