
FileNotFoundError while using the function convert_from_path() of the package pdf2image

I am trying to convert a PDF file into a PNG file using the Python library pdf2image. I use the following code:

from pdf2image import convert_from_path, convert_from_bytes
pdf_file_path = './samples/my_pdf.pdf'
images = convert_from_path(pdf_file_path)

I want to do this so that I can later extract the text from the PDF with pytesseract.

The problem is that I keep getting the following FileNotFoundError even though the file is at the right path. Could anyone help me figure out what I am doing wrong?

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-9-0b7f9e29e79a> in <module>()
      1 from pdf2image import convert_from_path, convert_from_bytes
      2 pdf_file_path = './samples/my_pdf.pdf'
----> 3 images = convert_from_path(pdf_file_path)

C:\Users\hamza.ameur\AppData\Local\Continuum\anaconda3\lib\site-packages\pdf2image\pdf2image.py in convert_from_path(pdf_path, dpi, output_folder, first_page, last_page, fmt)
     22     uid, args, parse_buffer_func = __build_command(['pdftoppm', '-r', str(dpi), pdf_path], output_folder, first_page, last_page, fmt)
     23 
---> 24     proc = Popen(args, stdout=PIPE, stderr=PIPE)
     25 
     26     data, err = proc.communicate()

C:\Users\hamza.ameur\AppData\Local\Continuum\anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
    707                                 c2pread, c2pwrite,
    708                                 errread, errwrite,
--> 709                                 restore_signals, start_new_session)
    710         except:
    711             # Cleanup if the child failed starting.

C:\Users\hamza.ameur\AppData\Local\Continuum\anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
    995                                          env,
    996                                          os.fspath(cwd) if cwd is not None else None,
--> 997                                          startupinfo)
    998             finally:
    999                 # Child is launched. Close the parent's copy of those pipe

FileNotFoundError: [WinError 2] The system cannot find the file specified

Sorry for the late reply.

Reason

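After digging into the source code of pdf2image, I found that the error is caused by pdfinfo, a command-line tool from Poppler that pdf2image launches as a subprocess (the traceback in the question shows the same thing happening with pdftoppm). These tools are common on *nix systems but are not installed on Windows by default. When the executable is missing, Popen raises the error above: the file that cannot be found is the executable, not your PDF.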

Code from pdf2image :

#inside __page_count() function
    ...
    else:
        proc = Popen(["pdfinfo", pdf_path], stdout=PIPE, stderr=PIPE)
    ...

As the code above shows, pdf2image runs pdfinfo in a subprocess to get the page count of the PDF file.
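A quick way to confirm this diagnosis is to check, from the same Python process, whether the executables can be found on PATH. The sketch below uses only the standard library. The helper name missing_tools is made up for illustration and is not part of pdf2image; the tool names come from the traceback (pdftoppm) and the snippet above (pdfinfo).

```python
import shutil


def missing_tools(tools=("pdfinfo", "pdftoppm"), which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper, not part of pdf2image. `which` is injectable so
    the check can be exercised on a machine without Poppler installed.
    """
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Poppler tools found on PATH")
```

If either name is reported as missing, the FileNotFoundError most likely refers to that executable rather than to the PDF.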

Solution

Download the Windows build of the Poppler tools from: http://blog.alivate.com.au/poppler-windows/

Unzip it and add the location of the bin folder (for example C:\\somepath\\poppler-0.67.0_x86\\poppler-0.67.0\\bin) to your PATH environment variable.

Restart CMD and your Python virtualenv if they are open, so that the new PATH is picked up.
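If changing the system-wide PATH is inconvenient, a similar effect can usually be had for a single session by prepending the bin folder to PATH inside Python before calling pdf2image. This is a minimal sketch under that assumption: prepend_to_path is a hypothetical helper, and the folder in the comment is the placeholder path from the step above, not a real location.

```python
import os


def prepend_to_path(bin_dir, env=None):
    """Put `bin_dir` at the front of PATH in `env` (defaults to os.environ).

    Hypothetical helper for illustration. The change only affects the
    current process and the subprocesses it starts afterwards.
    """
    env = os.environ if env is None else env
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in parts:
        parts.insert(0, bin_dir)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


# Placeholder location: replace with the bin folder you actually unzipped.
# prepend_to_path(r"C:\somepath\poppler-0.67.0_x86\poppler-0.67.0\bin")
```

Call it once, before the first convert_from_path() call. Unlike editing the environment variable in Windows, this does not persist after the Python process exits.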

Try using the full path.

Ex:

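import os
from pdf2image import convert_from_path

# Build an absolute path relative to this script's folder.
# Note: __file__ is only defined when running as a script, not in a notebook cell.
basePath = os.path.dirname(os.path.realpath(__file__))
pdf_file_path = os.path.join(basePath, "samples", "my_pdf.pdf")
images = convert_from_path(pdf_file_path)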
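The traceback in the question comes from an IPython session, where __file__ is not defined, so a variant that resolves the path against the working directory may be more practical there. The sketch below also checks that the PDF exists before any conversion is attempted. resolve_pdf is a hypothetical helper, not part of pdf2image.

```python
from pathlib import Path


def resolve_pdf(pdf_path, base_dir=None):
    """Return an absolute Path to `pdf_path`, or raise if it is not a file.

    Hypothetical helper. Relative paths are resolved against `base_dir`,
    which defaults to the current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    candidate = (base / pdf_path).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(f"PDF not found: {candidate}")
    return candidate
```

If resolve_pdf('./samples/my_pdf.pdf') succeeds and convert_from_path() still raises FileNotFoundError, the missing file is not the PDF, which points back to a missing executable.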

If you are using Google Colab

Run a cell with the following command first:

!apt-get install poppler-utils 

Here's a complete example notebook that installs deps, downloads an example PDF, and then uses pdf2image to convert it to an image for display.

https://colab.research.google.com/drive/10doc9xwhFDpDGNferehBzkQ6M0Un-tYq

I just had this issue while running Python 2.

After looking again, I saw that the PyPI page specifically states that the package is not compatible with Python 2.


 