
Unable to import pdftotext after installing with conda and poppler, Windows 10

I'm trying to use pdftotext, but it won't import.

I'm running Windows 10 (64 bit) on a Lenovo IdeaPad S340, a work laptop.

Following the directions here and here (which were super helpful), I:

  1. Installed Microsoft Visual C++ Build Tools.
  2. Installed Anaconda.
  3. Got the latest version of Anaconda and updated it, using a separate Anaconda3 command for each of these steps. I don't recall the commands, and haven't found them again.
  4. Updated Microsoft Visual 14.
  5. Used conda to install poppler via Anaconda3 command: conda install -c conda-forge poppler
  6. Used pip to install pdftotext via Anaconda3 command: pip install pdftotext

After that:

This happens in the Python 3.8 (32 bit) command prompt:

>>> import pdftotext
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pdftotext'
>>>

This happens in IDLE's Python 3.75 Shell (64 bit):

>>> import pdftotext
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    import pdftotext
ModuleNotFoundError: No module named 'pdftotext'
>>> 

This happens in the Anaconda3 command prompt:

import pdftotext
'import' is not recognized as an internal or external command,
operable program or batch file.

This also happens in Anaconda3 command prompt:

pip install pdftotext
Requirement already satisfied: pdftotext in c:\programdata\anaconda3\lib\site-packages (2.1.4)

Does that mean it only runs in Python 2? How would I have checked that beforehand? If it does only run on Python 2, can you recommend a Python 3 package/module/library (what is the difference, btw?) for reading a PDF into a plain text file?

Thanks for your help!

Update:

I started over with a new user on the same machine and OS (the other user had a space in the name, so its filepath had a space, which can cause problems). I'm hitting the same problem.

I have Python 3.7.6 and 3.8.1. Python 3.7.6 is what shows up when checking the version through the Anaconda3 prompt with python -V (3.7.6.final.0 when using conda info).

I also have:

  • Anaconda Version "custom", Build py37_1.
  • conda 4.8.2, py37_0, Channel conda-forge.
  • poppler 0.84.0, h1affe6b_0, conda-forge.
  • pdftotext 2.1.4, pypi_0, pypi.

I found Python here: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64.

I searched with my eyes all over the program files, user files, and on the Anaconda Navigator, and I ran a search of my entire C drive for 'pdftotext', and I didn't find anything about pdftotext.

Attempting from IDLE's Python 3.7.6 shell didn't work either.

Update:

I figured it out, sorta. pdftotext is not working as a Python import, as the example code on PyPI uses it. But it does work as a command-line tool that is part of Xpdf, with no additional installation after the steps above.

I used the command in the Anaconda3 PowerShell command prompt:

pdftotext C:\filepath\file.pdf

It then created a text file with the same name and saved it in the same folder. There are additional options for the command outlined on the Xpdf page I linked above (like setting your file name).

Buuuut, this is not a satisfying solution. I'm able to take care of my current use-case task, with an additional step, but I'm still not able to call pdftotext from within a Python program.

Update:

If you install pdftotext using Anaconda and conda, then importing it seems to only work when you run it in the Python interpreter from within the Anaconda3 shell.

So, I had to switch to the Python interpreter mode in the Anaconda3 PowerShell first: python

Then, I could import pdftotext with no error: import pdftotext

It looked like this:

(user)> python
Python 3.7.6 (default, Jan  8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pdftotext
>>> 

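pdftotext is a command-line program (there is also a separate PyPI package of the same name). If you want to run the command from Python, you can write:

import os

# Raw string, so the backslashes in the Windows path are not treated as escapes
file_path = r"C:\documents\mypdf.pdf"

# Read the text into a variable ("-" sends the output to stdout instead of a file)
text = os.popen('pdftotext "{}" -'.format(file_path)).read()

# Write the text to a file
os.system('pdftotext "{}" "{}"'.format(file_path, "data.txt"))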

Okay, I figured it out! If you install pdftotext using Anaconda and conda, then importing it seems to only work when you run it in the Python interpreter from within the Anaconda3 shell.

So, I had to switch to the Python interpreter mode in the Anaconda3 PowerShell first: python

Then, I could import pdftotext with no error: import pdftotext

It looked like this:

(user)> python
Python 3.7.6 (default, Jan  8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pdftotext
>>> 
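A likely explanation, given that pip reported the package under c:\programdata\anaconda3\lib\site-packages, is that the other Python installations (the 3.8 prompt and IDLE) simply do not look in Anaconda's site-packages. A minimal sketch for checking this from any interpreter follows; the helper name describe_module is made up for illustration, and only the standard library is used:

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file `name` would be imported from, or None if this
    interpreter cannot find it. (Hypothetical helper, for illustration.)"""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Which python.exe is actually running this code?
print("Interpreter:", sys.executable)

# Can this interpreter see the pdftotext package at all?
print("pdftotext found at:", describe_module("pdftotext"))
```

Running the same lines in the Anaconda3 prompt's python and in IDLE should show two different interpreter paths, with pdftotext found only under the Anaconda one.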

Ooor, a second partial solution is that it works as a command-line tool that is part of Xpdf.

I needed no additional installation after the steps taken in the problem post. I used the command in the Anaconda3 PowerShell command prompt:

pdftotext C:\filepath\file.pdf

It then created a text file with the same name and saved it in the same folder. There are additional options for the command outlined on the Xpdf page I linked above (like setting your file name).

The problem with the second solution of using it from the command line is that if you want to do something with the text file afterwards, you have to run another command or script. All it does is write the text to a file.
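One way around that extra step, sketched here under assumptions rather than tested on this setup, is to call the command from Python and capture its output directly. Passing "-" as the output file asks pdftotext to print to stdout, and -enc UTF-8 requests UTF-8 output. The function names build_command and pdf_to_text are invented for this example, and the pdftotext executable is assumed to be on PATH:

```python
import shutil
import subprocess


def build_command(pdf_path):
    """Build the argument list; "-" means "write the text to stdout"."""
    return ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"]


def pdf_to_text(pdf_path):
    """Run the pdftotext command-line tool and return the extracted text."""
    # Fail early with a clear message if the executable cannot be found.
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext is not on PATH")
    # A list of arguments (no shell) keeps paths with spaces intact.
    result = subprocess.run(build_command(pdf_path),
                            capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

Calling pdf_to_text(r"C:\filepath\file.pdf") should then hand back the text as a string, ready for further processing in the same script, with no intermediate file.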

I had the same problem, but after performing the following, it worked like a charm!

sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev

pip install pdftotext
