
Unable to install pdftotext on Python 3.6, missing poppler

How can I install pdftotext properly?

I'm getting the error message below when installing pdftotext in Python 3.6. I also tried to install the package manually by downloading the zip file, but still got the same error.

  pdftotext/pdftotext.cpp(4): fatal error C1083: Cannot open include file: 'poppler/cpp/poppler-document.h': No such file or directory
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit status 2     

I found some help in the Readme.md file of the pdftotext package:

1) Install OS dependencies:

On Debian, Ubuntu, and friends:

sudo apt-get update
sudo apt-get install build-essential libpoppler-cpp-dev pkg-config python-dev

On Fedora, Red Hat, and friends:

sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python-devel redhat-rpm-config

2) Do the normal install:

pip install pdftotext

That worked for me.
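
To confirm that the install actually produced an importable module, a quick check like the one below can help. It is only a sketch: the helper name is_importable is made up for illustration, and it assumes the package installs a top-level module named pdftotext.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "pdftotext" is assumed to be the module name the package installs.
    print("pdftotext importable:", is_importable("pdftotext"))
```

If this prints False right after a seemingly successful pip install, pip and the interpreter running the check probably belong to different Python environments.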

I spent a few days trying to figure out how to install pdftotext on Win10, and internet searches gave me nothing. So for those who need to know, here is how to install pdftotext on Win10 with Anaconda. YMMV.

Install Anaconda Python. There are many articles on installing Anaconda, so I won't explore that here.

Try to run pip install pdftotext. You will get an error saying that Microsoft Visual C++ is required.

Navigate in a browser to http://visualstudio.microsoft.com/downloads. Under the Tools for Visual Studio 2019 tab, download the Build Tools for Visual Studio 2019. Then install the tools by checking the C++ build tools option box and clicking Install.

The pip install should now move past the VC++ error. Unfortunately, you'll now get the error "Cannot open include file: 'poppler/cpp/poppler-document.h'". This is because you're missing the poppler libraries.

Head back to the internets! You'll need poppler for Windows. At the time of this writing, your best option is http://blog.alivate.com.au/poppler-windows. Grab the latest binary and uncompress it. If you look at the error, the build is looking for the header file at {Anaconda3 directory}\include\poppler\cpp\poppler-document.h. So look in the archive you just unzipped. In the include folder, you'll see a poppler directory. If you go down into the cpp directory in there, you'll find the poppler-document.h file.

I copied the entire poppler directory into the Anaconda3\include folder, so do that.
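
If you would rather script the copy than drag folders around, something like this should do it. Treat it as a sketch: copy_poppler_headers is a name I made up, both paths are passed in by you, and dirs_exist_ok needs Python 3.8 or newer.

```python
import shutil
from pathlib import Path


def copy_poppler_headers(extracted_root: Path, python_prefix: Path) -> Path:
    """Copy <extracted_root>/include/poppler into <python_prefix>/include."""
    source = extracted_root / "include" / "poppler"
    target = python_prefix / "include" / "poppler"
    # Fail early if the archive layout is not what the steps above describe.
    if not (source / "cpp" / "poppler-document.h").is_file():
        raise FileNotFoundError(f"poppler-document.h not found under {source}")
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run safely.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

For the second argument, Path(sys.prefix) from inside the Anaconda interpreter should point at the Anaconda3 directory; the first argument is wherever you unpacked the poppler archive.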

If you try to run pip install again, you'll still get a ton of errors! But these are not the errors you saw previously. This time the build is looking for a missing link library, poppler-cpp.lib. A search through Conda installs on another machine found this file in the poppler package. So:

conda install -c conda-forge poppler

This installs our poppler-cpp.lib file. Then we can copy the file from its home at {Anaconda3 directory}\Library\lib\poppler-cpp.lib and paste it where the pdftotext build expects it, in {Anaconda3 directory}\libs.
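
The same copy can be scripted. This is a sketch under the directory layout described above; copy_poppler_lib is a hypothetical helper name, and anaconda_dir is whatever your Anaconda3 directory happens to be.

```python
import shutil
from pathlib import Path


def copy_poppler_lib(anaconda_dir: Path) -> Path:
    """Copy Library/lib/poppler-cpp.lib into the libs folder of anaconda_dir."""
    source = anaconda_dir / "Library" / "lib" / "poppler-cpp.lib"
    target_dir = anaconda_dir / "libs"
    if not source.is_file():
        raise FileNotFoundError(f"{source} is missing; is the conda poppler package installed?")
    target_dir.mkdir(exist_ok=True)
    # copy2 keeps the file's metadata and returns the destination path.
    return Path(shutil.copy2(source, target_dir))
```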

If we run pip install pdftotext again, there it is! I'm sure someone will find a way to refine this a bit, but for now we have a working pdftotext Python library on Win10.
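
Once the install succeeds, a short smoke test along these lines should show the library working. It is a minimal sketch: extract_text is my own wrapper name and the PDF path is a placeholder, while pdftotext.PDF and joining its pages follow the usage shown in the package's README.

```python
def extract_text(pdf_path: str) -> str:
    """Return the text of every page in the PDF, separated by blank lines."""
    # Imported lazily so the ImportError surfaces here if the build failed.
    import pdftotext

    with open(pdf_path, "rb") as handle:
        pdf = pdftotext.PDF(handle)
    # A PDF object behaves like a sequence of page strings.
    return "\n\n".join(pdf)


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py some_file.pdf (the path is up to you).
    if len(sys.argv) > 1:
        print(extract_text(sys.argv[1]))
```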

These directions can be found, with screenshots, on my blog: https://coder.haus/2019/09/27/installing-pdftotext-through-pip-on-windows-10/

The command below solved the problem for me.

sudo apt-get install libpoppler-cpp-dev

https://blog.droidzone.in/2018/05/01/install-pdftotext-python-extension-error/
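
Before running pip, you can check whether the poppler development files are already visible to the build tools. The sketch below asks pkg-config. It assumes the development package registers itself under the module name poppler-cpp, and poppler_cpp_available is a hypothetical helper name.

```python
import shutil
import subprocess


def poppler_cpp_available() -> bool:
    """Return True if pkg-config can find the poppler-cpp development files."""
    if shutil.which("pkg-config") is None:
        # pkg-config itself is not installed, so nothing can be looked up.
        return False
    result = subprocess.run(["pkg-config", "--exists", "poppler-cpp"], check=False)
    return result.returncode == 0


if __name__ == "__main__":
    print("poppler-cpp found:", poppler_cpp_available())
```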

For macOS: brew install poppler

For Ubuntu users:

sudo apt-get install libpoppler58=0.41.0-0ubuntu1 libpoppler-dev libpoppler-cpp-dev

This worked for me.

Simple solution for Windows:

  1. Download the poppler archive from http://blog.alivate.com.au/wp-content/uploads/2018/10/poppler-0.68.0_x86.7z and extract it.
  2. Download and install the Visual Studio build tools from https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=15
  3. Add the folder \poppler-0.68.0\bin to the PATH environment variable.

That's it. Restart your environment (for example Jupyter Notebook or VS Code) so it picks up the new PATH. Enjoy.
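
If you cannot change the system-wide variable, the PATH can also be extended from inside Python for the current process only. This is a sketch: prepend_to_path is a made-up helper, and the directory you pass is wherever you extracted poppler.

```python
import os


def prepend_to_path(directory: str, environ=os.environ) -> None:
    """Put directory at the front of PATH unless it is already listed."""
    entries = environ.get("PATH", "").split(os.pathsep)
    if directory not in entries:
        # Drop empty entries so an unset PATH does not leave a stray separator.
        kept = [entry for entry in entries if entry]
        environ["PATH"] = os.pathsep.join([directory] + kept)
```

The change only lasts for the running interpreter and the processes it starts, so it does not replace step 3 for other programs.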

To install pdftotext on Windows 10, I followed Jason Woods' answer.

I want to add to that answer that the "C++ Desktop applications development" package must be installed in Visual Studio.

Make sure to install the "C++ Build Tools" as well, as mentioned in Jason Woods' answer.

Follow the rest of his answer. Quick summary:

  • install Anaconda Python
  • in the Anaconda Prompt, type: conda install -c conda-forge poppler
  • now install the pdftotext package: pip install pdftotext

It worked for me. Thank you.
