I am trying to extract text from a PDF file using the slate
module, which I installed like this:
$sudo pip install https://codeload.github.com/timClicks/slate/zip/master
Collecting https://codeload.github.com/timClicks/slate/zip/master
Downloading https://codeload.github.com/timClicks/slate/zip/master
Requirement already satisfied: distribute in /usr/lib/python3.5/site-packages (from slate==0.5.2)
Requirement already satisfied: pdfminer3k in /usr/lib/python3.5/site-packages (from slate==0.5.2)
Requirement already satisfied: setuptools>=0.7 in /usr/lib/python3.5/site-packages (from distribute->slate==0.5.2)
Requirement already satisfied: pytest>=2.0 in /usr/lib/python3.5/site-packages (from pdfminer3k->slate==0.5.2)
Requirement already satisfied: ply>=3.4 in /usr/lib/python3.5/site-packages (from pdfminer3k->slate==0.5.2)
Requirement already satisfied: py>=1.4.29 in /usr/lib/python3.5/site-packages (from pytest>=2.0->pdfminer3k->slate==0.5.2)
Installing collected packages: slate
Found existing installation: slate 0.3
Uninstalling slate-0.3:
Successfully uninstalled slate-0.3
Running setup.py install for slate ... done
Successfully installed slate-0.5.2
and I am trying:
#!/usr/bin/python3
import slate
with open('/var/tmp/PhysRevB.93.014203.pdf') as fp:
    doc = slate.PDF(fp)
print(len(doc))
print(doc[0])
which gives me this error:
$python3 tstslt.py
Traceback (most recent call last):
  File "tstslt.py", line 2, in <module>
    import slate
  File "/usr/lib/python3.5/site-packages/slate/__init__.py", line 66, in <module>
    from .classes import PDF
  File "/usr/lib/python3.5/site-packages/slate/classes.py", line 25, in <module>
    import utils
ImportError: No module named 'utils'
I can extract the text using PyPDF2, but I would like to know whether slate does a better job.
According to this issue, one of slate's dependencies (pdfminer) doesn't support Python 3:
(...)
The required "pdfminer" does not work because it is currently incompatible with Python 3.5.
It says so on their readme:
https://github.com/euske/pdfminer
"Install Python 2.6 or newer. (Python 3 is not supported.)"
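For what it's worth, the traceback in the question is also consistent with a Python 2 style implicit relative import: a bare "import utils" inside a package module is resolved against sys.path in Python 3, not against the package's own directory. The sketch below reproduces that behaviour with a throwaway package; the names demo_pkg_implicit and demo_helper_implicit are hypothetical and have nothing to do with slate's real layout.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical names, chosen only for this demonstration.
PKG = "demo_pkg_implicit"
HELPER = "demo_helper_implicit"


def try_import_package():
    """Build a throwaway package whose submodule imports a sibling by bare name."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, PKG)
        os.mkdir(pkg_dir)
        # __init__.py uses an explicit relative import, which works in Python 3.
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write("from .classes import VALUE\n")
        # classes.py uses a bare import of its sibling (Python 2 style).
        with open(os.path.join(pkg_dir, "classes.py"), "w") as f:
            f.write("import {0}\nVALUE = {0}.VALUE\n".format(HELPER))
        # The sibling module exists, but only inside the package directory.
        with open(os.path.join(pkg_dir, HELPER + ".py"), "w") as f:
            f.write("VALUE = 42\n")

        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            importlib.import_module(PKG)
            return "imported"
        except ImportError as exc:
            return "ImportError: {}".format(exc)
        finally:
            # Undo the changes to interpreter state made for the demo.
            sys.path.remove(root)
            for name in list(sys.modules):
                if name == PKG or name.startswith(PKG + "."):
                    del sys.modules[name]


result = try_import_package()
print(result)
```

Under Python 3 the import fails even though the sibling file sits right next to classes.py; writing "from . import demo_helper_implicit" in classes.py would make it succeed.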
After you have installed slate3k, you also have to open the file in binary mode ('rb'):
#!/usr/bin/python3
import slate
# Open the PDF in binary mode so the parser receives raw bytes
with open('/var/tmp/PhysRevB.93.014203.pdf', 'rb') as fp:
    doc = slate.PDF(fp)
print(len(doc))
print(doc[0])
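To see why the mode matters in Python 3, here is a minimal stdlib-only sketch (it does not call slate at all). The sample bytes are made up to resemble the start of a PDF, and read_all is a hypothetical helper. Text mode tries to decode the file, which can fail on binary content, while 'rb' returns the raw bytes untouched.

```python
import os
import tempfile

# Made-up bytes resembling a PDF header followed by a binary comment line.
SAMPLE = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def read_all(path, mode):
    """Read the whole file in the given mode; return the data or the error name."""
    # Text mode needs an encoding; UTF-8 is pinned here so the demo is deterministic.
    kwargs = {} if "b" in mode else {"encoding": "utf-8"}
    try:
        with open(path, mode, **kwargs) as fp:
            return fp.read()
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# Write the sample to a temporary file, read it both ways, then clean up.
tmp = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False)
try:
    tmp.write(SAMPLE)
    tmp.close()
    binary_result = read_all(tmp.name, "rb")
    text_result = read_all(tmp.name, "r")
finally:
    os.remove(tmp.name)

print(binary_result)
print(text_result)
```

Whether slate fails in exactly this way when given a text-mode file is not something I have checked; the point is only that a PDF is binary data and should be handed over as bytes.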
Just run pip install utils after you have installed slate with:
pip install https://github.com/timClicks/slate/archive/master.zip