How can I parse an online PDF file with Python?
I just need the second line of the first page. I need to do this without saving the file to disk, and I am using Python 3.5.
I tried something like the approach in "Using PDFMiner (Python) with online pdf files. Encode the url?", but it didn't work:
from pdfminer.pdfparser import PDFParser
import urllib.request
from io import StringIO
import io
url = 'url_with_the_pdf'
open = urllib.request.urlopen(url).read()
memoryFile = io.StringIO(open)
parser = PDFParser(memoryFile)
I get this error:
memoryFile = io.StringIO(open)
TypeError: initial_value must be str or None, not bytes
In Python 3, urlopen(url).read() returns bytes rather than str, so use io.BytesIO instead of io.StringIO, i.e.
memoryFile = io.BytesIO(open)
Details: https://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
...import the io module and use io.StringIO or io.BytesIO for text and data respectively
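Putting it together, here is a minimal sketch of the whole flow. It is not from the original answer: the helper names fetch_pdf_in_memory and second_line_of_first_page are made up for illustration, and the text extraction step assumes a pdfminer.six release that provides pdfminer.high_level.extract_text (which may require a newer Python than 3.5). The short demonstration at the end only shows the bytes vs. str distinction and needs no network access or PDF library.

```python
import io
import urllib.request


def fetch_pdf_in_memory(url):
    """Read the PDF at `url` into an in-memory binary file (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        data = response.read()  # bytes, not str
    return io.BytesIO(data)     # BytesIO is the binary counterpart of StringIO


def second_line_of_first_page(pdf_file):
    """Return the second non-empty text line of page 1, or None.

    Assumption: pdfminer.six is installed and exposes
    pdfminer.high_level.extract_text. Imported lazily so the rest of
    this sketch runs without it.
    """
    from pdfminer.high_level import extract_text

    text = extract_text(pdf_file, page_numbers=[0])  # page indices are zero-based
    lines = [line for line in text.splitlines() if line.strip()]
    return lines[1] if len(lines) > 1 else None


# Demonstration of the original error, using placeholder bytes instead of a download.
sample = b"%PDF-1.4 placeholder bytes"
memory_file = io.BytesIO(sample)   # works: BytesIO accepts bytes
try:
    io.StringIO(sample)            # fails: StringIO only accepts str or None
    string_io_error = None
except TypeError as exc:
    string_io_error = str(exc)
```

With a real URL, usage would look like second_line_of_first_page(fetch_pdf_in_memory(url)). What counts as the "second line" depends on how the PDF lays out its text, so the result may need adjusting for a particular document.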