Parse online PDF file with Python and PDFMiner

Question

How can I parse an online PDF file with Python?

I only need the second line of the first page, and I need to do this without downloading the file. I am using Python 3.5.

I tried something like the approach in "Using PDFMiner (Python) with online pdf files. Encode the url?", but it didn't work:

from pdfminer.pdfparser import PDFParser
import urllib.request
from io import StringIO
import io

url = 'url_with_the_pdf'

open = urllib.request.urlopen(url).read()

memoryFile = io.StringIO(open)

parser = PDFParser(memoryFile)

I get this error:

    memoryFile = io.StringIO(open)
TypeError: initial_value must be str or None, not bytes

Answer 1

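urllib.request.urlopen(url).read() returns bytes, and io.StringIO only accepts str. In Python 3, use io.BytesIO for binary data, i.e.: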

memoryFile = io.BytesIO(open)

Details: https://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit

...import the io module and use io.StringIO or io.BytesIO for text and data respectively
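
Putting the fix in context, here is a minimal sketch, not a definitive implementation. The helper names load_pdf_into_memory and second_line_of_first_page are hypothetical and chosen for illustration. The second helper assumes the pdfminer.six package and its pdfminer.high_level.extract_text function, which may not match the PDFMiner variant used in the question, so it is imported lazily. Note also that what counts as a "line" depends on how PDFMiner reconstructs the page layout.

```python
import io
import urllib.request


def load_pdf_into_memory(url):
    """Fetch the resource at `url` and return it as an in-memory binary file.

    Nothing is written to disk; the bytes live only in the BytesIO object.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()  # bytes, not str
    # BytesIO (not StringIO) is the in-memory file type for binary data.
    return io.BytesIO(data)


def second_line_of_first_page(memory_file):
    """Return the second non-empty text line of page 1, or None if absent.

    Assumption: pdfminer.six is installed. Imported here so the helper
    above stays usable without it.
    """
    from pdfminer.high_level import extract_text

    # page_numbers is zero-based, so [0] restricts extraction to page 1.
    text = extract_text(memory_file, page_numbers=[0])
    lines = [line for line in text.splitlines() if line.strip()]
    return lines[1] if len(lines) > 1 else None
```

With the placeholder URL from the question, usage would look like second_line_of_first_page(load_pdf_into_memory('url_with_the_pdf')). Avoiding the name open for the downloaded bytes also keeps the built-in open() function usable.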
