AttributeError: '_io.BufferedReader' object has no attribute 'page'

Question

I am trying to extract text from a PDF file that contains text, tables, and images, and I want to save the result on my local system. This is the code I have so far:

from PyPDF2 import PdfFileReader
# Load the pdf to the PdfFileReader object with default settings
with open("SHKelkar.pdf", "rb") as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    total_pages = pdf_reader.numPages
    print(total_pages)
    print(f"The total number of pages in the pdf document is {pdf_reader.numPages}")
    for i in range(total_pages):
        page = pdf_file.page[i]
        textdata = page.extract_text()
        print(textdata)

Answer 1

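You are reading pages from pdf_file (the raw file object returned by open()) instead of pdf_reader. The file object is an _io.BufferedReader, which has no page attribute; the pages belong to the PdfFileReader object.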
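To see where the error comes from without PyPDF2 at all, here is a minimal sketch using only the standard library. The temporary file and the variable names are made up for illustration; the point is only that open(..., "rb") returns an io.BufferedReader, and that type has no page attribute:

```python
import io
import os
import tempfile

# Create a throwaway binary file so the example runs without a real PDF.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4\n")
    path = tmp.name

with open(path, "rb") as pdf_file:
    # open() in binary read mode returns a buffered reader, not a PDF object.
    is_buffered_reader = isinstance(pdf_file, io.BufferedReader)
    has_page_attr = hasattr(pdf_file, "page")
    try:
        pdf_file.page[0]  # the same access pattern as in the question
    except AttributeError as exc:
        message = str(exc)

os.remove(path)  # clean up the temporary file

print(is_buffered_reader, has_page_attr)
print(message)
```

The file handle only knows how to read bytes. Anything page-related has to go through the reader object that wraps it.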

Check the working code below:

from PyPDF2 import PdfFileReader

# Note: PdfFileReader, getNumPages, getPage, and extractText are the legacy
# PyPDF2 API (versions before 3.0).
# Load the pdf to the PdfFileReader object with default settings
with open("sample.pdf", "rb") as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    total_pages = pdf_reader.getNumPages()
    print(total_pages)
    print(f"The total number of pages in the pdf document is {total_pages}")
    for i in range(total_pages):
        # Pages come from the reader object, not from the file handle
        page = pdf_reader.getPage(i)
        textdata = page.extractText()
        print(textdata)
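Newer releases of the library (PyPDF2 3.x and its successor, pypdf) removed the PdfFileReader, getPage, and extractText names. Below is a minimal sketch of the same loop with the current names that also writes the text to a file, since the question asks about saving it. It assumes pypdf is installed; page_texts and save_pdf_text are hypothetical helper names, not part of the original answer:

```python
from pathlib import Path


def page_texts(reader):
    """Return the extracted text of each page of a reader exposing `.pages`."""
    # Fall back to an empty string if a page yields no text.
    return [page.extract_text() or "" for page in reader.pages]


def save_pdf_text(pdf_path, txt_path):
    """Extract text from pdf_path and write it to txt_path (hypothetical helper)."""
    # Imported here so page_texts() above can be used without pypdf installed.
    from pypdf import PdfReader  # assumption: installed via `pip install pypdf`

    with open(pdf_path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        texts = page_texts(reader)  # read the pages while the file is still open
    Path(txt_path).write_text("\n".join(texts), encoding="utf-8")
    return len(texts)
```

Calling save_pdf_text("sample.pdf", "sample.txt") should write one block of text per page. Note that extract_text() returns plain text only, so table layout and images are not preserved by this approach.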

solution1
0 2020-11-02 08:51:49
