
AttributeError: '_io.BufferedReader' object has no attribute 'page'

I am trying to extract text from a PDF file that contains text, tables, and images, and I want to save it to my local system. This is the code I have been developing:

from PyPDF2 import PdfFileReader
# Load the pdf to the PdfFileReader object with default settings
with open("SHKelkar.pdf", "rb") as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    total_pages = pdf_reader.numPages
    print(total_pages)
    print(f"The total number of pages in the pdf document is {pdf_reader.numPages}")
    for i in range(total_pages):
        page = pdf_file.page[i]
        textdata = page.extract_text()
        print(textdata)

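You are reading pages from pdf_file (the raw file object returned by open(), an _io.BufferedReader) instead of from pdf_reader. The file object has no page attribute; the pages belong to the PdfFileReader object.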
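To see where the error comes from without involving PyPDF2 at all, here is a minimal sketch using only the standard library. The helper name probe_file_object and the throwaway temp file are made up for illustration; the point is just that open(..., "rb") returns a buffered reader, and accessing .page on it fails the same way as in the question.

```python
import os
import tempfile


def probe_file_object(path):
    """Open path in binary mode and report what open() returns (hypothetical helper)."""
    with open(path, "rb") as pdf_file:
        cls = type(pdf_file)
        type_name = f"{cls.__module__}.{cls.__name__}"
        try:
            pdf_file.page  # same attribute access as in the question
            error = None
        except AttributeError as exc:
            error = str(exc)
    return type_name, error


# Any file works for the demonstration; a temp file stands in for the real PDF.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4\n")
    tmp_path = tmp.name

type_name, error = probe_file_object(tmp_path)
os.remove(tmp_path)

print(type_name)  # the class of the object returned by open()
print(error)      # the AttributeError message
```

The file object only knows about bytes; anything PDF-specific (page count, page objects, text extraction) has to go through the reader that wraps it.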

Here is the corrected, working code:

# Note: PdfFileReader, getNumPages(), getPage() and extractText() are the
# older PyPDF2 names; they were removed in PyPDF2 3.0.
from PyPDF2 import PdfFileReader

# Load the PDF into a PdfFileReader object with default settings
with open("sample.pdf", "rb") as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    total_pages = pdf_reader.getNumPages()
    print(f"The total number of pages in the pdf document is {total_pages}")
    for i in range(total_pages):
        # Ask the reader (not the file object) for each page
        page = pdf_reader.getPage(i)
        textdata = page.extractText()
        print(textdata)

