I am trying to extract text from a PDF file that contains text, tables, and images, and I want to save the extracted text on my local system. This is the code I was developing:
from PyPDF2 import PdfFileReader
# Load the pdf to the PdfFileReader object with default settings
with open("SHKelkar.pdf", "rb") as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    total_pages = pdf_reader.numPages
    print(total_pages)
    print(f"The total number of pages in the pdf document is {pdf_reader.numPages}")
    for i in range(total_pages):
        page = pdf_file.page[i]
        textdata = page.extract_text()
        print(textdata)
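You are reading the pages from pdf_file (the raw file handle) instead of pdf_reader (the PdfFileReader object). A file handle has no page attribute, so each page has to be requested from the reader.
Check the working code below.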
# Legacy API: these names work in PyPDF2 releases before 3.0
from PyPDF2 import PdfFileReader
# Load the pdf to the PdfFileReader object with default settings
with open("sample.pdf", "rb") as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    total_pages = pdf_reader.getNumPages()
    print(total_pages)
    print(f"The total number of pages in the pdf document is {total_pages}")
    for i in range(total_pages):
        # Get each page from the reader, not from the file handle
        page = pdf_reader.getPage(i)
        textdata = page.extractText()
        print(textdata)
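
The question also asks to save the extracted text locally, which the code above does not do, and it relies on the legacy PyPDF2 names (PdfFileReader, getPage, extractText) that were removed in PyPDF2 3.0. The following is a minimal sketch of one way to do both, assuming the pypdf package (or PyPDF2 3.0 and later, which expose the same PdfReader, pages, and extract_text names) is installed. The helper names extract_pages, save_text, and pdf_to_text_file, and the "--- page N ---" separator, are hypothetical choices for illustration and are not part of the library.

```python
from pathlib import Path


def extract_pages(reader):
    """Return the extracted text of every page of a reader-like object."""
    # Fall back to "" in case a page yields no text (e.g. a scanned image)
    return [page.extract_text() or "" for page in reader.pages]


def save_text(page_texts, out_path):
    """Write the page texts to a UTF-8 file, one labeled section per page."""
    out = Path(out_path)
    with out.open("w", encoding="utf-8") as fh:
        for number, text in enumerate(page_texts, start=1):
            fh.write(f"--- page {number} ---\n{text}\n")
    return out


def pdf_to_text_file(pdf_path, out_path):
    """Extract the text of a PDF and save it to a local text file."""
    # Assumption: pypdf is installed; imported here so the helpers above
    # stay usable (and testable) without the library.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return save_text(extract_pages(reader), out_path)
```

Calling pdf_to_text_file("sample.pdf", "sample.txt") would then write the text of every page to sample.txt in the current directory. Note that extract_text only reads the text layer of the PDF: table cells come out as plain text without their grid structure, and text that exists only inside images is not recovered without a separate OCR step.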