I have been trying to convert some PDFs into .txt, but most of the sample code I found online has the same issue: it only converts one page at a time. I am kind of new to Python, and I can't figure out how to write a substitute for the .getPage() method that converts the entire document at once. All help is welcome.
import PyPDF2
pdfFileObject = open(r"F:\pdf.pdf", 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObject)
print(" No. Of Pages :", pdfReader.numPages)
pageObject = pdfReader.getPage(0)
print(pageObject.extractText())
pdfFileObject.close()
You could do this with a for loop. Extract the text from each page in the loop and append it to a list.
import PyPDF2
pages_text = []
with open(r"F:\pdf.pdf", 'rb') as pdfFileObject:
    pdfReader = PyPDF2.PdfFileReader(pdfFileObject)
    print(" No. Of Pages :", pdfReader.numPages)
    # Visit every page index instead of only page 0
    for page in range(pdfReader.numPages):
        pageObject = pdfReader.getPage(page)
        pages_text.append(pageObject.extractText())

# One string per page, in document order
print(pages_text)
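Since the goal in the question is a .txt file rather than a printed list, here is a minimal sketch of that last step. It assumes the same PyPDF2 version as the code above (PdfFileReader, getPage, extractText). The helper names join_pages, save_pages and pdf_to_txt, and the output path F:\pdf.txt, are made up for illustration. Newer releases of the library rename these calls (PdfReader, reader.pages, extract_text()), so adjust them if your installed version rejects the older names.

```python
from pathlib import Path


def join_pages(pages_text, separator="\n\n"):
    """Join per-page strings into one string, skipping blank pages."""
    return separator.join(
        text.strip() for text in pages_text if text and text.strip()
    )


def save_pages(pages_text, txt_path):
    """Write the joined page text to txt_path as UTF-8 and return the path."""
    txt_path = Path(txt_path)
    txt_path.write_text(join_pages(pages_text), encoding="utf-8")
    return txt_path


def pdf_to_txt(pdf_path, txt_path):
    """Extract the text of every page in pdf_path and save it to txt_path."""
    # Imported here so the two helpers above can be used without PyPDF2.
    import PyPDF2

    with open(pdf_path, "rb") as pdf_file:
        reader = PyPDF2.PdfFileReader(pdf_file)
        pages_text = [
            reader.getPage(i).extractText() for i in range(reader.numPages)
        ]
    return save_pages(pages_text, txt_path)


# Example call (the output path is a placeholder):
# pdf_to_txt(r"F:\pdf.pdf", r"F:\pdf.txt")
```

Pages are separated by a blank line here. Pass a different separator to join_pages if you want something else, such as a form feed character between pages.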