Python code to extract txt from PDF document

Question

I have been trying to convert some PDFs into .txt, but most sample code I found online has the same issue: it only converts one page at a time. I am kind of new to Python, and I can't figure out how to replace the .getPage() call so that the entire document is converted at once. All help is welcome.

import PyPDF2
 
pdfFileObject = open(r"F:\pdf.pdf", 'rb')
 
pdfReader = PyPDF2.PdfFileReader(pdfFileObject)
 
print(" No. Of Pages :", pdfReader.numPages)
 
pageObject = pdfReader.getPage(0)
 
print(pageObject.extractText())
 
pdfFileObject.close()

Answer 1

You can do this with a for loop: iterate over the page indices, extract the text of each page inside the loop, and append it to a list.

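import PyPDF2

# Note: PdfFileReader, numPages, getPage and extractText are the older
# PyPDF2 names; PyPDF2 3.0+ and pypdf use PdfReader, len(reader.pages),
# reader.pages[i] and extract_text() instead.
pages_text = []  # one string per page, in page order
with open(r"F:\pdf.pdf", 'rb') as pdfFileObject:
    pdfReader = PyPDF2.PdfFileReader(pdfFileObject)

    print(" No. Of Pages :", pdfReader.numPages)
    # Pages are zero-indexed, so range(numPages) covers the whole document
    for page in range(pdfReader.numPages):
        pageObject = pdfReader.getPage(page)
        pages_text.append(pageObject.extractText())

print(pages_text)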
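
Since the goal in the question is a .txt file rather than a list, the same loop can be extended to join the pages and write them out. This is only a sketch under a few assumptions: it uses the newer pypdf package (the successor to PyPDF2) with PdfReader, reader.pages and extract_text(), and the helper names join_pages and pdf_to_txt as well as the output path are made up for illustration.

```python
from pathlib import Path


def join_pages(pages_text, separator="\n\n"):
    """Combine per-page strings into one document string."""
    # Defensive: treat a missing page text (None) as an empty string
    return separator.join(text or "" for text in pages_text)


def pdf_to_txt(pdf_path, txt_path):
    """Extract the text of every page in pdf_path and write it to txt_path."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported
    # here so that join_pages() stays usable without the library.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # reader.pages is a sequence of page objects covering the whole document
    pages_text = [page.extract_text() for page in reader.pages]
    Path(txt_path).write_text(join_pages(pages_text), encoding="utf-8")
    return len(pages_text)  # number of pages converted
```

Calling pdf_to_txt(r"F:\pdf.pdf", r"F:\pdf.txt") would then write all pages to one file, separated by blank lines. Scanned PDFs that contain only images will likely produce empty text, since no OCR is performed.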

solution1
0 2022-01-14 22:44:01
