Split a multi-page PDF into individual PDFs based on a string, and save each file under that string (Python)

I have a single PDF that contains multiple invoices. It is organized so that the first page of each invoice carries the invoice number, and the invoice details may continue onto a second or third page. I want to split the PDF into individual PDF files based on the invoice number. For example, with 10 pages in total:

page 1: invoice 1, continued on page 2
page 3: invoice 2, continued on page 4
page 5: invoice 3, continued on page 6
page 7: invoice 4, continued on page 8
page 9: invoice 5, continued on page 10

I want to start a new file whenever a page contains the word "invoice", and include every page up to (but not including) the next page that contains it. The output I am looking for is:

invoice 1.pdf (2 pages, pages 1 to 2)
invoice 2.pdf (2 pages, pages 3 to 4)
invoice 3.pdf (2 pages, pages 5 to 6)
invoice 4.pdf (2 pages, pages 7 to 8)
invoice 5.pdf (2 pages, pages 9 to 10)

I found the following code online for splitting a PDF into single-page files. Can anyone help extend it to implement the split logic above?

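```
from PyPDF2 import PdfFileWriter, PdfFileReader  # legacy names, PyPDF2 < 3.0

# Keep the input file open while pages are copied out of it.
with open("invoices.pdf", "rb") as inputStream:
    inputpdf = PdfFileReader(inputStream)

    # Write each page to its own single-page PDF.
    for i in range(inputpdf.numPages):
        output = PdfFileWriter()
        output.addPage(inputpdf.getPage(i))
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            output.write(outputStream)
```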

I ended up using an application to do this. It's called PDF Content Split SA.
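
For a scripted route, the rule described in the question (start a new file at each page that carries an invoice number, and keep the following pages with it) could be sketched roughly as below. This is only a sketch under assumptions, not a tested solution for the actual file: it uses pypdf, the successor to PyPDF2, rather than the legacy API from the question; the regular expression `INVOICE_RE` is a guess at how the number appears on the page (for example "Invoice 1" or "Invoice No: 1"); and the helper names `group_pages` and `split_invoices` are made up for illustration.

```python
import re
from pathlib import Path

# Assumed pattern: the first page of each invoice contains text such as
# "Invoice 1", "Invoice #1" or "Invoice No: 1". Adjust to the real layout.
INVOICE_RE = re.compile(r"invoice\s*(?:no\.?|number|#)?\s*:?\s*(\d+)", re.IGNORECASE)


def group_pages(page_texts):
    """Map each invoice number to the 0-based indices of its pages."""
    groups = {}
    current = None
    for index, text in enumerate(page_texts):
        match = INVOICE_RE.search(text or "")
        if match:
            # A page with an invoice number starts (or continues) that invoice.
            current = match.group(1)
        if current is None:
            continue  # pages before the first invoice number are skipped
        groups.setdefault(current, []).append(index)
    return groups


def split_invoices(pdf_path, out_dir="."):
    """Write one "invoice <number>.pdf" per group found in pdf_path."""
    # Imported here so group_pages() stays usable without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    texts = [page.extract_text() for page in reader.pages]

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    for number, indices in group_pages(texts).items():
        writer = PdfWriter()
        for i in indices:
            writer.add_page(reader.pages[i])
        with open(out_dir / f"invoice {number}.pdf", "wb") as fh:
            writer.write(fh)
```

Calling `split_invoices("invoices.pdf")` would then write files named like `invoice 1.pdf` next to the script. Text extraction only works when the PDF has a text layer; a scanned PDF would need OCR first, and the pattern will likely need tuning if continuation pages also mention an invoice number.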
