Write specific pages from multiple pdf files to a new pdf file

I have multiple pdf files, and I want to extract a specific group of pages from each one; the set of pages is different for every file. I have created a dictionary whose keys are the pdf file names and whose values are the lists of pages to extract from the corresponding file. I intend to extract the given pages from each pdf file and write them all to one new pdf file, so that I can do data extraction on this final file. I have tried PyPDF4 as well as FPDF, but without success so far: I get either a large pdf with blank pages, a pdf with just 1 or 2 pages extracted, or an error that the pdf object cannot be found. I am hoping to get some guidance on where I am going wrong with my approach. Below is my code:

import PyPDF4
from PyPDF4 import PdfFileReader, PdfFileWriter

for pdf,pgs in dic_11_1.items():
  pdf=list(dic_11_1.keys())
  pgs=list(dic_11_1.values())
  for i in range(0,len(pdf)):
    pages = pgs[i]
    object = open(pdf[i],'rb') 
    pdfinput=PyPDF4.PdfFileReader(object,'rb')
    if pdfinput.isEncrypted:
        pdfinput.decrypt('')
    else:
        pdfinput
    for p in pages:
        page=pdfinput.getPage(p)
        pdf_writer=PyPDF4.PdfFileWriter()
        pdf_writer.addPage(page)
        with open('F111.pdf',mode='wb') as output:
            pdf_writer.write(output)

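Well, the problem is that you're not iterating properly. The outer loop already gives you each file name with its page list, yet the code rebuilds both lists and loops over every file again. It also creates a new writer and reopens the output file in 'wb' mode for every page, so each write replaces the previous one. See the comments in the code for details.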
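To see why the original loop leaves only a page or two in the output, here is a minimal sketch that uses plain byte strings in place of pdf pages. The function names `write_each_time` and `write_once` and the file names are made up for illustration; the only point is that opening a file in 'wb' mode truncates it every time.

```python
import os
import tempfile


def write_each_time(path, items):
    # Mirrors the pattern in the question: the output is reopened in 'wb'
    # mode for every item, so each write discards what was written before.
    for item in items:
        with open(path, 'wb') as out:
            out.write(item)


def write_once(path, items):
    # Collect everything first, then open the output a single time.
    collected = b''.join(items)
    with open(path, 'wb') as out:
        out.write(collected)


def read_bytes(path):
    with open(path, 'rb') as src:
        return src.read()


items = [b'page0\n', b'page1\n', b'page2\n']

with tempfile.TemporaryDirectory() as tmp:
    each_path = os.path.join(tmp, 'each.bin')
    once_path = os.path.join(tmp, 'once.bin')
    write_each_time(each_path, items)
    write_once(once_path, items)
    each_result = read_bytes(each_path)
    once_result = read_bytes(once_path)

print(each_result)  # b'page2\n' (only the last item survives)
print(once_result)  # b'page0\npage1\npage2\n'
```

The same thing happens with a pdf writer that is recreated and written out inside the page loop: only the last page added ends up in the file.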

from contextlib import ExitStack

from PyPDF4 import PdfFileReader, PdfFileWriter

dic_11_1 = {
    'filename1.pdf': [1, 3, 4],
    'filename2.pdf': [0, 2, 3],
}
# note pages are zero-numbered

# you need a single writer for all files, don't create it inside a loop
pdf_writer = PdfFileWriter()

# ExitStack keeps every source file open until the output has been written:
# page content is typically read lazily, so closing a source file before
# pdf_writer.write() can fail with an error about a closed file
with ExitStack() as stack:
    for filename, pages in dic_11_1.items():
        # first iteration: filename is 'filename1.pdf', pages is [1, 3, 4]
        # second iteration: 'filename2.pdf' and [0, 2, 3], and so on

        # don't use `object` as a variable name: it's valid, but bad style
        # (it shadows the builtin `object`)
        src = stack.enter_context(open(filename, 'rb'))
        # the reader takes the stream only; the 'rb' mode belongs to open()
        pdfinput = PdfFileReader(src)

        if pdfinput.isEncrypted:
            pdfinput.decrypt('')
            # you don't need an empty `else`

        for p in pages:
            # you might want to use `p - 1` instead if your input is 1-numbered
            page = pdfinput.getPage(p)
            pdf_writer.addPage(page)

    # when all pages are added, write the output once,
    # while the source files are still open
    with open('F111.pdf', mode='wb') as output:
        pdf_writer.write(output)
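If the page lists in the dictionary come from a viewer or a table of contents, they are probably 1-numbered, and a page number past the end of a file raises an error in the middle of the loop. A small helper can convert and check the numbers before any page is fetched. This is only a sketch: the function name `to_zero_based` and its parameters are hypothetical and not part of PyPDF4.

```python
def to_zero_based(pages, num_pages, one_based=True):
    """Return zero-based page indices, rejecting numbers outside the file.

    `pages` is the list stored in the dictionary, `num_pages` is the page
    count of the source pdf. Both names are assumptions for this sketch.
    """
    offset = 1 if one_based else 0
    indices = []
    for p in pages:
        idx = p - offset
        if not 0 <= idx < num_pages:
            raise ValueError(
                'page %r is outside a file with %d pages' % (p, num_pages)
            )
        indices.append(idx)
    return indices


print(to_zero_based([1, 3, 4], num_pages=10))                   # [0, 2, 3]
print(to_zero_based([0, 2, 3], num_pages=10, one_based=False))  # [0, 2, 3]
```

In the loop above, the page count would come from the reader, for example `pdfinput.getNumPages()`, and the inner loop would iterate over the returned indices instead of `pages`.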
