
PyPDF2: duplicating PDF gives blank pages

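I'm using PyPDF2 to alter a PDF document (adding bookmarks). So I need to read in the entire source PDF and write it out again, keeping as much of the data intact as possible. Merely writing each page into a new PDF object may not be enough to preserve the document metadata.

PdfFileWriter() does have a number of methods for copying an entire file: cloneDocumentFromReader, appendPagesFromReader and cloneReaderDocumentRoot. However, they all have problems.

If I use cloneDocumentFromReader or appendPagesFromReader, I get a valid PDF file with the correct number of pages, but all pages are blank.

If I use cloneReaderDocumentRoot, I get a minimal valid PDF file, but with no pages or data.

This has been asked before, but with no successful answers. Other questions have asked about blank pages in PyPDF2, but I can't apply the answers given.

Here's my code: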

from PyPDF2 import PdfFileReader, PdfFileWriter

def bookmark(incomingFile):
    fileObj = open(incomingFile, 'rb')
    output = PdfFileWriter()
    input = PdfFileReader(fileObj)

    output.appendPagesFromReader(input)
    #output.cloneDocumentFromReader(input)
    myTableOfContents = [
            ('Page 1', 0), 
            ('Page 2', 1),
            ('Page 3', 2)
            ]
    # output.addBookmark(title, pagenum, parent=None, color=None, bold=False, italic=False, fit='/Fit')
    for title, pagenum in myTableOfContents:
        output.addBookmark(title, pagenum, parent=None)

    output.setPageMode("/UseOutlines")

    # Note: this reopens the same path that fileObj still has open for reading.
    outputStream = open(incomingFile, "wb")
    output.write(outputStream)
    outputStream.close()
    fileObj.close()

I tend to get errors when PyPDF2 can't add the bookmark to the PdfFileWriter object because it doesn't have any pages, or similar.
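One thing that stands out in the code above is that it opens incomingFile for writing while the reader still has the same file open. Whether that explains the blank pages is not established here, but opening a file in "wb" mode truncates it, so it seems worth ruling out. The sketch below writes to a temporary file first and only then replaces the original. The helper names rewrite_in_place and add_bookmarks are hypothetical, and the PyPDF2 calls assume the pre-3.0 API used in the question.

```python
import os
import tempfile


def rewrite_in_place(path, transform):
    """Call transform(src, dst) with dst as a temp file, then replace path.

    The original file is only replaced if transform finishes without error.
    """
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
        try:
            with os.fdopen(fd, "wb") as dst:
                transform(src, dst)
        except BaseException:
            # Leave the original untouched and clean up the partial output.
            os.remove(tmp_path)
            raise
    # Both files are closed here, so the swap also works on Windows.
    os.replace(tmp_path, path)


def add_bookmarks(src, dst, toc=(("Page 1", 0), ("Page 2", 1), ("Page 3", 2))):
    """Same steps as bookmark() in the question, on already-open files."""
    # Imported lazily; assumes PyPDF2 < 3.0 (PdfFileReader/PdfFileWriter API).
    from PyPDF2 import PdfFileReader, PdfFileWriter

    reader = PdfFileReader(src)
    writer = PdfFileWriter()
    writer.appendPagesFromReader(reader)
    for title, pagenum in toc:
        writer.addBookmark(title, pagenum, parent=None)
    writer.setPageMode("/UseOutlines")
    writer.write(dst)
```

With these in place, a call such as rewrite_in_place("document.pdf", add_bookmarks) would keep the source readable until the new file is fully written; "document.pdf" is a placeholder path.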

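I also struggled with this for a long time, and eventually found that PyPDF2 has this issue. Basically, I copied the code from this answer into my C:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\pdf.py (this will depend on your distribution), around line 382, for the cloneDocumentFromReader function.

After that, I was able to append the reader's pages to the writer with writer.cloneDocumentFromReader(pdf) and, in my case, update the PDF metadata (Subject, Keywords, etc.).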
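As a rough illustration of that workflow, the sketch below clones a document and then updates its metadata. It assumes the patched cloneDocumentFromReader is already in place and that the pre-3.0 PyPDF2 API is installed. The helper names build_info and clone_with_metadata are hypothetical. addMetadata expects keys in PDF name form such as "/Subject".

```python
def build_info(**fields):
    """Map keyword arguments like subject='x' to PDF info keys like '/Subject'."""
    return {
        "/" + name[:1].upper() + name[1:]: str(value)
        for name, value in fields.items()
        if value is not None
    }


def clone_with_metadata(src_path, dst_path, **fields):
    """Clone src_path into dst_path and update the given metadata fields."""
    # Imported lazily; assumes PyPDF2 < 3.0 and the patched clone method.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    with open(src_path, "rb") as src:
        reader = PdfFileReader(src)
        writer = PdfFileWriter()
        writer.cloneDocumentFromReader(reader)
        writer.addMetadata(build_info(**fields))
        # Write while the source is still open, since the reader loads lazily.
        with open(dst_path, "wb") as dst:
            writer.write(dst)
```

A call could look like clone_with_metadata("in.pdf", "out.pdf", subject="Report", keywords="pdf, bookmarks"), where both paths and values are placeholders. Writing to a separate output path avoids truncating the file that is still being read.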

Hope this helps you.

def cloneDocumentFromReader(self, reader, after_page_append=None):
    '''
    Create a copy (clone) of a document from a PDF file reader

    :param reader: PDF file reader instance from which the clone
        should be created.
    :callback after_page_append (function): Callback function that is invoked after
        each page is appended to the writer. Signature includes a reference to the
        appended page (delegates to appendPagesFromReader). Callback signature:

        :param writer_pageref (PDF page reference): Reference to the page just
            appended to the document.
    '''
    debug = False
    if debug:
        print("Number of Objects: %d" % len(self._objects))
        for obj in self._objects:
            print("\tObject is %r" % obj)
            if hasattr(obj, "indirectRef") and obj.indirectRef is not None:
                print("\t\tObject's reference is %r %r, at PDF %r" % (obj.indirectRef.idnum, obj.indirectRef.generation, obj.indirectRef.pdf))

    # Variables used for after cloning the root to
    # improve pre- and post- cloning experience

    mustAddTogether = False
    newInfoRef = self._info
    oldPagesRef = self._pages
    oldPages = self.getObject(self._pages)

    # If there have already been any number of pages added

    if oldPages[NameObject("/Count")] > 0:

        # Keep them

        mustAddTogether = True
    else:

        # Throw the page object out

        if oldPages in self._objects:
            newInfoRef = self._pages
            self._objects.remove(oldPages)

    # Clone the reader's root document

    self.cloneReaderDocumentRoot(reader)
    if not self._root:
        self._root = self._addObject(self._root_object)

    # Sweep for all indirect references

    externalReferenceMap = {}
    self.stack = []
    newRootRef = self._sweepIndirectReferences(externalReferenceMap, self._root)

    # Delete the stack to reset

    del self.stack

    # Clean-up time

    # Get the new root of the PDF

    realRoot = self.getObject(newRootRef)

    # Get the new pages tree root and its ID Number

    tmpPages = realRoot[NameObject("/Pages")]
    newIdNumForPages = 1 + self._objects.index(tmpPages)

    # Make an IndirectObject just for the new Pages

    self._pages = IndirectObject(newIdNumForPages, 0, self)

    # If there are any pages to add back in

    if mustAddTogether:

        # Set the new page's root's parent to the old
        # page's root's reference

        tmpPages[NameObject("/Parent")] = oldPagesRef

        # Add the reference to the new page's root in
        # the old page's kids array

        newPagesRef = self._pages
        oldPages[NameObject("/Kids")].append(newPagesRef)

        # Set all references to the root of the old/new
        # page's root

        self._pages = oldPagesRef
        realRoot[NameObject("/Pages")] = oldPagesRef

        # Update the count attribute of the page's root

        oldPages[NameObject("/Count")] = NumberObject(oldPages[NameObject("/Count")] + tmpPages[NameObject("/Count")])

    else:

        # Bump up the info's reference b/c the old
        # page's tree was bumped off

        self._info = newInfoRef
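Editing pdf.py inside site-packages is lost whenever the package is reinstalled. A possible alternative, shown here only as a sketch, is to keep the function above in your own module and attach it to PdfFileWriter at runtime. The names install_patch and apply_to_pypdf2 are hypothetical. The module that holds the patched function would also need NameObject, IndirectObject and NumberObject imported from PyPDF2.generic, since the function body refers to them.

```python
def install_patch(cls, name, replacement):
    """Replace cls.<name> with replacement and return the original attribute."""
    original = getattr(cls, name)
    setattr(cls, name, replacement)
    return original


def apply_to_pypdf2(patched_clone):
    """Attach patched_clone as PdfFileWriter.cloneDocumentFromReader.

    patched_clone is assumed to be the function shown above, defined in a
    module that imports NameObject, IndirectObject and NumberObject from
    PyPDF2.generic. Returns the original method so it can be restored.
    """
    # Imported lazily; assumes PyPDF2 < 3.0.
    from PyPDF2 import PdfFileWriter

    return install_patch(PdfFileWriter, "cloneDocumentFromReader", patched_clone)
```

Calling apply_to_pypdf2 once at program start would make every later writer.cloneDocumentFromReader(reader) call use the patched version, without touching the installed library files.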
