How to merge and close a PyPDF without previous inputs being attached
I'm having an issue when merging multiple PDFs: I have to loop between folders and merge the two files that match. That part was easy, but when I do:
input1.append(file1)
input2.append(file2)
PDFFileMerger.write(output)
the merging occurs, but the next iteration includes the previous inputs and so on, so the last output is a huge PDF with all the earlier inputs repeated on top of each other.
for i in range(nPdfs):
    abr = onlypdf[i]
    abr = abr.replace('.pdf', '')
    for j in range(nXl):
        pdf_file = open('SEPTIEMBRE DE 2020/' + onlyfiles[j], 'rb')
        read_pdf = pdf.PdfFileReader(pdf_file)
        number_of_pages = read_pdf.getNumPages()
        page = read_pdf.getPage(0)
        page_content = page.extractText()
        if abr in page_content:
            file1 = onlypdf[i]
            file2 = onlyfiles[j]
            print(file1)
            print(file2)
            print(file1 + ' esta en ' + file2)
            input1 = open('Combinadora/documentos/' + file1, 'rb')
            input2 = open('SEPTIEMBRE DE 2020/' + file2, 'rb')
            merger.append(input1)
            merger.append(input2)
            input1.close()
            input2.close()
            print('archivo creado')
            output = open(abr + '-' + file2, 'wb')
            merger.write(output)
            output.close()
This is my code. Am I screwing it up in the loop?
PyPDF is a great library, but I had some memory problems with it too. Generally I used a separate process to create the merger (killed after the job finished), or you can delete (`del`) the actual merger object. Keep in mind that even if you find a tricky way around this problem, memory leaks can still happen, so I strongly suggest creating and killing processes.
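The root of the accumulation is that `merger` is created once, outside the loops, and `PdfFileMerger.append` adds to the same object's internal page list on every iteration. A tiny stand-in class (not PyPDF2's real API, just a sketch of the relevant behaviour) shows the difference between reusing one merger and creating a fresh one per match, as deleting the object each iteration effectively does:

```python
class StandInMerger:
    """Stand-in mimicking the relevant PdfFileMerger behaviour:
    append() accumulates pages until the object is discarded."""
    def __init__(self):
        self._pages = []

    def append(self, pages):
        self._pages.extend(pages)

    def write(self):
        return list(self._pages)


# Three fake "matched" documents, represented as lists of page labels.
docs = [["a1", "a2"], ["b1"], ["c1", "c2"]]

# Buggy pattern: one merger shared by every iteration.
shared = StandInMerger()
buggy_outputs = []
for d in docs:
    shared.append(d)
    buggy_outputs.append(shared.write())

# Fixed pattern: create and discard a fresh merger per iteration.
fixed_outputs = []
for d in docs:
    merger = StandInMerger()
    merger.append(d)
    fixed_outputs.append(merger.write())
    del merger  # drop the accumulated state before the next match

print(buggy_outputs[-1])  # every earlier input has leaked in
print(fixed_outputs[-1])  # only this iteration's input
```

In the asker's real code the same idea means moving `merger = pdf.PdfFileMerger()` inside the `if abr in page_content:` block (and `del`-ing or re-binding it after `merger.write(output)`), rather than constructing it once before the loops.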