PyPDF2 merge function and ...

Question

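First time coding here. I'm trying to create a program in Python to help automate some of my work at the office.

What I want to do is merge a PDF file in folder 1 with the PDF file of the same name in folder 2. I'd also like to use a Tkinter GUI.

This is what I have so far: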

from tkinter import *
from PyPDF2 import PdfFileMerger

root = Tk()

  
# Creating a Label Widget
MainLabel = Label(root, text="PDF Rawat Jalan")
# Shoving it onto the screen
MainLabel.pack()

#Prompt Kode
KodeLabel = Label(root, text="Masukan Kode")
KodeLabel.pack()

#Input Kode

kode = Entry(root, bg="gray",)
kode.pack()


#function of Merge Button
def mergerclick():
    kode1 = kode.get()
    pdflocation_1 = "C:\\Users\\User\\Desktop\\PDF\\Folder 1\\1_"+kode1+".pdf"
    pdflocation_2 = "C:\\Users\\User\\Desktop\\PDF\\Folder 2\\2_"+kode1+".pdf"
    Output = "C:\\Users\\User\\Desktop\\PDF\\output\\"+kode1+".pdf"
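    merger = PdfFileMerger()

    merger.append(pdflocation_1)
    merger.append(pdflocation_2)

    # Write through a context manager so the output file is flushed and closed
    with open(Output, 'wb') as output_file:
        merger.write(output_file)
    merger.close()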
    confirmation = kode1 +" merged"
    testlabel = Label(root, text=confirmation)
    testlabel.pack()



#Merge Button
mergerButton = Button(root, text= "Merge", command=mergerclick)
mergerButton.pack()

root.mainloop()

Now there is a third file I need to append, but its file name contains a date. For example: file 1 (010.pdf); file 2 (010.pdf); file 3 (010_2020_10_05).

There are about 9,000 files in each folder. How should I go about this?

Answer 1

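I think what you need is a way to find only the files whose names start with a particular string. Given the date suffix, I'm guessing the file names may not be unique, so I wrote this to find all matches. Something like this will do it: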

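import pathlib

def find_prefix_matches(prefix, directory_name):
  # Return the paths of all files in directory_name whose file name starts with prefix
  dir_path = pathlib.Path(directory_name)
  # Compare against f_name.name: str(f_name) also contains the directory part
  return [str(f_name) for f_name in dir_path.iterdir()
      if f_name.name.startswith(prefix)]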

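If you're just learning to code, this example is fairly simple. However, it is inefficient if you need to match 9,000 files in one go, because it re-reads the directory on every call. To make it run faster, load the file list once rather than on every request.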

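import pathlib

def find_prefix_matches(prefix, file_list):
  # file_list holds bare file names, so startswith compares against the name only
  return [f for f in file_list if f.startswith(prefix)]

# directory_name and your_list_of_files_to_append are placeholders for your own values
dir_path = pathlib.Path(directory_name)
file_list = [f_name.name for f_name in dir_path.iterdir()]  # read the directory once
for file_name_prefix in your_list_of_files_to_append:
  file_matches = find_prefix_matches(file_name_prefix, file_list)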
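
Even with the list loaded once, each lookup above still scans every name. If that turns out to be too slow, one option is to group the files by code in a dictionary up front, so each lookup is a single dictionary access. The following is only a sketch under some assumptions: build_prefix_index is a hypothetical helper name, and the dated files are assumed to be named like 010_2020_10_05.pdf, so the code is whatever comes before the first underscore.

```python
from collections import defaultdict
from pathlib import Path


def build_prefix_index(directory, separator="_"):
    """Group the PDF files in `directory` by the text before the first separator.

    Assumption: dated files are named <code>_<YYYY>_<MM>_<DD>.pdf, so the
    code is the part of the file stem before the first underscore.
    """
    index = defaultdict(list)
    for path in Path(directory).iterdir():
        if path.suffix.lower() != ".pdf":
            continue  # ignore anything that is not a PDF
        code = path.stem.split(separator, 1)[0]
        index[code].append(path)
    # Sort each group by path; with zero-padded YYYY_MM_DD dates this is
    # also chronological order.
    for paths in index.values():
        paths.sort()
    # Return a plain dict so looking up a missing code does not create an entry.
    return dict(index)
```

With the index built once, index.get("010", []) returns every dated file for code 010 (or an empty list if there is none), and each returned path can be passed to merger.append after the first two files.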

Solution 1
0 2020-12-06 16:59:00