Python script to iterate through PDFs in a directory and find a matching line

Question

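Currently, all of my reports are emailed to me as PDFs. I have set Outlook to automatically download those files to a directory every day. Sometimes these PDFs contain no data and only the line "There is no data to present that matches the selection criteria". I would like to create a Python program that goes through each PDF file in that directory, opens it, and looks for those words. If a file contains the phrase, the program should delete that PDF; if not, it should do nothing. With help from Reddit, I pieced together the following code: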

import PyPDF2
import os

directory = 'C:\\Users\\jmoorehead\\Desktop\\A2IReports\\'
for file in os.listdir(directory):
    if not file.endswith(".pdf"):
        continue
    with open("{}/{}".format(directory,file), 'rb') as pdfFileObj:
        pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
        pageObj = pdfReader.getPage(0)
        if "There is no data to present that matches the selection criteria" in pageObj.extractText():
            print("{} was removed.".format(file))
            os.remove(file)

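I have tested this with 3 files, one of which contains the matching phrase. It fails no matter how the files are named or what order they are in. I have also tested it with a single file named 3.pdf in the directory. Below is the error I get.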

FileNotFoundError: [WinError 2] The system cannot find the file specified: '3.pdf'

This would greatly reduce my workload and is a good learning example for me. All help and criticism is welcome.
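The cause of the error can be seen without any PDFs. The following is a minimal sketch, assuming a throwaway temporary directory rather than the real report folder; the helper name show_listdir_names is made up for illustration. It shows that os.listdir returns bare file names, which only resolve once they are joined back onto the directory.

```python
import os
import tempfile


def show_listdir_names():
    """Return what os.listdir yields for a directory holding one file."""
    # Throwaway directory standing in for the report folder (an assumption)
    with tempfile.TemporaryDirectory() as directory:
        # Create an empty file named like the one in the error message
        open(os.path.join(directory, "3.pdf"), "wb").close()
        names = os.listdir(directory)  # bare names, no directory prefix
        full_path = os.path.join(directory, names[0])
        return names, os.path.isabs(names[0]), os.path.exists(full_path)


names, is_absolute, found_with_join = show_listdir_names()
print(names, is_absolute, found_with_join)  # ['3.pdf'] False True
```

A bare name such as '3.pdf' is resolved against the current working directory, which is usually not the report folder, hence the FileNotFoundError.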

Answer 1

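os.listdir returns bare file names, not full paths, so os.remove(file) looks for 3.pdf in the current working directory instead of in directory. Build the full path once with os.path.join and use it for both open and os.remove. On Windows the file also has to be closed before it can be deleted, so the removal is moved outside the with block. See below:

import os

import PyPDF2

directory = 'C:\\Users\\jmoorehead\\Desktop\\A2IReports\\'
phrase = "There is no data to present that matches the selection criteria"

for file in os.listdir(directory):
    if not file.endswith(".pdf"):
        continue
    # os.listdir returns bare names, so build the full path once
    path = os.path.join(directory, file)
    with open(path, 'rb') as pdfFileObj:
        # Legacy PyPDF2 names, as in the question (removed in PyPDF2 3.0)
        pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
        pageObj = pdfReader.getPage(0)
        is_empty = phrase in pageObj.extractText()
    # Delete only after the file is closed; Windows refuses to remove an open file
    if is_empty:
        os.remove(path)
        print("{} was removed.".format(file))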
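The same loop can be wrapped in a function, which makes it easier to try out on a test folder before pointing it at real reports. This is a minimal sketch under stated assumptions, not the answerer's code: the function name remove_files_containing and the extract_text parameter are hypothetical, and the PDF reading is passed in as a callable so the file handling can be checked without PyPDF2 installed.

```python
import os


def remove_files_containing(directory, phrase, extract_text):
    """Delete every .pdf in `directory` whose extracted text contains `phrase`.

    `extract_text` is a hypothetical callable that takes an open binary
    file object and returns its text as a string.
    Returns the list of file names that were removed.
    """
    removed = []
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".pdf"):
            continue
        # os.listdir yields bare names, so join them onto the directory
        path = os.path.join(directory, name)
        with open(path, "rb") as handle:
            matches = phrase in extract_text(handle)
        # The file is closed at this point, so it can be deleted on Windows
        if matches:
            os.remove(path)
            removed.append(name)
    return removed
```

With the PyPDF2 calls used in the question, the extractor could be something like lambda f: PyPDF2.PdfFileReader(f).getPage(0).extractText(), and the call would be remove_files_containing(directory, "There is no data to present that matches the selection criteria", extractor).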

Answer 1: 2017-06-14 20:04:53