Python script to loop through the PDFs in a directory and find a matching line

Question

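Currently, all of my reports are emailed to me as PDFs. I have set Outlook to download those files to a directory automatically every day. Sometimes a PDF contains no data, only the line "There is no data to present that matches the selection criteria". I would like to write a Python program that goes through every PDF file in that directory, opens it, and looks for that phrase. If the file contains the phrase, the program should delete that PDF; if not, it should do nothing. With help from reddit, I pieced together the following code: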

import PyPDF2
import os

directory = 'C:\\Users\\jmoorehead\\Desktop\\A2IReports\\'
for file in os.listdir(directory):
    if not file.endswith(".pdf"):
        continue
    with open("{}/{}".format(directory,file), 'rb') as pdfFileObj:
        pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
        pageObj = pdfReader.getPage(0)
        if "There is no data to present that matches the selection criteria" in pageObj.extractText():
            print("{} was removed.".format(file))
            os.remove(file)

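I have tested it with 3 files, one of which contains the matching phrase. It fails no matter how the files are named or in what order they are processed. I have also tested it with a single file in the directory, named 3.pdf. This is the error I get: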

FileNotFoundError: [WinError 2] The system cannot find the file specified: '3.pdf'

This would cut my workload considerably, and it is a good learning exercise for me. All help and criticism is welcome.

Answer 1

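See below. os.listdir returns bare file names, so each one has to be joined with the directory to get a usable path: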

import PyPDF2
import os

directory = 'C:\\Users\\jmoorehead\\Desktop\\A2IReports\\'
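for file in os.listdir(directory):
    if not file.endswith(".pdf"):
        continue
    path = os.path.join(directory, file)  # Changes here: build the full path once
    with open(path, 'rb') as pdfFileObj:
        pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
        pageObj = pdfReader.getPage(0)
        is_empty = "There is no data to present that matches the selection criteria" in pageObj.extractText()
    # Delete only after the with block has closed the file;
    # Windows refuses to remove a file that is still open.
    if is_empty:
        os.remove(path)  # full path, not the bare file name
        print("{} was removed.".format(file))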
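
The FileNotFoundError in the question comes from a bare file name such as '3.pdf' being resolved against the current working directory rather than the reports directory. As a rough sketch of the same idea wrapped in a reusable function: the names remove_empty_reports, first_page_text, extract_text and dry_run are made up for illustration and are not part of PyPDF2. The PDF reading uses the same PyPDF2 calls as the code above; newer PyPDF2/pypdf releases rename them (PdfReader, pages[0], extract_text()), so adjust to the installed version.

```python
import os


def first_page_text(path):
    """Return the text of the first page, using the same PyPDF2 calls as above."""
    import PyPDF2  # imported lazily so the rest of the sketch runs without it

    with open(path, 'rb') as pdf_file:
        reader = PyPDF2.PdfFileReader(pdf_file)
        return reader.getPage(0).extractText()


def remove_empty_reports(directory, phrase, extract_text=first_page_text, dry_run=False):
    """Delete every PDF in `directory` whose extracted text contains `phrase`.

    `extract_text` is a hypothetical hook: any callable taking a path and
    returning a string. Returns the sorted list of matching file names.
    """
    matched = []
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".pdf"):
            continue
        # os.listdir gives bare names; join with the directory before use.
        path = os.path.join(directory, name)
        # extract_text opens and closes the file itself, so nothing holds
        # it open by the time os.remove runs.
        if phrase in extract_text(path):
            if not dry_run:
                os.remove(path)
            matched.append(name)
    return matched
```

Calling remove_empty_reports(directory, "There is no data to present that matches the selection criteria", dry_run=True) first lists what would be deleted without touching anything, which is a cheap safety check before running it for real.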

Solution 1
1 2017-06-14 20:04:53