Deleting PDF files from a folder if the search word is present, using Python

Question

Hi, I am trying to delete the PDF files in a folder that contain the phrase "Publications périodiques" in the first. So far I am able to search for the phrase, but I don't know how to delete the files.

Code used to search for the phrase in a PDF file:

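import PyPDF2
import re

# Legacy PyPDF2 reader API (PdfFileReader, getNumPages, getPage, extractText)
object = PyPDF2.PdfFileReader("202105192101394-60.pdf")
NumPages = object.getNumPages()  # total number of pages in the PDF
String = "Publications périodiques"
for i in range(0, NumPages):
    PageObj = object.getPage(i)
    print("this is page " + str(i))
    Text = PageObj.extractText()
    # print(Text)
    ResSearch = re.search(String, Text)  # match object, or None if not found
    print(ResSearch)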

Also, how can I loop this over multiple files?

Answer 1

You can delete any file using:

import os
os.remove("C:/fake/path/to/file.pdf")
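
When deleting many files in a loop, a missing file would otherwise stop the run, because os.remove() raises FileNotFoundError for a path that does not exist. A minimal sketch of a wrapper that tolerates this; the helper name delete_file is hypothetical:

```python
import os


def delete_file(path):
    """Delete path; return True if a file was removed, False if it was already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        return False
    return True
```

Other errors, such as PermissionError for a file that is still open on Windows, are deliberately left to propagate.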

Answer 2

To delete a file, use:

import os
os.unlink(file_path)

where file_path is the path to the relevant file.
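
For files, os.unlink() and os.remove() behave the same way. The same operation is also available as a method in pathlib, shown here as an alternative sketch. The missing_ok argument assumes Python 3.8 or newer, and the helper name unlink_if_present is made up for illustration:

```python
from pathlib import Path


def unlink_if_present(file_path):
    """Remove file_path if it exists; do nothing if it is already gone."""
    Path(file_path).unlink(missing_ok=True)
```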

Answer 3

To browse through the files:

import os

mypath = "./"
_, _, filenames = next(os.walk(mypath))  # file names in the top level of mypath

Process each file:

for file in filenames:
    path = os.path.join(mypath, file)  # os.walk() yields bare file names
    foundWord = yourFunction(path)
    if foundWord:
        os.remove(path)  # Delete the file

Write yourFunction() so that it returns True or False.
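
One possible shape for yourFunction(), as a sketch rather than a definitive implementation. It assumes the legacy PyPDF2 reader API used in the question (PdfFileReader, getNumPages, getPage, extractText); the helper names contains_phrase and page_texts are hypothetical. It uses a plain substring test instead of re.search(), so the phrase is not interpreted as a regular expression, and it collapses whitespace because extracted PDF text often has irregular spacing and line breaks.

```python
def contains_phrase(texts, phrase):
    """Return True if phrase occurs in any text, ignoring runs of whitespace."""
    wanted = " ".join(phrase.split())
    return any(wanted in " ".join(text.split()) for text in texts)


def page_texts(pdf_path):
    """Yield the extracted text of each page (assumes the legacy PyPDF2 API)."""
    import PyPDF2  # imported lazily so contains_phrase() works without PyPDF2

    reader = PyPDF2.PdfFileReader(pdf_path)
    for i in range(reader.getNumPages()):
        yield reader.getPage(i).extractText()


def yourFunction(pdf_path, phrase="Publications périodiques"):
    """Return True if the PDF at pdf_path contains phrase, else False."""
    return contains_phrase(page_texts(pdf_path), phrase)
```

Because page_texts() is a generator and any() stops at the first hit, pages after the first match are never extracted.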

Answer 4

I suppose your re.search() is already functional? Or is that part of your question?

If it works, you could use os to get all the files and filter them with a list comprehension to keep only the PDF files, like so:

import os

all_files = os.listdir("C:/../or_whatever_path")
# endswith() avoids matching names that merely contain ".pdf", e.g. "notes.pdf.txt"
only_pdf_files = [file for file in all_files if file.lower().endswith(".pdf")]

From that point on, you can iterate over the PDF files, run the code you've already written on each one, and delete the file with os.remove() when ResSearch finds a match:

import re

import PyPDF2

for file in only_pdf_files:
    object = PyPDF2.PdfFileReader(file)
    NumPages = object.getNumPages()  # number of pages in this file
    String = "Publications périodiques"
    for i in range(0, NumPages):
        PageObj = object.getPage(i)
        print("this is page " + str(i))
        Text = PageObj.extractText()
        # print(Text)
        ResSearch = re.search(String, Text)
        if ResSearch:
            os.remove(file)
            break  # the file is gone; skip its remaining pages

EDIT:

If your PDF files aren't in the same directory as your Python script, add the directory path to the file name passed to os.remove() (and to PyPDF2.PdfFileReader()).
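
To illustrate the path point: os.listdir() returns bare file names, so each name has to be joined with the folder before the file is opened or removed. A small sketch, with list_pdf_paths as a hypothetical helper name:

```python
import os


def list_pdf_paths(folder):
    """Return full paths of the PDF files directly inside folder, sorted by name."""
    paths = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Keep regular files only, matching the extension case-insensitively.
        if name.lower().endswith(".pdf") and os.path.isfile(path):
            paths.append(path)
    return paths
```

The returned paths can be passed directly to both the PDF reader and os.remove(), regardless of where the script is run from.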

Answer 5

for file in only_pdf_files:
    object = PyPDF2.PdfFileReader(file)
    NumPages = object.getNumPages()  # number of pages in this file
    String = "Publications périodiques"
    for i in range(0, NumPages):
        PageObj = object.getPage(i)
        Text = PageObj.extractText()
        # print(Text)
        ResSearch = re.search(String, Text)
        if ResSearch:
            os.remove(file)
            break  # stop reading pages once the file has been deleted
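
A variant that separates the decision from the deletion: check each file first, then delete it once. This is a sketch under assumptions, not the answerer's method. The names delete_matching and has_phrase are hypothetical; has_phrase stands for any function that takes a path and returns True or False (for example, a PyPDF2-based check). The dry_run flag is an addition so the matches can be reviewed before anything is removed.

```python
import os


def delete_matching(paths, has_phrase, dry_run=True):
    """Delete every path for which has_phrase(path) is true.

    With dry_run=True nothing is removed; the matching paths are only returned.
    """
    matches = [path for path in paths if has_phrase(path)]
    if not dry_run:
        for path in matches:
            os.remove(path)
    return matches
```

Calling it first with the default dry_run=True and printing the result gives a list of what would be deleted; passing dry_run=False then performs the deletion.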

5 answers

solution1
1 2021-05-20 12:49:45

solution2
0 2021-05-20 12:45:21

solution3
0 2021-05-20 12:48:55

solution4
0 2021-05-20 12:53:28

solution5
-1 2021-05-21 12:50:25
