Identify a group of files and process them based on a pattern: Python

Question

My requirement is that, if I find a particular pattern in a file name, I need to delete the whole group of files that the matching file belongs to. For example, below is the group of files that I have:

file1.infile_inprogress_2015033
file1.infile_rsn_20150330022431
file1.infile_err_20150330022431
file2.infile_03_29_2015_05:08:46
file2.infile_03_29_2015_05:09:56
file3.infile_20150330023214

The pattern I need to search for in a file name is "inprogress". Hence, from the list above, I need to delete the following files:

file1.infile_inprogress_2015033
file1.infile_rsn_20150330022431
file1.infile_err_20150330022431

Because these files share the same file name ("file1") before the identifier "infile".

So far, I can only list the files:

import fnmatch
import glob
import os

filelist = glob.glob('C:\\CIRP\\Velocidata\\Test\\*')
for file in filelist:
    filenamecopied = os.path.basename(file)
    if fnmatch.fnmatch(filenamecopied, "*Inprogress*"):
        print('Delete the group of files')
    else:
        print('skip this file')

Answer 1

os.walk is a better bet (easier to read); then filter on the file name.

import os
top = 'C:\\CIRP\\Velocidata\\Test\\'

# Walk the tree; each directory is handled on its own
for root, dirs, files in os.walk(top):

    # Collect the group prefixes (the text before 'infile') of 'inprogress' files.
    # The comparison is lowercased because the sample names use 'inprogress'.
    groups_in_progress = set()
    for name in files:
        lowered = name.lower()
        if 'inprogress' in lowered and 'infile' in lowered:
            groups_in_progress.add(lowered[:lowered.find('infile')])

    # Delete every file whose name starts with an in-progress group prefix
    for name in files:
        if name.lower().startswith(tuple(groups_in_progress)):
            os.remove(os.path.join(root, name))

You can use dictionaries and all kinds of optimizations, but this is the most straightforward.
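As a rough illustration of the dictionary idea mentioned above, here is a minimal sketch that groups the names in each directory by prefix and only returns the paths to delete, so the result can be inspected first. The function name find_in_progress_files and its marker and separator parameters are made up for this example, and it assumes a group is the lowercased text before "infile". The demo runs in a temporary directory rather than the real folder.

```python
import os
import tempfile
from collections import defaultdict

def find_in_progress_files(top, marker="inprogress", separator="infile"):
    """Return full paths of all files whose group contains a `marker` file.

    Hypothetical helper: a group is the lowercased text before `separator`.
    """
    doomed = []
    for root, dirs, files in os.walk(top):
        groups = defaultdict(list)   # group prefix -> names in this directory
        flagged = set()              # prefixes with at least one marker file
        for name in files:
            lowered = name.lower()
            cut = lowered.find(separator)
            if cut == -1:
                continue             # name does not follow the expected format
            prefix = lowered[:cut]
            groups[prefix].append(name)
            if marker in lowered:
                flagged.add(prefix)
        for prefix in flagged:
            doomed.extend(os.path.join(root, n) for n in groups[prefix])
    return doomed

# Demonstration in a throwaway directory, so no real data is touched
with tempfile.TemporaryDirectory() as demo_dir:
    for n in ("file1.infile_inprogress_2015033",
              "file1.infile_rsn_20150330022431",
              "file10.infile_20150330022431",
              "file3.infile_20150330023214"):
        open(os.path.join(demo_dir, n), "w").close()
    found = sorted(os.path.basename(p) for p in find_in_progress_files(demo_dir))
print(found)
```

Once the returned list looks right, each path can be passed to os.remove. Because the prefix keeps the trailing ".", a group such as "file1." does not swallow "file10.".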

Answer 2

You need os.unlink. From the docs, os.unlink is used to

Remove (delete) the file path.

Add a few lines to your if clause, like this:

# This if will check for "inprogress", ignoring case
if fnmatch.fnmatch(filenamecopied.lower(), "*inprogress*"):
    filegroup = filenamecopied.split('.')[0] + '.'   # get the file group, e.g. "file1."
    for i in filelist:                               # Iterate through the files
        # glob returns full paths, so compare against the base name
        if os.path.basename(i).startswith(filegroup):  # if i is of same group
            os.unlink(i)                               # Delete it
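One thing to watch with deleting inside the loop is that a second "inprogress" file in the same group would try to unlink files that are already gone. A possible way around that, sketched below, is to collect the targets first and delete afterwards. The helper delete_in_progress_groups and its dry_run flag are invented for this example, and it assumes the group is the text before the first "." in the base name.

```python
import glob
import os
import tempfile

def delete_in_progress_groups(directory, dry_run=True):
    """Hypothetical helper: delete every file whose group has an 'inprogress' member.

    A group is assumed to be the text before the first '.' in the base name.
    Returns the sorted base names that were (or, in a dry run, would be) deleted.
    """
    paths = [p for p in glob.glob(os.path.join(directory, "*")) if os.path.isfile(p)]
    groups = {
        os.path.basename(p).split(".")[0]
        for p in paths
        if "inprogress" in os.path.basename(p).lower()
    }
    targets = [p for p in paths if os.path.basename(p).split(".")[0] in groups]
    if not dry_run:
        for p in targets:
            os.unlink(p)   # os.unlink and os.remove do the same thing
    return sorted(os.path.basename(p) for p in targets)

# Demonstration in a throwaway directory, so no real data is touched
with tempfile.TemporaryDirectory() as demo_dir:
    for n in ("file1.infile_inprogress_2015033",
              "file1.infile_err_20150330022431",
              "file3.infile_20150330023214"):
        open(os.path.join(demo_dir, n), "w").close()
    planned = delete_in_progress_groups(demo_dir, dry_run=True)
    deleted = delete_in_progress_groups(demo_dir, dry_run=False)
    remaining = sorted(os.listdir(demo_dir))
print(planned)
print(remaining)
```

Running with dry_run=True first shows what would go before anything is actually removed.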

Answer 3

A few questions:

Are they always in order like you've listed them, or might they pop up in different orders?
Do they have any regular format features (like filexxx. at the front)?
Do the "inprogress" files always come up before the other files?

If I assume that the file name format is a bunch of letters or numbers, then a ".", and then a bunch more characters, and that the files come up in random order, I would do it something like this:

Go through once, building a list of the file prefixes that are going to be deleted.
Go through again, deleting the files whose prefix is in that list.

Kind of like this:

import glob
import os

filelist = glob.glob('C:\\CIRP\\Velocidata\\Test\\*')
deleteList = set()
for f in filelist:
    name = os.path.basename(f)             # Ignore the directory part of the path
    if "inprogress" in name.lower():       # Checks if inprogress is in the filename
        deleteList.add(name.split(".")[0]) # Adds base of filename
print(deleteList)
for f in filelist:
    name = os.path.basename(f)
    if name.split(".")[0] in deleteList:
        print("Delete:", f)
    else:
        print("Do not delete:", f)

I haven't written the actual delete code, but you can check whether this catches everything for you. Based on what you said, I used simple string functions rather than a regular expression to pick out the file names. If it doesn't catch everything, please post back with answers to the questions above!
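For comparison, here is roughly what the regular expression variant could look like. It is only a sketch under the format assumed above (letters or digits, a ".", then anything else); NAME_RE and split_by_group are names invented for this example, and it works on base names, so glob results would need os.path.basename applied first.

```python
import re

# Assumed format: letters/digits, a ".", then anything else
NAME_RE = re.compile(r"^(?P<group>[A-Za-z0-9]+)\.(?P<rest>.+)$")

def split_by_group(names):
    """Hypothetical helper: return (to_delete, to_keep) lists of names."""
    parsed = [(n, NAME_RE.match(n)) for n in names]
    # First pass: groups that have at least one "inprogress" member
    flagged = {m.group("group") for n, m in parsed
               if m and "inprogress" in m.group("rest").lower()}
    # Second pass: split every name by whether its group is flagged
    to_delete = [n for n, m in parsed if m and m.group("group") in flagged]
    to_keep = [n for n, m in parsed if not (m and m.group("group") in flagged)]
    return to_delete, to_keep

# The "inprogress" name deliberately comes last to show that order does not matter
to_delete, to_keep = split_by_group([
    "file2.infile_03_29_2015_05:08:46",
    "file1.infile_rsn_20150330022431",
    "file1.infile_inprogress_2015033",
    "notes",
])
print(to_delete)
print(to_keep)
```

Names that do not fit the assumed format (like "notes" here) are always kept, which seems the safer default when deleting.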

3 answers

solution1
3 ACCEPTED 2015-03-30 20:42:06

solution2
2 2015-03-30 20:33:34

solution3
1 2015-03-30 21:03:19
