Identify a group of files and process them based on a pattern: Python

My requirement is this: if I find a particular pattern in a file name, I need to delete all the files that belong to the same group. For example, below are the files that I have:

file1.infile_inprogress_2015033
file1.infile_rsn_20150330022431
file1.infile_err_20150330022431
file2.infile_03_29_2015_05:08:46
file2.infile_03_29_2015_05:09:56
file3.infile_20150330023214

The pattern I need to search for in a file name is "inprogress". Hence, in the list above, I need to delete the following files:

file1.infile_inprogress_2015033
file1.infile_rsn_20150330022431
file1.infile_err_20150330022431

These files go because they all share the same file name ("file1") before the identifier "infile".
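The grouping rule can be sketched as a small helper that returns everything before the `.infile` identifier. This is a minimal sketch that assumes every relevant name contains `.infile`; the helper name `group_of` is hypothetical.

```python
def group_of(name):
    """Return the group of a file name, e.g. "file1" for
    "file1.infile_rsn_20150330022431". Assumes the ".infile"
    identifier; returns None when it is missing."""
    prefix, sep, _ = name.partition(".infile")
    return prefix if sep else None


names = [
    "file1.infile_inprogress_2015033",
    "file1.infile_rsn_20150330022431",
    "file1.infile_err_20150330022431",
    "file2.infile_03_29_2015_05:08:46",
    "file2.infile_03_29_2015_05:09:56",
    "file3.infile_20150330023214",
]

# Groups that contain at least one "inprogress" file
groups_to_delete = {group_of(n) for n in names if "inprogress" in n.lower()}
groups_to_delete.discard(None)  # never treat "no group" as a group

to_delete = [n for n in names if group_of(n) in groups_to_delete]
print(to_delete)
```

With the example names this selects exactly the three `file1` files.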

So far, I can only list the files:

import fnmatch
import glob
import os

filelist = glob.glob('C:\\CIRP\\Velocidata\\Test\\*')
for file in filelist:
    filenamecopied = os.path.basename(file)
    if fnmatch.fnmatch(filenamecopied, "*Inprogress*"):
        print('Delete the group of files')
    else:
        print('skip this file')
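One thing to watch in this attempt: the file names contain lowercase `inprogress`, but the pattern is `*Inprogress*`. `fnmatch.fnmatch` normalizes case with `os.path.normcase`, so the match succeeds on Windows but fails on case-sensitive platforms. The sketch below uses `fnmatch.fnmatchcase`, which never normalizes case, to show the difference, and lowercases the name so the check behaves the same everywhere.

```python
import fnmatch

name = "file1.infile_inprogress_2015033"

# Case-sensitive: the capital "I" in the pattern does not match "i"
print(fnmatch.fnmatchcase(name, "*Inprogress*"))          # False

# Lowercasing the name (and using a lowercase pattern) matches on any OS
print(fnmatch.fnmatchcase(name.lower(), "*inprogress*"))  # True
```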

os.walk is a better bet (it's easier to read); then filter on the file name.

import os

top = 'C:\\CIRP\\Velocidata\\Test\\'

# Getting the list of all files, one directory at a time
for root, dirs, files in os.walk(top):

    # Collect the group names (e.g. "file1.") that have an "inprogress" file
    groups_in_progress = set()
    for name in files:
        lower = name.lower()
        pos = lower.find('infile')
        if 'inprogress' in lower and pos > 0:   # skip names with no prefix before "infile"
            groups_in_progress.add(lower[:pos])

    # Delete the files whose group is in progress (each file at most once)
    for name in files:
        if any(name.lower().startswith(group) for group in groups_in_progress):
            os.remove(os.path.join(root, name))
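To try the os.walk approach safely before pointing it at the real folder, you could run the same logic against a throwaway directory. This is a sketch: the function name `delete_in_progress_groups` and the scratch files are illustrative, and the file2 names are left out because `:` is not a valid character in Windows file names.

```python
import os
import tempfile


def delete_in_progress_groups(top):
    """Delete every file whose group (the text before "infile") has an
    "inprogress" member, directory by directory. Returns the deleted paths."""
    deleted = []
    for root, dirs, files in os.walk(top):
        groups = set()
        for name in files:
            lower = name.lower()
            pos = lower.find("infile")
            if "inprogress" in lower and pos > 0:
                groups.add(lower[:pos])  # e.g. "file1."
        for name in files:
            if any(name.lower().startswith(g) for g in groups):
                path = os.path.join(root, name)
                os.remove(path)
                deleted.append(path)
    return deleted


with tempfile.TemporaryDirectory() as scratch:
    # Create empty stand-in files named like the ones in the question
    for name in ("file1.infile_inprogress_2015033",
                 "file1.infile_rsn_20150330022431",
                 "file1.infile_err_20150330022431",
                 "file3.infile_20150330023214"):
        open(os.path.join(scratch, name), "w").close()
    deleted = delete_in_progress_groups(scratch)
    remaining = sorted(os.listdir(scratch))
    print(remaining)  # ['file3.infile_20150330023214']
```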

You could use dictionaries and all kinds of optimizations, but this is the most straightforward approach.
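As a sketch of the dictionary idea, you could index the files by group in one pass and then pick whole groups at once. `collections.defaultdict` is from the standard library; using the text before the first `.` as the group key is an assumption based on the example names, and `files_by_group` is a hypothetical name.

```python
from collections import defaultdict

names = [
    "file1.infile_inprogress_2015033",
    "file1.infile_rsn_20150330022431",
    "file1.infile_err_20150330022431",
    "file2.infile_03_29_2015_05:08:46",
    "file2.infile_03_29_2015_05:09:56",
    "file3.infile_20150330023214",
]

# One pass to build the index: group -> list of file names
files_by_group = defaultdict(list)
for name in names:
    files_by_group[name.split(".")[0]].append(name)

# A whole group is selected if any of its members is "inprogress"
to_delete = [
    name
    for group, members in files_by_group.items()
    if any("inprogress" in m.lower() for m in members)
    for name in members
]
print(to_delete)
```

In a real run you would pass each selected name to `os.remove(os.path.join(folder, name))` instead of printing it.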

You need os.unlink. From the docs, os.unlink is used to:

Remove (delete) the file path.
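A minimal demonstration of `os.unlink` on a temporary file, so nothing real is touched. The Python docs describe `os.unlink` as semantically identical to `os.remove` (`unlink` is the traditional Unix name), so either works in the answers here.

```python
import os
import tempfile

# Create a throwaway file
fd, path = tempfile.mkstemp(suffix=".infile_inprogress")
os.close(fd)
print(os.path.exists(path))  # True

os.unlink(path)              # same effect as os.remove(path)
print(os.path.exists(path))  # False
```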

Add a few lines to your if clause:

# This if will check for "inprogress" (case-insensitively)
if fnmatch.fnmatch(filenamecopied.lower(), "*inprogress*"):
    filegroup = filenamecopied.split('.')[0] + '.'   # get the file group, e.g. "file1."
    for i in filelist:                                # iterate through the files
        # filelist holds full paths, so compare the base name with "file1." etc.
        if os.path.basename(i).startswith(filegroup) and os.path.exists(i):
            os.unlink(i)                              # delete it (skip if already gone)
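Because `glob.glob` returns full paths, the group check has to look at the base name: comparing a full path with a bare prefix never matches. The sketch below uses `ntpath`, the standard-library module behind `os.path` on Windows, so it behaves the same on any OS. The `file10` name is hypothetical and only shows why keeping the trailing `.` in the prefix matters.

```python
import ntpath

# A full Windows path, like the entries glob.glob returns for that folder
path = "C:\\CIRP\\Velocidata\\Test\\file1.infile_rsn_20150330022431"

print(path.startswith("file1."))                   # False: the path starts with "C:"
print(ntpath.basename(path).startswith("file1."))  # True

# Keeping the trailing "." stops "file1" from also matching "file10"
print("file10.infile_x".startswith("file1"))       # True
print("file10.infile_x".startswith("file1."))      # False
```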

A few questions:

  1. Are they always in order like you've listed them, or might they pop up in different orders?
  2. Do they have any regular format features (like filexxx. at the front)?
  3. Does the "inprogress" file always come up before the other files in its group?

If I assume that the file name format is a bunch of letters or numbers, then a ".", and then a bunch more characters, and that the files come up in random order, I would do it something like this:

  1. Go through once, building a set of file prefixes whose files are going to be deleted.
  2. Go through again, deleting the files whose prefix is in that set.

Kind of like this:

import glob
import os

filelist = glob.glob('C:\\CIRP\\Velocidata\\Test\\*')
deleteList = set()
for f in filelist:
    name = os.path.basename(f)
    if "inprogress" in name.lower():        # checks if inprogress is in the file name
        deleteList.add(name.split(".")[0])  # adds the base of the file name
print(deleteList)
for f in filelist:
    if os.path.basename(f).split(".")[0] in deleteList:
        print("Delete:", f)
    else:
        print("Do not delete:", f)

I haven't written the actual delete code, but you can check whether this catches everything you need. Based on what you said, I used simple string functions rather than a regular expression (re) to match the file names. If it doesn't work for you, please post back with answers to the questions above!
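For comparison, here is roughly what the re alternative could look like. The pattern assumes the format described above (letters or digits, then `.infile_`, then anything); `GROUP_RE` is an illustrative name, and names that do not fit the pattern are simply left alone.

```python
import re

# Group 1 captures the prefix before ".infile_", e.g. "file1"
GROUP_RE = re.compile(r"^([A-Za-z0-9]+)\.infile_")

names = [
    "file1.infile_inprogress_2015033",
    "file1.infile_rsn_20150330022431",
    "file1.infile_err_20150330022431",
    "file2.infile_03_29_2015_05:08:46",
    "file2.infile_03_29_2015_05:09:56",
    "file3.infile_20150330023214",
]

# First pass: collect the prefixes of groups that have an "inprogress" file
groups = set()
for name in names:
    m = GROUP_RE.match(name)
    if m and "inprogress" in name.lower():
        groups.add(m.group(1))

# Second pass: report what would be deleted
for name in names:
    m = GROUP_RE.match(name)
    action = "Delete:" if m and m.group(1) in groups else "Do not delete:"
    print(action, name)
```

The regex makes the expected format explicit, at the cost of being stricter than the plain string functions above.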
