
Grouping files by file name and getting the latest files based on last modified time

I am currently processing a list of files in multiple sub-directories. I have managed to do that, but the problem is that files with the same file name are stored in multiple sub-directories. So I have to group those files and keep only the last modified ones.

files = ["Z:\\RM Submissions\\01_RM Submission_Archive\\sample_file.csv",
         "Z:\\RM Submissions\\Final Submissions\\sample_file.csv",
         "Z:\\RM Submissions\\01_RM Submission_Archive\\sample_file1.csv"]

For instance, based on the sample data I have above, sample_file.csv is located in two different sub-directories, but I only want to get the latest one.

The following code should work for files with the same file name in a single directory, but it doesn't work here because the files are stored in multiple sub-directories.

import itertools, os

# groupby only groups consecutive items, so sort by the grouping key first
name_key = lambda f: os.path.splitext(os.path.basename(f))[0]
for k, g in itertools.groupby(sorted(files, key=name_key), key=name_key):
    dups = list(g)
    if len(dups) > 1:
        pass  # get the last modified ones

So how can I group the files by file name and keep only the latest ones, please?

You could group the files with a dictionary, with basenames as keys and the corresponding files collected in a list:

import os

# group the full paths by their basename
files_dct = {}
for file in files:
    files_dct.setdefault(os.path.basename(file), []).append(file)

# for each group, keep only the most recently modified path
latest_files = []
for basename, filegroup in files_dct.items():
    if len(filegroup) > 1:
        file = max(filegroup, key=lambda f: os.stat(f).st_mtime_ns)
    else:
        file = filegroup[0]
    latest_files.append(file)
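
For reference, a minimal one-pass sketch of the same idea that keeps only the newest path per basename, reusing the files list from the question (it assumes every path actually exists, so os.stat() can read its modification time):

import os

latest = {}
for path in files:
    name = os.path.basename(path)
    mtime = os.stat(path).st_mtime_ns
    # keep a path only if it is the first seen for this name, or newer than the stored one
    if name not in latest or mtime > latest[name][0]:
        latest[name] = (mtime, path)

latest_files = [path for mtime, path in latest.values()]
print(latest_files)  # one path per unique file name, the most recently modified

This avoids building the intermediate lists and calls os.stat() only once per file; whether that matters depends on how many files you process.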
