I am currently processing a list of files spread across multiple sub-directories. That part works, but the problem is that several files in different sub-directories share the same file name. So I have to group those files by name and keep only the most recently modified one from each group.
files = ["Z:\\RM Submissions\\01_RM Submission_Archive\\sample_file.csv","Z:\\RM Submissions\\Final Submissions\\sample_file.csv","Z:\\RM Submissions\\01_RM Submission_Archive\\sample_file1.csv"]
For instance, in the sample data above, sample_file.csv is located in two different sub-directories, but I only want to get the latest one.
The following code should work for files in the same directory with the same file name, but it doesn't work here because my files are stored in multiple sub-directories.
for k, g in itertools.groupby(os.path.basename(files), lambda f: os.path.splitext(f)[0]):
    dups = list(g)
    if len(dups) > 1:
        #get the last modified ones
So how can I group the files by file name and keep only the latest one from each group?
You could group the files with a dictionary, with basenames as keys and the corresponding files collected in a list:
import os
files_dct = {}
for file in files:
    files_dct.setdefault(os.path.basename(file), []).append(file)
for basename, filegroup in files_dct.items():
    if len(filegroup) > 1:
        file = max(filegroup, key=lambda f: os.stat(f).st_mtime_ns)
    else:
        file = filegroup[0]
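If you want the result as a single mapping, the same idea can be wrapped in a small helper. This is only a sketch: the function name latest_by_basename is made up for illustration, and it assumes every path in the list exists and that the code runs on the same platform the paths come from (os.path.basename only splits on backslashes on Windows).

```python
import os


def latest_by_basename(paths):
    """Return {basename: most recently modified path} for the given paths.

    Hypothetical helper, not part of the original answer.
    """
    latest = {}  # basename -> (mtime_ns, path) of the newest file seen so far
    for path in paths:
        name = os.path.basename(path)
        mtime = os.stat(path).st_mtime_ns
        # Keep this path only if it is newer than the one already stored.
        if name not in latest or mtime > latest[name][0]:
            latest[name] = (mtime, path)
    # Drop the timestamps and return just the winning path per basename.
    return {name: path for name, (_, path) in latest.items()}
```

Calling latest_by_basename(files) on the list from the question should then give one path per distinct file name. Because it keeps only the newest entry while iterating, it does not need the files to be sorted first, which itertools.groupby would require.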