Python/Itertools: Get latest file by name

I have a list of filenames in a directory and I'd like to keep only the latest versions. The list looks like:

['file1-v1.csv', 'file1-v2.csv', 'file2-v1.txt', ...]

I'd like to keep only the newest version of each csv file, where the version is the part after the '-' in the filename, along with the txt files.

The output would be ['file1-v2.csv', 'file2-v1.txt', ...]

I have a solution that uses sets, but I'm looking for an easy, Pythonic way to do this, potentially using itertools and groupby.

Update: Solution so far

I've been able to do some preliminary work to get a list like

lst = [('file1', 'csv', 'v1','<some data>'), ('file2', 'csv', 'v2','<some data>'), ...]

I'd like to group by the elements at index 0 and 1, but keep only the tuple with the maximum value at index 2.

It may be something like the below:

files = list(item for key, group in itertools.groupby(files, lambda x: x[0:2]) for item in group)
# Maximum over 3rd index element in each tuple does not work
files = max(files, key=operator.itemgetter(2))

Also, I feel like the below should work but it does not select the maximum properly

[max(items, key=operator.itemgetter(2)) for key, items in itertools.groupby(files, key=operator.itemgetter(0, 1))]
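The list comprehension above probably misbehaves for two reasons. First, itertools.groupby only merges adjacent items, so the input has to be sorted by the grouping key first. Second, the versions are strings, so 'v10' compares lower than 'v2'. Below is a minimal sketch of one way around both, assuming tuples shaped like the ones in lst and versions of the form 'v<integer>'. The names version_number and latest_per_group are hypothetical.

```python
import itertools
import operator


def version_number(item):
    # Assumes the version field looks like 'v<integer>', e.g. 'v10' -> 10.
    return int(item[2].lstrip('v'))


def latest_per_group(files):
    group_key = operator.itemgetter(0, 1)
    # groupby only merges adjacent items, so sort by the grouping key first.
    ordered = sorted(files, key=group_key)
    return [max(items, key=version_number)
            for _, items in itertools.groupby(ordered, key=group_key)]


files = [
    ('file1', 'csv', 'v2', '<some data>'),
    ('file2', 'txt', 'v1', '<some data>'),
    ('file1', 'csv', 'v10', '<some data>'),
    ('file1', 'csv', 'v1', '<some data>'),
]
print(latest_per_group(files))
```

With this sample input, the 'v10' tuple is kept for ('file1', 'csv'), even though it is neither first nor last in the list.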

I'd do it like this:

import os
import itertools

filenames = ['file1-v1.csv', 'file1-v2.csv', 'file1-v3.jpg', 'file2-v1.txt']


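def split_filename(filename):
    # 'file1-v2.csv' -> ('file1', '.csv', 2)
    basename, ext = os.path.splitext(filename)
    root, version = basename.rsplit('-v', 1)

    return root, ext, int(version)

def filter_latest_versions(filenames):
    # Sorting puts equal (root, ext) pairs next to each other, lowest version first.
    parsed_filenames = sorted(map(split_filename, filenames))

    for _, matches in itertools.groupby(parsed_filenames, key=lambda f: f[:2]):
        # The last item of each group carries the highest version.
        root, ext, version = tuple(matches)[-1]

        yield '{}-v{}{}'.format(root, version, ext)


print(list(filter_latest_versions(filenames)))
# ['file1-v2.csv', 'file1-v3.jpg', 'file2-v1.txt']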

It doesn't differ a whole lot from your now-posted solution, but it does properly sort out different extensions and handle filenames with dashes in the name.
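To illustrate the point about dashes, here is a small sketch comparing a plain split('-') with the splitext plus rsplit('-v', 1) steps used in this answer. The filename 'my-report-v12.csv' is a made-up example.

```python
import os

name = 'my-report-v12.csv'

# Splitting on every dash also breaks the name apart at the inner dash.
print(name.split('-'))            # ['my', 'report', 'v12.csv']

# Strip the extension first, then split once from the right at '-v'.
basename, ext = os.path.splitext(name)
root, version = basename.rsplit('-v', 1)
print(root, ext, int(version))    # my-report .csv 12

# Integer versions compare numerically; the raw strings would not.
print(int('12') > int('2'), '12' > '2')   # True False
```

Converting the version to int is what makes v12 rank above v2 once the tuples are sorted.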

You can try this:

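a = ['file1-v1.csv', 'file1-v2.csv', 'file2-v1.txt', 'file4-v1.csv', 'file2-v2.txt', 'file2-v3.txt']
d = {}
for i in a:
    name, rest = i.rsplit("-v", 1)       # 'file1', '2.csv'
    version, ext = rest.split(".", 1)    # '2', 'csv'
    key = (name, ext)
    # Store the version only if it is higher than the one seen so far.
    if key not in d or int(version) > d[key]:
        d[key] = int(version)

for (name, ext), version in d.items():
    print('{}-v{}.{}'.format(name, version, ext))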
