
How to sort a list in a very specific way in Python?

How can I do a very explicit sort on a list in Python? What I mean is, the items are supposed to be sorted in a very specific way, not just alphabetically or numerically. The input I would be receiving looks something like this:

h43948fh4349f84 ./.file.html
dsfj940j90f94jf ./abcd.ppt
f9j3049fj349f0j ./abcd_FF_000000001.jpg
f0f9049jf043930 ./abcd_FF_000000002.jpg
j909jdsa094jf49 ./abcd_FF_000000003.jpg
jf4398fj9348fjj ./abcd_FFinit.jpg
9834jf9483fj43f ./abcd_MM_000000001.jpg
fj09jw93fj930fj ./abcd_MM_000000002.jpg
fjdsjfd89s8hs9h ./abcd_MM_000000003.jpg
vyr89r8y898r839 ./abcd_MMinit.jpg

The list should be sorted:

  1. html file first
  2. ppt file second
  3. FFinit file third
  4. MMinit file fourth
  5. The rest of the numbered files, ordered by number, with FF before MM for each number

Example output for this would look like:

h43948fh4349f84 ./.file.html
dsfj940j90f94jf ./abcd.ppt
jf4398fj9348fjj ./abcd_FFinit.jpg
vyr89r8y898r839 ./abcd_MMinit.jpg
f9j3049fj349f0j ./abcd_FF_000000001.jpg
9834jf9483fj43f ./abcd_MM_000000001.jpg
f0f9049jf043930 ./abcd_FF_000000002.jpg
fj09jw93fj930fj ./abcd_MM_000000002.jpg
j909jdsa094jf49 ./abcd_FF_000000003.jpg
fjdsjfd89s8hs9h ./abcd_MM_000000003.jpg

You need to define a key function to guide the sorting. When the sort compares values to decide where each one goes, it uses the results of the key function instead of the values themselves.
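As a small illustration of the idea (a generic example, not taken from the question), `str.lower` can serve as a key so that strings are compared case-insensitively, while the original strings are still what ends up in the result:

```python
words = ['banana', 'apple', 'Cherry']

# Without a key, the strings themselves are compared, and uppercase
# letters sort before all lowercase ones
plain = sorted(words)

# With key=str.lower, sorted() compares 'banana', 'apple', 'cherry'
# instead, but still returns the original strings
case_insensitive = sorted(words, key=str.lower)

print(plain)             # ['Cherry', 'apple', 'banana']
print(case_insensitive)  # ['apple', 'banana', 'Cherry']
```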

The key function can return anything, but here a tuple would be helpful. Tuples are compared lexicographically, meaning that only their first elements are compared unless they are equal, in which case the second elements are used. If those are equal too, further elements are compared, until there are no more elements or an order has been determined.
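A quick check of that rule, using sample tuples shaped like the keys in this answer (the values are made up for illustration):

```python
# First elements differ, so they alone decide the order
a = (1, 'zzz') < (2, 'aaa')
# First elements are equal, so the second elements decide
b = (4, '000000001', 'MM') < (4, '000000002', 'FF')
# First two elements are equal, so the third decides ('FF' < 'MM')
c = (4, '000000001', 'FF') < (4, '000000001', 'MM')

print(a, b, c)  # True True True
```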

For your case, you could produce a number in the first position to order the 'special' entries, and then, for the remaining entries, return the number in the second position and the FF or MM string in the last:

def key(filename):
    if filename.endswith('.html'):
        return (0,)  # html first
    if filename.endswith('.ppt'):
        return (1,)  # ppt second
    if filename.endswith('FFinit.jpg'):
        return (2,)  # FFinit third
    if filename.endswith('MMinit.jpg'):
        return (3,)  # MMinit fourth
    # take last two parts between _ characters, ignoring the extension
    _, FFMM, number = filename.rpartition('.')[0].rsplit('_', 2)
    # rest is sorted on the number (compared here lexicographically) and FF/MM
    return (4, number, FFMM)

Note that the tuples don't even need to be the same length.
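A short sketch of why mixed lengths are fine: a comparison stops at the first position where the tuples differ, and if one tuple runs out first (it is a prefix of the other), it counts as the smaller one. The tuples in `keys` are sample values of the shape the key function above returns:

```python
# (0,) and (4, ...) already differ at the first element, so the
# missing elements of (0,) are never looked at
first = (0,) < (4, '000000001', 'FF')

# When one tuple is a prefix of the other, the shorter one sorts first
prefix = (4,) < (4, '000000001', 'FF')

keys = [(4, '000000001', 'MM'), (1,), (4, '000000001', 'FF'), (0,)]
ordered = sorted(keys)
print(first, prefix)  # True True
print(ordered)  # [(0,), (1,), (4, '000000001', 'FF'), (4, '000000001', 'MM')]
```

What must still be comparable are elements at the same position once everything before them is equal; for example, comparing `(4, 1)` with `(4, 'a')` would raise a `TypeError`.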

This produces the expected output:

>>> from pprint import pprint
>>> lines = '''\
... h43948fh4349f84 ./.file.html
... dsfj940j90f94jf ./abcd.ppt
... f9j3049fj349f0j ./abcd_FF_000000001.jpg
... f0f9049jf043930 ./abcd_FF_000000002.jpg
... j909jdsa094jf49 ./abcd_FF_000000003.jpg
... jf4398fj9348fjj ./abcd_FFinit.jpg
... 9834jf9483fj43f ./abcd_MM_000000001.jpg
... fj09jw93fj930fj ./abcd_MM_000000002.jpg
... fjdsjfd89s8hs9h ./abcd_MM_000000003.jpg
... vyr89r8y898r839 ./abcd_MMinit.jpg
... '''.splitlines()
>>> pprint(sorted(lines, key=key))
['h43948fh4349f84 ./.file.html',
 'dsfj940j90f94jf ./abcd.ppt',
 'jf4398fj9348fjj ./abcd_FFinit.jpg',
 'vyr89r8y898r839 ./abcd_MMinit.jpg',
 'f9j3049fj349f0j ./abcd_FF_000000001.jpg',
 '9834jf9483fj43f ./abcd_MM_000000001.jpg',
 'f0f9049jf043930 ./abcd_FF_000000002.jpg',
 'fj09jw93fj930fj ./abcd_MM_000000002.jpg',
 'j909jdsa094jf49 ./abcd_FF_000000003.jpg',
 'fjdsjfd89s8hs9h ./abcd_MM_000000003.jpg']
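The key above compares the number part as a string, which gives numeric order only because every number is zero-padded to the same width. If the numbers might not be padded (a hypothetical case; the question's data is padded), converting to `int` keeps `10` from sorting before `9`. A sketch of that variant for the numbered files only, with the made-up name `numbered_key`:

```python
def numbered_key(filename):
    # Same parsing as key() above: drop the extension, then split off
    # the last two '_'-separated parts, e.g. 'FF' and '10'
    _, ffmm, number = filename.rpartition('.')[0].rsplit('_', 2)
    # Compare the number as an integer, then the FF/MM tag
    return (int(number), ffmm)

# As strings, '10' sorts before '9' because '1' < '9'
print('10' < '9')  # True

# Hypothetical unpadded names, not from the question
names = ['./abcd_FF_10.jpg', './abcd_MM_9.jpg', './abcd_FF_9.jpg']
ordered = sorted(names, key=numbered_key)
print(ordered)  # ['./abcd_FF_9.jpg', './abcd_MM_9.jpg', './abcd_FF_10.jpg']
```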

You can use the key argument of list.sort(). The key function you pass in receives an element of the list and returns a value that can be compared to the other return values to determine the sort order. One possibility is to assign a number to each criterion, exactly as you describe in your question.
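A minimal sketch of that suggestion, assuming the numbers simply follow the question's five rules; `SPECIAL_RULES` and `rule_key` are hypothetical names for this example. Unlike `sorted()`, `list.sort()` reorders the list in place and returns `None`:

```python
# Rules 1-4 from the question, as (suffix, rule number) pairs
SPECIAL_RULES = [('.html', 1), ('.ppt', 2), ('FFinit.jpg', 3), ('MMinit.jpg', 4)]

def rule_key(line):
    for suffix, rule in SPECIAL_RULES:
        if line.endswith(suffix):
            return (rule,)
    # Rule 5: numbered files, by number, then FF before MM
    _, tag, number = line.rpartition('.')[0].rsplit('_', 2)
    return (5, int(number), tag)

lines = ['9834jf9483fj43f ./abcd_MM_000000001.jpg',
         'f9j3049fj349f0j ./abcd_FF_000000001.jpg',
         'vyr89r8y898r839 ./abcd_MMinit.jpg',
         'dsfj940j90f94jf ./abcd.ppt',
         'jf4398fj9348fjj ./abcd_FFinit.jpg',
         'h43948fh4349f84 ./.file.html']

lines.sort(key=rule_key)  # sorts in place; nothing useful is returned
for line in lines:
    print(line)
```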

Use sorted and a custom key function.

strings = ['h43948fh4349f84 ./.file.html',
           'dsfj940j90f94jf ./abcd.ppt',
           'f9j3049fj349f0j ./abcd_FF_000000001.jpg',
           'f0f9049jf043930 ./abcd_FF_000000002.jpg',
           'j909jdsa094jf49 ./abcd_FF_000000003.jpg',
           'jf4398fj9348fjj ./abcd_FFinit.jpg',
           '9834jf9483fj43f ./abcd_MM_000000001.jpg',
           'fj09jw93fj930fj ./abcd_MM_000000002.jpg',
           'fjdsjfd89s8hs9h ./abcd_MM_000000003.jpg',
           'vyr89r8y898r839 ./abcd_MMinit.jpg']

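def key(string):
    # Special files come first, in a fixed order
    if string.endswith('html'):
        return 0,
    elif string.endswith('ppt'):
        return 1,
    elif string.endswith('FFinit.jpg'):
        return 2,
    elif string.endswith('MMinit.jpg'):
        return 3,
    # Numbered files: string[-16:-14] is the FF/MM tag and string[-13:-4] the
    # 9-digit number (assumes a '.jpg' extension); sort by number, then FF before MM
    elif string[-16:-14] == 'FF':
        return 4, int(string[-13:-4]), 0
    elif string[-16:-14] == 'MM':
        return 4, int(string[-13:-4]), 1
    # Otherwise the function would return None and sorted() would raise a TypeError
    raise ValueError('unexpected file name: ' + string)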

result = sorted(strings, key=key)

for string in result:
    print(string)

Out:
h43948fh4349f84 ./.file.html
dsfj940j90f94jf ./abcd.ppt
jf4398fj9348fjj ./abcd_FFinit.jpg
vyr89r8y898r839 ./abcd_MMinit.jpg
f9j3049fj349f0j ./abcd_FF_000000001.jpg
9834jf9483fj43f ./abcd_MM_000000001.jpg
f0f9049jf043930 ./abcd_FF_000000002.jpg
fj09jw93fj930fj ./abcd_MM_000000002.jpg
j909jdsa094jf49 ./abcd_FF_000000003.jpg
fjdsjfd89s8hs9h ./abcd_MM_000000003.jpg

I assumed the last ordering point just looks at the number before the file extension (e.g. 000000001).

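import pprint

def custom_key(x):
    # Special files in the required order; their key is their index in this list
    substring_order = ['.html', '.ppt', 'FFinit', 'MMinit']
    for i, o in enumerate(substring_order):
        if o in x:
            return i
    # Numbered files: the number before the extension, shifted past the special files.
    # FF and MM files with the same number get the same key, so FF stays ahead of MM
    # only because sorted() is stable and the FF lines come first in the input.
    return int(x.split('_')[-1].split('.')[0]) + len(substring_order)

# data is the list of input lines from the question
sorted_list = sorted(data, key=custom_key)

pprint.pprint(sorted_list)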

Out:
['h43948fh4349f84 ./.file.html',
 'dsfj940j90f94jf ./abcd.ppt',
 'jf4398fj9348fjj ./abcd_FFinit.jpg',
 'vyr89r8y898r839 ./abcd_MMinit.jpg',
 'f9j3049fj349f0j ./abcd_FF_000000001.jpg',
 '9834jf9483fj43f ./abcd_MM_000000001.jpg',
 'f0f9049jf043930 ./abcd_FF_000000002.jpg',
 'fj09jw93fj930fj ./abcd_MM_000000002.jpg',
 'j909jdsa094jf49 ./abcd_FF_000000003.jpg',
 'fjdsjfd89s8hs9h ./abcd_MM_000000003.jpg']
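With this integer key, `abcd_FF_000000001.jpg` and `abcd_MM_000000001.jpg` both get the key 5, so their relative order comes only from `sorted()` being stable: tied items keep their input order, and in the question's input all FF lines happen to precede the MM lines. The sketch below shows this with a reordered two-line input, plus one possible way to make FF-before-MM explicit by returning tuples; `int_key` and `tuple_key` are made-up names for this example:

```python
SUBSTRING_ORDER = ['.html', '.ppt', 'FFinit', 'MMinit']

def int_key(x):
    # The answer's approach: special files get 0-3, numbered files number + 4
    for i, o in enumerate(SUBSTRING_ORDER):
        if o in x:
            return i
    return int(x.split('_')[-1].split('.')[0]) + len(SUBSTRING_ORDER)

def tuple_key(x):
    # Same ranks, but numbered files also carry their FF/MM tag as a tie-breaker
    for i, o in enumerate(SUBSTRING_ORDER):
        if o in x:
            return (i,)
    parts = x.split('_')  # e.g. ['9834jf9483fj43f ./abcd', 'MM', '000000001.jpg']
    number = int(parts[-1].split('.')[0])
    return (number + len(SUBSTRING_ORDER), parts[-2])

# The MM line comes first in the input this time
data = ['9834jf9483fj43f ./abcd_MM_000000001.jpg',
        'f9j3049fj349f0j ./abcd_FF_000000001.jpg']

print(sorted(data, key=int_key))    # tie: the MM line stays first, as in the input
print(sorted(data, key=tuple_key))  # the FF line comes first
```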

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM