
How to sort a list in a very specific way in Python?

How can I do a very explicit sort on a list in Python? What I mean is, items are supposed to be sorted in a very specific way and not just alphabetically or numerically. The input I would be receiving looks something like this:

h43948fh4349f84 ./.file.html
dsfj940j90f94jf ./abcd.ppt
f9j3049fj349f0j ./abcd_FF_000000001.jpg
f0f9049jf043930 ./abcd_FF_000000002.jpg
j909jdsa094jf49 ./abcd_FF_000000003.jpg
jf4398fj9348fjj ./abcd_FFinit.jpg
9834jf9483fj43f ./abcd_MM_000000001.jpg
fj09jw93fj930fj ./abcd_MM_000000002.jpg
fjdsjfd89s8hs9h ./abcd_MM_000000003.jpg
vyr89r8y898r839 ./abcd_MMinit.jpg

The list should be sorted as follows:

  1. html file first
  2. ppt file second
  3. FFinit file third
  4. MMinit file fourth
  5. The rest of the numbered files, ordered by number, with FF before MM for each number

Example output for this would look like:

h43948fh4349f84 ./.file.html
dsfj940j90f94jf ./abcd.ppt
jf4398fj9348fjj ./abcd_FFinit.jpg
vyr89r8y898r839 ./abcd_MMinit.jpg
f9j3049fj349f0j ./abcd_FF_000000001.jpg
9834jf9483fj43f ./abcd_MM_000000001.jpg
f0f9049jf043930 ./abcd_FF_000000002.jpg
fj09jw93fj930fj ./abcd_MM_000000002.jpg
j909jdsa094jf49 ./abcd_FF_000000003.jpg
fjdsjfd89s8hs9h ./abcd_MM_000000003.jpg

You need to define a key function to guide the sorting. When comparing values to see what goes where, the result of the key function is used instead of the values themselves.
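As a minimal illustration of what a key function does (the word list here is made up for the example and has nothing to do with the question's data), sorting by length instead of alphabetically might look like this:

```python
# Illustrative data only; not taken from the question.
words = ["banana", "fig", "cherry", "kiwi"]

# sorted() calls the key function once per item and compares the results
# (here: the lengths) instead of the strings themselves.
by_length = sorted(words, key=len)

print(by_length)  # ['fig', 'kiwi', 'banana', 'cherry']
```

"banana" and "cherry" have the same length, so they keep their input order; Python's sort is stable.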

The key function can return anything, but here a tuple would be helpful. Tuples are compared lexicographically, meaning that only their first elements are compared unless they are equal, in which case the second elements are used. If those are equal too, further elements are compared, until there are no more elements or an order has been determined.
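The comparison rule for tuples can be checked directly in the interpreter. The tuples below are illustrative values shaped like the keys this kind of sort would use:

```python
# The first elements differ, so the remaining elements are never consulted.
first_decides = (1, "zzz") < (2, "aaa")

# The first elements tie, so the second elements decide.
second_decides = (4, "000000001", "MM") < (4, "000000002", "FF")

# The first two elements tie, so the third elements decide.
third_decides = (4, "000000001", "FF") < (4, "000000001", "MM")

# Sorting a list of tuples applies the same rule to every pair.
ordered_pairs = sorted([(2, "a"), (1, "b"), (1, "a")])
print(ordered_pairs)  # [(1, 'a'), (1, 'b'), (2, 'a')]
```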

For your case, you could produce a number in the first position to order the 'special' entries, then for the remainder return the number in the second position and the FF or MM string in the last:

def key(filename):
    if filename.endswith('.html'):
        return (0,)  # html first
    if filename.endswith('.ppt'):
        return (1,)  # ppt second
    if filename.endswith('FFinit.jpg'):
        return (2,)  # FFinit third
    if filename.endswith('MMinit.jpg'):
        return (3,)  # MMinit fourth
    # take last two parts between _ characters, ignoring the extension
    _, FFMM, number = filename.rpartition('.')[0].rsplit('_', 2)
    # rest is sorted on the number (compared here lexicographically) and FF/MM
    return (4, number, FFMM)

Note that the tuples don't even need to be of equal length.
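A short sketch of how tuples of different lengths compare, assuming Python 3. The values are illustrative; the last part shows a caveat worth knowing, namely that elements at the same position must be comparable with each other:

```python
# Decided by the first elements (0 < 4); the length difference is irrelevant.
short_before_long = (0,) < (4, "000000001", "FF")

# When one tuple is a prefix of the other, the shorter one sorts first.
prefix_first = (4,) < (4, "000000001", "FF")

# Caveat: if the comparison reaches a position holding an int in one tuple
# and a str in the other, Python 3 raises TypeError.
try:
    (4, 1) < (4, "000000001")
except TypeError:
    mixed_types_fail = True
else:
    mixed_types_fail = False
```

This is why a key function should return the same type at each tuple position for all entries that share the same leading elements.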

This produces the expected output:

>>> from pprint import pprint
>>> lines = '''\
... h43948fh4349f84 ./.file.html
... dsfj940j90f94jf ./abcd.ppt
... f9j3049fj349f0j ./abcd_FF_000000001.jpg
... f0f9049jf043930 ./abcd_FF_000000002.jpg
... j909jdsa094jf49 ./abcd_FF_000000003.jpg
... jf4398fj9348fjj ./abcd_FFinit.jpg
... 9834jf9483fj43f ./abcd_MM_000000001.jpg
... fj09jw93fj930fj ./abcd_MM_000000002.jpg
... fjdsjfd89s8hs9h ./abcd_MM_000000003.jpg
... vyr89r8y898r839 ./abcd_MMinit.jpg
... '''.splitlines()
>>> pprint(sorted(lines, key=key))
['h43948fh4349f84 ./.file.html',
 'dsfj940j90f94jf ./abcd.ppt',
 'jf4398fj9348fjj ./abcd_FFinit.jpg',
 'vyr89r8y898r839 ./abcd_MMinit.jpg',
 'f9j3049fj349f0j ./abcd_FF_000000001.jpg',
 '9834jf9483fj43f ./abcd_MM_000000001.jpg',
 'f0f9049jf043930 ./abcd_FF_000000002.jpg',
 'fj09jw93fj930fj ./abcd_MM_000000002.jpg',
 'j909jdsa094jf49 ./abcd_FF_000000003.jpg',
 'fjdsjfd89s8hs9h ./abcd_MM_000000003.jpg']

You can use the key argument to sort(). The function passed as key accepts an element of the list and returns a value that can be compared to other return values to determine the sorting order. One possibility is to assign a number to each criterion exactly as you describe in your question.
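This answer gives no code, so the following is only a sketch of the idea under stated assumptions: `rank` is a hypothetical helper name, and it only handles the four 'special' criteria, placing every other line after them without ordering the numbered files among themselves.

```python
def rank(line):
    """Hypothetical helper: map a line to the number of its criterion."""
    special = (".html", ".ppt", "FFinit.jpg", "MMinit.jpg")
    for position, suffix in enumerate(special):
        if line.endswith(suffix):
            return position
    # Everything else shares one rank and comes after the special entries.
    return len(special)

entries = [
    "f9j3049fj349f0j ./abcd_FF_000000001.jpg",
    "vyr89r8y898r839 ./abcd_MMinit.jpg",
    "dsfj940j90f94jf ./abcd.ppt",
    "jf4398fj9348fjj ./abcd_FFinit.jpg",
    "h43948fh4349f84 ./.file.html",
]

# list.sort() sorts in place and returns None; use sorted() to get a new list.
entries.sort(key=rank)
```

Ordering the numbered files as well would need a richer key, for example a tuple that also carries the number and the FF/MM marker.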

Use sorted and a custom key function.

strings = ['h43948fh4349f84 ./.file.html',
'dsfj940j90f94jf ./abcd.ppt',
'f9j3049fj349f0j ./abcd_FF_000000001.jpg',
'f0f9049jf043930 ./abcd_FF_000000002.jpg',
'j909jdsa094jf49 ./abcd_FF_000000003.jpg',
'jf4398fj9348fjj ./abcd_FFinit.jpg',
'9834jf9483fj43f ./abcd_MM_000000001.jpg',
'fj09jw93fj930fj ./abcd_MM_000000002.jpg',
'fjdsjfd89s8hs9h ./abcd_MM_000000003.jpg',
'vyr89r8y898r839 ./abcd_MMinit.jpg']

def key(string):
    if string.endswith('html'):
        return 0,
    elif string.endswith('ppt'):
        return 1,
    elif string.endswith('FFinit.jpg'):
        return 2,
    elif string.endswith('MMinit.jpg'):
        return 3,
    elif string[-16:-14] == 'FF':
        return 4, int(string[-13:-4]), 0
    elif string[-16:-14] == 'MM':
        return 4, int(string[-13:-4]), 1

result = sorted(strings, key=key)

for string in result:
    print(string)

Out:
h43948fh4349f84 ./.file.html
dsfj940j90f94jf ./abcd.ppt
jf4398fj9348fjj ./abcd_FFinit.jpg
vyr89r8y898r839 ./abcd_MMinit.jpg
f9j3049fj349f0j ./abcd_FF_000000001.jpg
9834jf9483fj43f ./abcd_MM_000000001.jpg
f0f9049jf043930 ./abcd_FF_000000002.jpg
fj09jw93fj930fj ./abcd_MM_000000002.jpg
j909jdsa094jf49 ./abcd_FF_000000003.jpg
fjdsjfd89s8hs9h ./abcd_MM_000000003.jpg
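The fixed slice positions in the key function above only work while every numbered line ends in exactly the same layout (two letters, an underscore, nine digits and a four-character extension). A possible variation, sketched here under the assumption that numbered files always end in `_FF_<digits>.<ext>` or `_MM_<digits>.<ext>`, extracts the parts with a regular expression instead. The names `NUMBERED` and `regex_key` are made up for this example, and sending unrecognised lines to the end is a design choice of the sketch, not part of the original answer.

```python
import re

# Assumed pattern: "_FF_" or "_MM_", then digits, then a file extension.
NUMBERED = re.compile(r"_(FF|MM)_(\d+)\.\w+$")

def regex_key(line):
    if line.endswith(".html"):
        return (0,)
    if line.endswith(".ppt"):
        return (1,)
    if line.endswith("FFinit.jpg"):
        return (2,)
    if line.endswith("MMinit.jpg"):
        return (3,)
    match = NUMBERED.search(line)
    if match is None:
        # Unrecognised lines go last instead of producing a None key.
        return (5,)
    kind, number = match.groups()
    # Sort by number first, then FF (0) before MM (1).
    return (4, int(number), 0 if kind == "FF" else 1)

sample = [
    "fj09jw93fj930fj ./abcd_MM_000000002.jpg",
    "f9j3049fj349f0j ./abcd_FF_000000001.jpg",
    "vyr89r8y898r839 ./abcd_MMinit.jpg",
    "f0f9049jf043930 ./abcd_FF_000000002.jpg",
    "h43948fh4349f84 ./.file.html",
]
ordered = sorted(sample, key=regex_key)
```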

I assumed the last ordering point just looks at the number before the file extension (e.g. 000001).

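import pprint

def custom_key(x):
    # special files come first, in this order
    substring_order = ['.html', '.ppt', 'FFinit', 'MMinit']
    for i, o in enumerate(substring_order):
        if o in x:
            return i
    # numbered files: use the number before the extension,
    # shifted so that they sort after the special files
    return int(x.split('_')[-1].split('.')[0]) + len(substring_order)

# data is the list of input lines from the question
sorted_list = sorted(data, key=custom_key)

pprint.pprint(sorted_list)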

Out:
['h43948fh4349f84 ./.file.html',
 'dsfj940j90f94jf ./abcd.ppt',
 'jf4398fj9348fjj ./abcd_FFinit.jpg',
 'vyr89r8y898r839 ./abcd_MMinit.jpg',
 'f9j3049fj349f0j ./abcd_FF_000000001.jpg',
 '9834jf9483fj43f ./abcd_MM_000000001.jpg',
 'f0f9049jf043930 ./abcd_FF_000000002.jpg',
 'fj09jw93fj930fj ./abcd_MM_000000002.jpg',
 'j909jdsa094jf49 ./abcd_FF_000000003.jpg',
 'fjdsjfd89s8hs9h ./abcd_MM_000000003.jpg']
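A key that only looks at the number gives the FF and MM lines of the same number an identical key, so their relative order is whatever it was in the input (Python's sort is stable). The output above comes out right because the input already lists FF before MM. The sketch below shows the effect on a two-line input where MM comes first; `number_only_key` and `number_then_kind_key` are hypothetical names introduced for this illustration.

```python
def number_only_key(line):
    # Hypothetical reduced key: only the number before the extension.
    return int(line.split("_")[-1].split(".")[0])

def number_then_kind_key(line):
    # Adding the FF/MM marker as a second element makes the tie-break explicit.
    kind = line.split("_")[-2]
    return (number_only_key(line), kind)

shuffled = [
    "9834jf9483fj43f ./abcd_MM_000000001.jpg",
    "f9j3049fj349f0j ./abcd_FF_000000001.jpg",
]

# Both lines map to 1, so the stable sort keeps the input order (MM first).
kept = sorted(shuffled, key=number_only_key)

# With the marker in the key, "FF" < "MM" puts the FF line first.
fixed = sorted(shuffled, key=number_then_kind_key)
```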
