
How to sort a python list of strings based on a substring

I am trying to sort a Python list with sorted(), as in the code below, but the items do not come out in the order I expect.

#sort using the number part of the string
mylist = ['XYZ-78.txt', 'XYZ-8.txt', 'XYZ-18.txt'] 
def func(elem):
    return elem.split('-')[1].split('.')[0]

sortlist = sorted(mylist,key=func)
for i in sortlist:
  print(i)

The output is-
XYZ-18.txt
XYZ-78.txt
XYZ-8.txt

I was expecting output as- 
XYZ-8.txt
XYZ-18.txt
XYZ-78.txt

You should convert the number part to an integer:

#sort using the number part of the string
mylist = ['XYZ-78.txt', 'XYZ-8.txt', 'XYZ-18.txt'] 
def func(elem):
    return int(elem.split('-')[1].split('.')[0])

sortlist = sorted(mylist,key=func)
for i in sortlist:
  print(i)

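What you see is an ordering based on the character codes of the digits: the key is a string, so the values are compared character by character, and '18' sorts before '8' because '1' comes before '8'.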
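To make the difference concrete, here is a minimal sketch that sorts only the number parts from the question, first as strings and then as integers. The variable names as_text and as_numbers are just for illustration.

```python
# Strings are compared character by character, so the first digit decides
# the order before the length of the number is ever considered.
as_text = sorted(['78', '8', '18'])
as_numbers = sorted(['78', '8', '18'], key=int)

print(as_text)     # ['18', '78', '8']
print(as_numbers)  # ['8', '18', '78']
print('18' < '8')  # True: '1' (code point 49) sorts before '8' (code point 56)
print(18 < 8)      # False
```

Passing key=int leaves the list elements as strings and only changes what they are compared by.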

Wrap the extracted value in int().

Ex:

mylist = ['XYZ-78.txt', 'XYZ-8.txt', 'XYZ-18.txt'] 
print(sorted(mylist, key=lambda x: int(x.split("-")[-1].split(".")[0])))

Output:

['XYZ-8.txt', 'XYZ-18.txt', 'XYZ-78.txt']

With str methods:

mylist = ['XYZ-78.txt', 'XYZ-8.txt', 'XYZ-18.txt']
result = sorted(mylist, key=lambda x: int(x[x.index('-')+1:].replace('.txt', '')))

print(result)

The output:

['XYZ-8.txt', 'XYZ-18.txt', 'XYZ-78.txt']
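The slicing above assumes every name contains a '-' and ends in '.txt'. If that cannot be guaranteed, a regular expression is one possible alternative. The sketch below is an assumption-laden variant, not part of the original answer: number_after_hyphen is a hypothetical helper, the extra file names are made up, and sending names without a number to the end is just one choice.

```python
import re

def number_after_hyphen(name):
    # Hypothetical helper: take the digits between a '-' and the final
    # extension, whatever that extension is.
    match = re.search(r'-(\d+)\.[^.]*$', name)
    # Names that do not fit the pattern get infinity, so they sort last.
    return int(match.group(1)) if match else float('inf')

files = ['XYZ-78.txt', 'XYZ-8.log', 'notes.txt', 'XYZ-18.csv']
ordered = sorted(files, key=number_after_hyphen)
print(ordered)  # ['XYZ-8.log', 'XYZ-18.csv', 'XYZ-78.txt', 'notes.txt']
```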

Use this code to sort the list of strings numerically (which is what is needed) instead of lexicographically (which is what the code in the question does).

#sort using the number part of the string
mylist = ['XYZ-78.txt', 'XYZ-8.txt', 'XYZ-18.txt'] 
def func(elem):
    # take the text between '-' and the 4-character '.txt' suffix, as an int
    return int(elem[elem.index('-')+1:len(elem)-4])
sortlist = sorted(mylist,key=func)
for i in sortlist: 
    print(i) 

There is a generic approach to this problem, called human-readable sorting or, more popularly, alphanum sort, which orders things the way humans expect them to appear.

import re
mylist = ['XYZ-78.txt', 'XYZ-8.txt', 'XYZ-18.txt']

def tryint(s):
    # Digit chunks become ints; everything else stays a string.
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    """ Turn a string into a list of string and number chunks.
        "z23a" -> ["z", 23, "a"]
    """
    return [ tryint(c) for c in re.split('([0-9]+)', s) ]

def sort_nicely(l):
    """ Sort the given list in place in the way that humans expect.
    """
    l.sort(key=alphanum_key)

sort_nicely(mylist)
print(mylist)

Output:

['XYZ-8.txt', 'XYZ-18.txt', 'XYZ-78.txt']

This works on any string, so you don't have to split and slice characters to extract a sortable field.
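As a rough illustration of that generality, the sketch below applies the same chunking idea to names with different prefixes and more than one number. The file names are made up, and this compact alphanum_key is a rewrite for the example rather than the code from the linked post.

```python
import re

def alphanum_key(s):
    # re.split with a capturing group returns alternating text and digit
    # chunks, so every odd index holds a run of digits.
    parts = re.split(r'([0-9]+)', s)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ['img12.png', 'img2.png', 'IMG1.png', 'v1.10.txt', 'v1.9.txt']
ordered = sorted(names, key=alphanum_key)
print(ordered)
# ['IMG1.png', 'img2.png', 'img12.png', 'v1.9.txt', 'v1.10.txt']
```

The text chunks are still compared case-sensitively, so 'IMG' sorts before 'img'. Lowercasing the text chunks in the key is one way to change that if it matters.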

Good read about alphanum: http://www.davekoelle.com/alphanum.html

Original Source code: https://nedbatchelder.com/blog/200712/human_sorting.html
