
Sort a list of paths in Python

How can I sort a list of paths whose names contain integers as well as strings? My file names are:

tmp_1483228800-1485907200_0, 
tmp_1483228800-1485907200_1,
tmp_1483228800-1485907200_2,
.... 

I need to sort them by the integer after the last underscore. This is what my code looks like:

act = "." + "/*/raw_results.csv"
files = glob.glob(act)
sorted_list = sorted(files, key = lambda x:int(os.path.splitext(os.path.dirname(x))[0]))

I know the problem is that each name contains several integers with strings in between, so the whole name cannot be converted to an integer, but I do not know how to solve it. Thanks in advance.

You could simply use str.rsplit() for the key:

>>> lst = ['tmp_1483228800-1485907200_1', 'tmp_1483228800-1485907200_2','tmp_1483228800-1485907200_0']
>>> sorted(lst, key=lambda x: int(x.rsplit('_', 1)[-1]))
['tmp_1483228800-1485907200_0', 'tmp_1483228800-1485907200_1', 'tmp_1483228800-1485907200_2']
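The question globs for "./*/raw_results.csv", so the number sits at the end of the parent directory name rather than at the end of the path. A minimal sketch of the same rsplit idea applied to that layout; the helper name trailing_index and the sample paths are made up for illustration:

```python
import os

def trailing_index(path):
    """Return the integer after the last underscore of the parent directory name."""
    folder = os.path.basename(os.path.dirname(path))
    return int(folder.rsplit('_', 1)[-1])

# Hypothetical sample paths, shaped like the results of glob.glob("./*/raw_results.csv")
files = ['./tmp_1483228800-1485907200_10/raw_results.csv',
         './tmp_1483228800-1485907200_2/raw_results.csv',
         './tmp_1483228800-1485907200_0/raw_results.csv']

sorted_list = sorted(files, key=trailing_index)
print(sorted_list)
```

Because the key converts the suffix to int, _10 sorts after _2 instead of before it, as it would with a plain string sort.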

code:

import os
import re

PATH = r"C:\Temp"
lst = ['tmp_1483228800-1485907200_1', 'tmp_1483228800-1485907200_0', 'tmp_1483228800-1485907200_2']

def trailing_number(x):
    # \d+$ matches the whole run of digits at the end of the name,
    # so a multi-digit suffix such as _10 sorts after _9
    return int(re.search(r'\d+$', x).group())

print([os.path.join(PATH, name) for name in sorted(lst, key=trailing_number)])

output:

['C:\\Temp\\tmp_1483228800-1485907200_0', 'C:\\Temp\\tmp_1483228800-1485907200_1', 'C:\\Temp\\tmp_1483228800-1485907200_2']

According to your comments, your files will be in this format:

>>> files = [".../data/tmp_1483228801-1485907200_10/raw_results.csv",
             ".../data/tmp_1483228800-1485907200_1/raw_results.csv",
             ".../data/tmp_1483228801-1485907201_30/raw_results.csv",
             ".../data/tmp_1483228801-1485907200_2/raw_results.csv",
             ".../data/tmp_1483228801-1485907201_9/raw_results.csv"]

You can then just extract all the numbers from those full, raw file paths and convert them to int. There is no need to split the path into directory segments.

>>> [[int(n) for n in re.findall(r"\d+", f)] for f in files]
[[1483228801, 1485907200, 10],
 [1483228800, 1485907200, 1],
 [1483228801, 1485907201, 30],
 [1483228801, 1485907200, 2],
 [1483228801, 1485907201, 9]]

Using that list as the sort key orders the paths by all the numbers they contain, giving the highest priority to the first number found. If the earlier numbers are all the same, the last number decides the order; if they differ, the paths are sorted by those first.

>>> sorted(files, key=lambda f: [int(n) for n in re.findall(r"\d+", f)])
['.../data/tmp_1483228800-1485907200_1/raw_results.csv',
 '.../data/tmp_1483228801-1485907200_2/raw_results.csv',
 '.../data/tmp_1483228801-1485907200_10/raw_results.csv',
 '.../data/tmp_1483228801-1485907201_9/raw_results.csv',
 '.../data/tmp_1483228801-1485907201_30/raw_results.csv']
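This ordering follows from how Python compares lists: element by element, left to right, with later elements consulted only on ties. A small sketch using three of the keys extracted earlier:

```python
# Lists compare lexicographically: the first differing element decides.
a = [1483228801, 1485907200, 10]
b = [1483228801, 1485907201, 9]
c = [1483228801, 1485907200, 2]

# The second numbers differ (1485907200 < 1485907201),
# so the trailing 10 versus 9 is never consulted.
print(a < b)  # True

# Here the first two numbers are equal, so the last number decides.
print(c < a)  # True
```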

If that's not what you want, you can use the (slightly wasteful) key=lambda f: [int(n) for n in re.findall(r"\d+", f)][-1] to sort by the last number only.
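As a sketch of sorting by the last number only, the key below looks for the final run of digits directly instead of converting every number in the path. The helper name last_number is hypothetical, and it raises ValueError for a path that contains no digits:

```python
import re

def last_number(path):
    """Return the final run of digits in path as an int."""
    # (\d+) followed only by non-digit characters up to the end of the string
    match = re.search(r"(\d+)\D*$", path)
    if match is None:
        raise ValueError("no number found in %r" % path)
    return int(match.group(1))

files = [".../data/tmp_1483228801-1485907200_10/raw_results.csv",
         ".../data/tmp_1483228800-1485907200_1/raw_results.csv",
         ".../data/tmp_1483228801-1485907201_30/raw_results.csv",
         ".../data/tmp_1483228801-1485907200_2/raw_results.csv",
         ".../data/tmp_1483228801-1485907201_9/raw_results.csv"]

by_last = sorted(files, key=last_number)
print(by_last)
```

With these sample paths the result is ordered _1, _2, _9, _10, _30, regardless of the two timestamps earlier in each name.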
