
glob.glob sorting - not as expected

I'm reading some files from a directory using glob.glob. The files are named like this: 1.bmp

The files/names continue in this naming pattern: 1.bmp, 2.bmp, 3.bmp ... and so on

This is the code I currently have. It does sort the files, but not in the order I expect:

files = sorted(glob.glob('../../Documents/ImageAnalysis.nosync/sliceImage/*.bmp'))

This method sorts as such:

../../Documents/ImageAnalysis.nosync/sliceImage/84.bmp
../../Documents/ImageAnalysis.nosync/sliceImage/85.bmp
../../Documents/ImageAnalysis.nosync/sliceImage/86.bmp
../../Documents/ImageAnalysis.nosync/sliceImage/87.bmp
../../Documents/ImageAnalysis.nosync/sliceImage/88.bmp
../../Documents/ImageAnalysis.nosync/sliceImage/89.bmp

../../Documents/ImageAnalysis.nosync/sliceImage/9.bmp

../../Documents/ImageAnalysis.nosync/sliceImage/90.bmp
../../Documents/ImageAnalysis.nosync/sliceImage/91.bmp
../../Documents/ImageAnalysis.nosync/sliceImage/92.bmp
../../Documents/ImageAnalysis.nosync/sliceImage/93.bmp
../../Documents/ImageAnalysis.nosync/sliceImage/94.bmp
../../Documents/ImageAnalysis.nosync/sliceImage/95.bmp
../../Documents/ImageAnalysis.nosync/sliceImage/96.bmp
../../Documents/ImageAnalysis.nosync/sliceImage/97.bmp
../../Documents/ImageAnalysis.nosync/sliceImage/98.bmp
../../Documents/ImageAnalysis.nosync/sliceImage/99.bmp

I have set 9.bmp apart in the output above to highlight the problem. The names from 90.bmp to 99.bmp are ordered correctly, but 9.bmp appears between 89.bmp and 90.bmp. It should not be there; it belongs near the start.

The output I'm expecting looks like this:

1.bmp
2.bmp
3.bmp
4.bmp
5.bmp
6.bmp
...
10.bmp
11.bmp
12.bmp
13.bmp
...

and so on until the end of the files

Is this possible to do with glob?

That is because the files are sorted by their names (which are strings), and strings are sorted in lexicographic order. Check [Python.Docs]: Sorting HOW TO for more sorting related details.
For things to work as you'd expect, the "faulty" file 9.bmp should be named 09.bmp (this applies to all such files). If you had 100 or more files, things would be even clearer (and the desired file names would be 009.bmp, 035.bmp).
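
To make the lexicographic behaviour concrete, here is a small sketch with a handful of made-up names (the list and the pad helper are only for illustration, they are not from the question). It shows the plain string sort putting 9.bmp after 89.bmp, and how zero-padding the names to a fixed width would make string order agree with numeric order:

```python
names = ["10.bmp", "9.bmp", "89.bmp", "90.bmp", "1.bmp"]

# Plain string sort compares character by character, so "9.bmp" lands after "89.bmp"
lexicographic = sorted(names)

def pad(name, width=3):
    # Hypothetical helper: zero-pad the part before the extension to a fixed width
    stem, ext = name.rsplit(".", 1)
    return f"{stem.zfill(width)}.{ext}"

# With padded names, string order and numeric order coincide
padded = sorted(pad(n) for n in names)

print(lexicographic)
print(padded)
```

Renaming the files on disk this way is only practical if you control how they are produced; otherwise sorting with a key is less intrusive.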

Anyway, there is an alternative (provided that all of the files follow the naming pattern): convert the file's base name (without extension; check [Python.Docs]: os.path - Common pathname manipulations) to an int, and sort on that (by passing key to [Python.Docs]: sorted(iterable, *, key=None, reverse=False)):

import glob, os
files = sorted(glob.glob("../../Documents/ImageAnalysis.nosync/sliceImage/*.bmp"), key=lambda x: int(os.path.splitext(os.path.basename(x))[0]))
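
If you want to try this without touching your real directory, the following self-contained sketch creates a few empty files in a temporary directory and sorts them with the same idea. The name numeric_stem and the file numbers are just assumptions for the demo. Note that int() will raise ValueError if any matched file has a non-numeric base name.

```python
import glob
import os
import tempfile

def numeric_stem(path):
    # Base name without extension, converted to int ("…/9.bmp" -> 9)
    return int(os.path.splitext(os.path.basename(path))[0])

with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files named 1.bmp, 2.bmp, 9.bmp, ...
    for n in (1, 2, 9, 10, 89, 90, 100):
        open(os.path.join(tmp, f"{n}.bmp"), "w").close()

    files = sorted(glob.glob(os.path.join(tmp, "*.bmp")), key=numeric_stem)
    ordered_names = [os.path.basename(f) for f in files]

print(ordered_names)
```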

Not with glob.glob alone. It returns the matches in arbitrary order (whatever order the underlying system lists them in).

What you need to do is provide a suitable key function to sorted, so that the ordering you want is used instead of plain string comparison. Something like (untested code):

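import os

def mysorter(x):
    # Split into directory, base name, and extension (ext keeps its leading dot)
    path, fn = os.path.split(x)
    fn, ext = os.path.splitext(fn)
    if fn.isdigit():
        # Left-pad numeric names with zeros so string order matches numeric order
        fn = f'{int(fn):08}'
    return f'{path}/{fn}{ext}'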

Then

   results = sorted(glob.glob(...), key=mysorter)
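
If some file names are not purely numeric (for example a prefix followed by a number), a more general "natural sort" key can be used instead. This is a different technique from the two answers above, sketched here under the assumption that you want runs of digits compared as numbers and everything else compared as lowercase text. The name natural_key and the sample paths are hypothetical.

```python
import re

def natural_key(path):
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd positions always hold a run of digits and even positions hold text
    parts = re.split(r"(\d+)", path)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

paths = ["dir/10.bmp", "dir/9.bmp", "dir/1.bmp", "dir/slice10.bmp", "dir/slice2.bmp"]
ordered = sorted(paths, key=natural_key)
print(ordered)
```

It would be used the same way as any other key: sorted(glob.glob(...), key=natural_key).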

