
How is Python's glob.glob ordered?

I have written the following Python code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import os, glob

path = '/home/my/path'
for infile in glob.glob(os.path.join(path, '*.png')):
    print(infile)

Now I get this:

/home/my/path/output0352.png
/home/my/path/output0005.png
/home/my/path/output0137.png
/home/my/path/output0202.png
/home/my/path/output0023.png
/home/my/path/output0048.png
/home/my/path/output0069.png
/home/my/path/output0246.png
/home/my/path/output0071.png
/home/my/path/output0402.png
/home/my/path/output0230.png
/home/my/path/output0182.png
/home/my/path/output0121.png
/home/my/path/output0104.png
/home/my/path/output0219.png
/home/my/path/output0226.png
/home/my/path/output0215.png
/home/my/path/output0266.png
/home/my/path/output0347.png
/home/my/path/output0295.png
/home/my/path/output0131.png
/home/my/path/output0208.png
/home/my/path/output0194.png

In what way is it ordered?

To clarify: I am not interested in sorting the results myself (I know about sorted). I want to know which order they come in by default.

It might help to see my ls -l output:

-rw-r--r-- 1 moose moose 627669 2011-07-17 17:26 output0005.png
-rw-r--r-- 1 moose moose 596417 2011-07-17 17:26 output0023.png
-rw-r--r-- 1 moose moose 543639 2011-07-17 17:26 output0048.png
-rw-r--r-- 1 moose moose 535384 2011-07-17 17:27 output0069.png
-rw-r--r-- 1 moose moose 543216 2011-07-17 17:27 output0071.png
-rw-r--r-- 1 moose moose 561776 2011-07-17 17:27 output0104.png
-rw-r--r-- 1 moose moose 501865 2011-07-17 17:27 output0121.png
-rw-r--r-- 1 moose moose 547144 2011-07-17 17:27 output0131.png
-rw-r--r-- 1 moose moose 530596 2011-07-17 17:27 output0137.png
-rw-r--r-- 1 moose moose 532567 2011-07-17 17:27 output0182.png
-rw-r--r-- 1 moose moose 553562 2011-07-17 17:27 output0194.png
-rw-r--r-- 1 moose moose 574065 2011-07-17 17:27 output0202.png
-rw-r--r-- 1 moose moose 552197 2011-07-17 17:27 output0208.png
-rw-r--r-- 1 moose moose 559809 2011-07-17 17:27 output0215.png
-rw-r--r-- 1 moose moose 549046 2011-07-17 17:27 output0219.png
-rw-r--r-- 1 moose moose 566661 2011-07-17 17:27 output0226.png
-rw-r--r-- 1 moose moose 561678 2011-07-17 17:27 output0246.png
-rw-r--r-- 1 moose moose 525550 2011-07-17 17:27 output0266.png
-rw-r--r-- 1 moose moose 565715 2011-07-17 17:27 output0295.png
-rw-r--r-- 1 moose moose 568381 2011-07-17 17:28 output0347.png
-rw-r--r-- 1 moose moose 532768 2011-07-17 17:28 output0352.png
-rw-r--r-- 1 moose moose 535818 2011-07-17 17:28 output0402.png

It is not ordered by file name or by size.

Other links: glob, ls

The order is arbitrary, but you can sort the results yourself.

If you want them sorted by name:

import glob
sorted(glob.glob('*.png'))

Sorted by modification time:

import glob
import os
sorted(glob.glob('*.png'), key=os.path.getmtime)

Sorted by size:

import glob
import os
sorted(glob.glob('*.png'), key=os.path.getsize)

And so on.
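Keys can also be combined. As a minimal sketch (the helper name newest_first is hypothetical, not part of the answer above), this sorts newest first and breaks ties in modification time by path, so files with identical mtimes still come out in a deterministic order. The demo builds three throwaway files with os.utime so the result is predictable.

```python
import glob
import os
import tempfile


def newest_first(pattern):
    """Return paths matching `pattern`, most recently modified first.

    Ties (identical mtimes) are broken by path, so the result does not
    depend on the arbitrary order glob.glob happens to return.
    """
    return sorted(glob.glob(pattern), key=lambda p: (-os.path.getmtime(p), p))


# Demo in a throwaway directory: give three files distinct mtimes.
with tempfile.TemporaryDirectory() as d:
    for i, name in enumerate(["a.png", "b.png", "c.png"]):
        p = os.path.join(d, name)
        open(p, "w").close()
        os.utime(p, (1_000_000 + i, 1_000_000 + i))  # (atime, mtime)
    demo = [os.path.basename(p) for p in newest_first(os.path.join(d, "*.png"))]

print(demo)  # ['c.png', 'b.png', 'a.png']
```

Negating the mtime inside the key (rather than passing reverse=True) keeps the tie-breaking path in ascending order.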

It is probably not sorted at all; it uses the order in which entries appear in the filesystem, i.e. the order you get with ls -U. (At least on my machine, this produces the same order as listing the glob matches.)

If you check the source code of glob.glob, you can see that it internally calls os.listdir (newer Python 3 versions call os.scandir instead, whose entries are likewise yielded in arbitrary order), described here:

http://docs.python.org/library/os.html?highlight=os.listdir#os.listdir

Key sentence: os.listdir(path): Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory.

Arbitrary order. :)
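To make this concrete, here is a rough sketch (not glob's actual source) that rebuilds the result of a simple pattern like '*.png' from os.listdir plus fnmatch. The helper name listdir_matches and the file names are illustrative. Both lists contain the same paths, and neither is guaranteed to be sorted: the order is whatever the directory listing returns.

```python
import fnmatch
import glob
import os
import tempfile


def listdir_matches(directory, pattern):
    """Approximate glob.glob(os.path.join(directory, pattern)) for a pattern
    with no wildcards in the directory part: list the directory and keep the
    matching, non-hidden names, in whatever order the OS returns them."""
    return [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        # glob skips names starting with "." unless the pattern does too
        if not name.startswith(".") and fnmatch.fnmatch(name, pattern)
    ]


with tempfile.TemporaryDirectory() as d:
    for name in ["output0352.png", "output0005.png", "notes.txt", ".hidden.png"]:
        open(os.path.join(d, name), "w").close()
    via_glob = glob.glob(os.path.join(d, "*.png"))
    via_listdir = listdir_matches(d, "*.png")

# Same set of paths; compare them sorted because neither order is defined.
print(sorted(via_glob) == sorted(via_listdir))  # True
```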

The order is arbitrary, but there are several ways to sort the files. Here is one:

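# First, get the file names (base names only, as the undocumented glob.glob1 returned)
import glob
import os
import re

img_folder = '/home/my/path'   # example value
output_image_format = '.png'   # example value
files = [os.path.basename(f) for f in glob.glob(os.path.join(img_folder, '*' + output_image_format))]
# To sort the files by the first run of digits in each file name:
files = sorted(files, key=lambda x: int(re.findall(r"(\d+)", x)[0]))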
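One caveat with the key above: a file name with no digits makes re.findall(...)[0] raise IndexError. A minimal sketch of a more forgiving key (the name digits_key is hypothetical) puts such names last instead of failing:

```python
import re


def digits_key(name):
    """Sort key: names containing digits come first, ordered by the integer
    value of their first run of digits; names without digits go last,
    ordered alphabetically (instead of raising IndexError)."""
    found = re.search(r"\d+", name)
    if found:
        return (0, int(found.group()), name)
    return (1, 0, name)


files = ["b.png", "img10.png", "img2.png", "a.png"]
print(sorted(files, key=digits_key))
# ['img2.png', 'img10.png', 'a.png', 'b.png']
```

The leading 0 or 1 in each tuple decides the group first, so an int is never compared with a str.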

glob.glob() is a wrapper around os.listdir(), so the underlying OS is responsible for the order in which the data is delivered. In general, you cannot make any assumption about the ordering here; the basic assumption is: no ordering. If you need a particular order, sort at the application level.

I had a similar issue: glob was returning a list of file names in an arbitrary order, but I wanted to step through them in the numerical order indicated by the file names. This is how I achieved it.

glob returned my files as something like:

myList = [r"c:\tmp\x\123.csv", r"c:\tmp\x\44.csv", r"c:\tmp\x\101.csv", r"c:\tmp\x\102.csv", r"c:\tmp\x\12.csv"]

I sorted the list in place. To do this, I created a key function:

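import os

def sortKeyFunc(s):
    # file name without its 4-character extension (".csv"), as an int;
    # os.path.basename only splits on backslashes when running on Windows
    return int(os.path.basename(s)[:-4])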

This function takes the numeric part of the file name and converts it to an integer. I then called the list's sort method with it as the key:

myList.sort(key=sortKeyFunc)

This left the list in the following order (list.sort sorts in place and returns None):

[r"c:\tmp\x\12.csv", r"c:\tmp\x\44.csv", r"c:\tmp\x\101.csv", r"c:\tmp\x\102.csv", r"c:\tmp\x\123.csv"]
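The key above relies on os.path.basename splitting at backslashes, which only happens on Windows; on Linux or macOS, os.path.basename(r"c:\tmp\x\123.csv") returns the whole string and int() raises ValueError. A sketch that behaves the same on any OS, assuming the paths are always Windows-style, uses pathlib.PureWindowsPath (the name sort_key is illustrative):

```python
from pathlib import PureWindowsPath


def sort_key(s):
    # PureWindowsPath parses backslash-separated paths on any OS;
    # .stem is the file name without its extension, e.g. "123".
    return int(PureWindowsPath(s).stem)


my_list = [r"c:\tmp\x\123.csv", r"c:\tmp\x\44.csv", r"c:\tmp\x\101.csv",
           r"c:\tmp\x\102.csv", r"c:\tmp\x\12.csv"]
my_list.sort(key=sort_key)
print(my_list)
```

Using .stem also removes the hard-coded [:-4], so extensions of other lengths work too.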

Using @Johan La Rooy's solution, sorting the images with sorted(glob.glob('*.png')) does not work for me; the output list is still not ordered by name.

However, sorted(glob.glob('*.png'), key=os.path.getmtime) works perfectly.

I am a bit confused about why sorting by name does not work here.
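One possible explanation, assuming the file names are not zero-padded (they are not shown here): the question's files are padded (output0005.png), so string order and numeric order agree, but without padding plain sorted() compares strings character by character. The file names below are made up for illustration.

```python
import re

names = ["output10.png", "output9.png", "output100.png"]

# Plain sorted() compares strings character by character, so "1" < "9"
# decides the order before the length of the number is considered.
by_name = sorted(names)

# Comparing the embedded number as an int gives the "human" order.
by_number = sorted(names, key=lambda n: int(re.search(r"\d+", n).group()))

print(by_name)    # ['output10.png', 'output100.png', 'output9.png']
print(by_number)  # ['output9.png', 'output10.png', 'output100.png']
```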

Thanks to @Martin Thoma for posting this great question and to @Johan La Rooy for the helpful solutions.

If you're wondering what glob.glob did on your system in the past and cannot add a sorted call, the ordering will be consistent on Mac HFS+ filesystems and will be traversal order on other Unix systems. So it was likely deterministic unless the underlying filesystem was reorganized, which can happen when files are added, removed, renamed, moved, etc.

At least in Python 3, you can also do this:

import os, re, glob

path = '/home/my/path'
files = glob.glob(os.path.join(path, '*.png'))
files.sort(key=lambda x:[int(c) if c.isdigit() else c for c in re.split(r'(\d+)', x)])
for infile in files:
    print(infile)

This should sort your list of strings in natural order, i.e. numbers embedded in the strings are compared by their numeric value rather than character by character.
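The same idea can be pulled out into a reusable key function. This sketch (the name natural_key is illustrative) also applies casefold(), an extra choice not made in the answer above, so that "IMG1.png" and "img2.png" are compared without regard to case:

```python
import re


def natural_key(s):
    """Split s into text and digit runs; compare the digit runs as integers.

    re.split with a capturing group always alternates text, digits, text...,
    so two keys never compare an int against a str at the same position.
    """
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", s.casefold())]


names = ["img10.png", "IMG1.png", "img2.png"]
print(sorted(names, key=natural_key))  # ['IMG1.png', 'img2.png', 'img10.png']
```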

I used the built-in sorted to solve this problem (note that the '**' pattern also searches subdirectories):

from pathlib import Path

p = Path('/home/my/path')
sorted(p.glob('**/*.png'))
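Because '**' descends into subdirectories, sorting Path objects orders them by their full path, which groups files by directory. If you would rather order by file name alone, pass key=lambda p: p.name. A minimal sketch in a throwaway directory (the layout is made up for illustration):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "sub").mkdir()
    for rel in ["b.png", "sub/a.png"]:
        (root / rel).touch()

    # Path objects sort by their full path, so files are grouped by directory.
    by_path = [p.relative_to(root).as_posix() for p in sorted(root.glob("**/*.png"))]
    # Sorting on p.name ignores the directory and orders by file name only.
    by_name = [p.relative_to(root).as_posix()
               for p in sorted(root.glob("**/*.png"), key=lambda p: p.name)]

print(by_path)  # ['b.png', 'sub/a.png']
print(by_name)  # ['sub/a.png', 'b.png']
```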

Please try this code:

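import glob, os, re
# sort by the digits immediately before ".png" (path as defined in the question)
sorted(glob.glob(os.path.join(path, '*.png')), key=lambda x: int(re.findall(r"([0-9]+?)\.png", x)[0]))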
'''My file names are
"0_male_0.wav", "0_male_2.wav" ... "0_male_30.wav" ...
"1_male_0.wav", "1_male_2.wav" ... "1_male_30.wav" ...
"8_male_0.wav", "8_male_2.wav" ... "8_male_30.wav"

When I wav.read(files), I want to read them in sorted order, i.e.
"0_male_0.wav"
"0_male_1.wav"
"0_male_2.wav" ...
"0_male_30.wav"
"1_male_0.wav"
"1_male_1.wav"
"1_male_2.wav" ...
"1_male_30.wav"
so this is how I did it.

Take all files starting with "0_*" as an example. For the other groups, you can put this in a loop.
'''

import glob
import os
from os import listdir
from os.path import isfile, join

import scipy.io.wavfile as wav

audio_folder_dir = '/path/to/audio'  # example directory

# getKey must be defined before sorted() uses it
def getKey(filename):
    # the file's name without directory and extension, e.g. "0_male_30"
    file_text_name = os.path.splitext(os.path.basename(filename))[0]
    # you get three elements; the last one is the number to sort by
    file_last_num = file_text_name.split('_')
    return int(file_last_num[2])

# get all the .wav file names; their order is totally messed up
file_names = [f for f in listdir(audio_folder_dir) if isfile(join(audio_folder_dir, f)) and f.endswith('.wav')]
# find the files that belong to the "0_*" group
filegroup0 = glob.glob(join(audio_folder_dir, '0_*'))
# sort the files in group '0_*' by the last number in the file name
filegroup0 = sorted(filegroup0, key=getKey)
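Instead of looping over each "N_*" group, one key function can sort all groups at once by returning a tuple. A minimal sketch, assuming every name follows the "<group>_<label>_<number>.wav" pattern (the name group_and_number is hypothetical):

```python
import os


def group_and_number(filename):
    """Key for names like "0_male_30.wav": returns (0, 30), so files sort by
    the leading group number first, then by the trailing number."""
    stem = os.path.splitext(os.path.basename(filename))[0]  # "0_male_30"
    parts = stem.split("_")                                 # ["0", "male", "30"]
    return (int(parts[0]), int(parts[-1]))


names = ["1_male_2.wav", "0_male_30.wav", "0_male_2.wav", "1_male_0.wav", "0_male_0.wav"]
print(sorted(names, key=group_and_number))
# ['0_male_0.wav', '0_male_2.wav', '0_male_30.wav', '1_male_0.wav', '1_male_2.wav']
```

With real files this would be used as sorted(glob.glob(os.path.join(audio_folder_dir, '*.wav')), key=group_and_number).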

That's how I handled my particular case. Hope it's helpful.

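Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.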

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM