
Is os.listdir() deterministic?

According to the Python documentation, os.listdir() returns

a list containing the names of the entries in the directory given by path. The list is in arbitrary order.

What I'm wondering is: is this arbitrary order always the same, i.e. deterministic (from one machine to another, or over time, provided the content of the folder is the same)?

Edit: I am not trying to make it deterministic, nor do I want to rely on it. I was just wondering (for example, what does the order depend on?).

To understand what is going on, we can inspect the underlying implementation in the Python 3.2 source (linked from the original answer).

We will focus on the POSIX part, which starts at line 2574. The code defines:

DIR *dirp;              // will store the pointer to the directory
struct dirent *ep;      // will store the pointer to the entry

There are two important POSIX calls: opendir at line 2596 and readdir at line 2611.

As the readdir man page says:

The readdir() function returns a pointer to a dirent structure representing the next directory entry in the directory stream pointed to by dirp. It returns NULL on reaching the end of the directory stream or if an error occurred.

So readdir reads the next entry in the directory, but it is up to the file system implementation to define what "next" means. The article linked from the original answer explains this further:

[...] Because this is a per-filesystem thing, it follows that the traversal order can be different for different directories on the same system even if they have the same entries created in the same order, either because the directories are using different filesystem types or just because some parameters were set differently on the different filesystems.

You can look at the link posted in the comments by @Hamish, which digs a little into how Python hooks into UNIX's opendir and readdir implementations. From there you would need to dig deeper into file systems and how directory data structures are stored.

The short version, however, is simple: the underlying file system does not store directory entries ordered by file name. It is concerned with keeping the directory entries sane and consistent; file names are just arbitrary labels associated with each entry and are irrelevant to the core function of the file system. Worrying about the human-readable labels associated with each directory entry is done at a higher level, for example in your Python code.

Yes, it is deterministic; it's certainly not purposefully randomised. However, the determinism lives somewhere deep in the details of the file system implementation, and the lexical order of the file names plays no role in it.
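To see this on your own machine, here is a minimal sketch that creates a few files in a non-alphabetical order and reports whether the raw listing happens to be lexically sorted. The helper name listing_report and the file names are made up for illustration, and the result will differ between file systems, so no particular output should be expected.

```python
import os
import tempfile


def listing_report(path):
    """Return the raw os.listdir() result and whether it is lexically sorted."""
    raw = os.listdir(path)
    return raw, raw == sorted(raw)


with tempfile.TemporaryDirectory() as tmp:
    # Create the files in a deliberately non-alphabetical order.
    for name in ["pear.txt", "apple.txt", "zebra.txt", "mango.txt"]:
        with open(os.path.join(tmp, name), "w"):
            pass
    raw, is_sorted = listing_report(tmp)

print("raw order:", raw)
print("lexically sorted?", is_sorted)
```

Whatever order comes back, it is whatever the file system hands to readdir, not something derived from the names.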

It will probably depend on file system internals. On a typical Unix machine, I would expect the items returned by os.listdir to be in the same order as the entries in the directory's "dirent" data structure (which, again, depends on the specifics of the file system).

I would not expect a directory to keep the same ordering over time if files are added and deleted.
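One way to probe this is to take a listing, delete one file, add another, and compare the relative order of the files that survived. This is only a sketch with made-up file names; whether the order changes depends entirely on the file system, so the script prints what it observes rather than asserting a result.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ["one", "two", "three", "four", "five"]:
        with open(os.path.join(tmp, name), "w"):
            pass
    before = os.listdir(tmp)

    # Change the directory: remove one entry, then add a new one.
    os.remove(os.path.join(tmp, "two"))
    with open(os.path.join(tmp, "six"), "w"):
        pass
    after = os.listdir(tmp)

# Keep only the entries present in both listings, preserving each order.
survivors_before = [n for n in before if n in after]
survivors_after = [n for n in after if n in before]

print("before:", before)
print("after: ", after)
print("survivors kept their relative order?", survivors_before == survivors_after)
```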

I would not expect two "directories with the same contents" on two different machines to have a consistent ordering, unless specific care was taken when copying from one to the other.
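In practice this means two directories should be compared by their sorted listings (or as sets), not by the raw lists. A small sketch, assuming two directories filled with the same names in opposite creation order; make_dir_with and same_entries are hypothetical helpers, not part of the os module.

```python
import os
import tempfile


def make_dir_with(names, parent):
    """Create a new directory under parent containing empty files named names."""
    directory = tempfile.mkdtemp(dir=parent)
    for name in names:
        with open(os.path.join(directory, name), "w"):
            pass
    return directory


def same_entries(dir_a, dir_b):
    """Compare two directories by entry names, ignoring listing order."""
    return sorted(os.listdir(dir_a)) == sorted(os.listdir(dir_b))


names = ["a.txt", "b.txt", "c.txt", "d.txt"]
with tempfile.TemporaryDirectory() as parent:
    first = make_dir_with(names, parent)
    second = make_dir_with(list(reversed(names)), parent)
    raw_match = os.listdir(first) == os.listdir(second)  # may be True or False
    content_match = same_entries(first, second)          # order-independent

print("raw listings identical?", raw_match)
print("same entries?", content_match)
```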

Depending on a variety of specifics, the ordering may change on a single machine over time without any explicit changes to the directory, as various file system compacting operations take place (I don't think I've seen a file system that actually does this, but it's definitely something that could be done).

In short, if you want any sort of ordering you can reason about, sort the results yourself. Then you have the guarantee that the ordering will be whatever your sorting imposes.
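A minimal sketch of that advice: wrap os.listdir in sorted(). The wrapper name stable_listdir is made up for illustration. Plain sorted() on strings orders by code point, so uppercase names come before lowercase ones; pass a key (for example key=str.lower) if you want a different rule.

```python
import os
import tempfile


def stable_listdir(path):
    """Return the entries of path in a reproducible (code point) order."""
    return sorted(os.listdir(path))


with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.txt", "c.txt", "a.txt"]:
        with open(os.path.join(tmp, name), "w"):
            pass
    entries = stable_listdir(tmp)

print(entries)
```

With this wrapper the order depends only on the names themselves, not on the file system or on the order in which the files were created.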
