
Most efficient way to determine the size of a directory in Python

The documentation for os.walk has a helpful example:

import os
from os.path import join, getsize
for root, dirs, files in os.walk('python/Lib/email'):
    print(root, "consumes", end=" ")
    print(sum(getsize(join(root, name)) for name in files), end=" ")
    print("bytes in", len(files), "non-directory files")
    if 'CVS' in dirs:
        dirs.remove('CVS')  # don't visit CVS directories

Despite the note that os.walk got faster in Python 3.5 by switching to os.scandir, the documentation doesn't mention that it's still a sub-optimal implementation on Windows: os.walk only yields file names, so the example above has to stat each file again via getsize, even though on Windows the directory entries returned by os.scandir already carry the file size.

https://www.python.org/dev/peps/pep-0471/ does describe this and gets it almost right. However, it recommends using recursion. When dealing with arbitrary folder structures, that doesn't work so well, because you'll quickly hit Python's recursion limit: you'll only be able to iterate a folder structure up to roughly 1000 folders deep, which isn't necessarily unrealistic if you're starting at the root of the filesystem. The real limit isn't actually 1000, either; it's 1000 minus your Python call depth when you go to run this function. If you're doing this in response to a web service request through Django, with lots of business logic layers, it wouldn't be unrealistic to get close to that limit easily.
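
For reference, this is essentially the recursive version PEP 471 suggests (renamed here so it doesn't clash with the iterative version below); every nested directory adds a Python stack frame, which is what runs into the recursion limit:

import os

def get_tree_size_recursive(path):
    """Return total size of files under path, recursively (per PEP 471)."""
    total = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                # One more stack frame per level of nesting -- deep enough
                # trees eventually raise RecursionError.
                total += get_tree_size_recursive(entry.path)
            else:
                total += entry.stat(follow_symlinks=False).st_size
    return total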

The following snippet should be optimal on all operating systems & handle any folder structure you throw at it. Memory usage will obviously grow the more folders you encounter, but to my knowledge there's nothing you can really do about that, as you somehow have to keep track of where you still need to go.

import os

def get_tree_size(path):
    """Return the total size in bytes of all files under path, iteratively."""
    total_size = 0
    dirs = [path]  # stack of directories still to visit
    while dirs:
        next_dir = dirs.pop()
        with os.scandir(next_dir) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    # Defer subdirectories instead of recursing into them
                    dirs.append(entry.path)
                else:
                    total_size += entry.stat(follow_symlinks=False).st_size
    return total_size
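
For example, to print the size of the current directory tree (any readable path works):

print(get_tree_size('.'), "bytes")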

It's possible that using a collections.deque may speed up operations vs the naive usage of a list here, but I suspect it would be hard to write a benchmark showing it, given what disk speeds are today.
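
A minimal sketch of that variant, for anyone who wants to benchmark it (get_tree_size_deque is just an illustrative name; the only change is the container, and using popleft() instead of pop() would make the traversal breadth-first):

from collections import deque
import os

def get_tree_size_deque(path):
    """Same iterative traversal, with a deque holding pending directories."""
    total_size = 0
    dirs = deque([path])
    while dirs:
        next_dir = dirs.pop()  # popleft() here would give breadth-first order
        with os.scandir(next_dir) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    dirs.append(entry.path)
                else:
                    total_size += entry.stat(follow_symlinks=False).st_size
    return total_size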
