
Python os.walk and symlinks

While fixing one user's answer on AskUbuntu, I discovered a small issue. The code itself is straightforward: it uses os.walk to recursively sum the sizes of all files in a directory.

But it breaks on symlinks:

$ python test_code2.py $HOME                                                                                          
Traceback (most recent call last):
  File "test_code2.py", line 8, in <module>
    space += os.stat(os.path.join(subdir, f)).st_size
OSError: [Errno 2] No such file or directory: '/home/xieerqi/.kde/socket-eagle'

The question, then, is: how do I tell Python to ignore those files and avoid summing them?

Solution:

As suggested in the comments, I've added an os.path.isfile() check, and now it works and reports the correct size for my home directory:

$> cat test_code2.py                                                          
#! /usr/bin/python
import os
import sys

space = 0L  # L means "long" - not necessary in Python 3
for subdir, dirs, files in os.walk(sys.argv[1]):
    for f in files:
        file_path = os.path.join(subdir, f)
        if os.path.isfile(file_path):
            space += os.stat(file_path).st_size

sys.stdout.write("Total: {:d}\n".format(space))
$> python test_code2.py  $HOME                                                
Total: 76763501905
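
For readers on Python 3, the same check can be wrapped in a function. This is a minimal sketch, not the original author's code: the name `total_size` is hypothetical, and the `0L` literal is dropped because Python 3 integers have arbitrary precision. Note that `os.path.isfile` follows symlinks, so a valid link to a regular file is still counted with its target's size, while a dangling link is skipped.

```python
import os


def total_size(root):
    """Sum the sizes of regular files under root.

    Entries that os.path.isfile rejects (dangling symlinks, sockets,
    FIFOs) are skipped instead of raising OSError.
    """
    space = 0
    for subdir, dirs, files in os.walk(root):
        for f in files:
            file_path = os.path.join(subdir, f)
            # isfile() follows symlinks and returns False for a missing target.
            if os.path.isfile(file_path):
                space += os.stat(file_path).st_size
    return space
```

Calling `total_size(sys.argv[1])` and printing the result would reproduce the behaviour of the script above.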

As Antti Haapala already mentioned in a comment, the script does not break on symlinks, but on broken symlinks. One way to avoid that, taking the existing script as a starting point, is to use try/except:

#! /usr/bin/python2
import os
import sys

space = 0L  # L means "long" - not necessary in Python 3
for root, dirs, files in os.walk(sys.argv[1]):
    for f in files:
        fpath = os.path.join(root, f)
        try:
            space += os.stat(fpath).st_size
        except OSError:
            print("could not read "+fpath)

sys.stdout.write("Total: {:d}\n".format(space))

As a side effect, it reports every path that could not be read, which points you to possible broken links.
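
To see why only dangling links trigger the error, the sketch below builds a throwaway directory with one regular file and one symlink whose target does not exist, then applies the same try/except pattern. It is an illustration written for Python 3, assuming a POSIX system where `os.symlink` needs no special privileges. The helper names `size_and_unreadable` and `demo`, and the file names, are hypothetical.

```python
import os
import tempfile


def size_and_unreadable(root):
    """Return (total_bytes, unreadable_paths) for the files under root."""
    space = 0
    unreadable = []
    for dirpath, dirs, files in os.walk(root):
        for f in files:
            fpath = os.path.join(dirpath, f)
            try:
                # os.stat follows symlinks, so a dangling link raises OSError.
                space += os.stat(fpath).st_size
            except OSError:
                unreadable.append(fpath)
    return space, unreadable


def demo():
    """Build a temp tree with a dangling symlink and inspect it."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "data.bin"), "wb") as fh:
            fh.write(b"x" * 100)
        link = os.path.join(tmp, "dangling")
        os.symlink(os.path.join(tmp, "no-such-target"), link)

        link_is_symlink = os.path.islink(link)  # the link itself exists
        target_exists = os.path.exists(link)    # but its target does not
        total, unreadable = size_and_unreadable(tmp)
        names = [os.path.basename(p) for p in unreadable]
        return link_is_symlink, target_exists, total, names


if __name__ == "__main__":
    print(demo())
```

Returning the unreadable paths instead of printing them keeps the function reusable. The caller can decide whether to log them, delete the stale links, or ignore them.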

Yes, os.path.isfile is the way to go. However, the following version may be more memory efficient.

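space = 0  # running total across all directories
for subdir, dirs, files in os.walk(sys.argv[1]):
    paths = (os.path.join(subdir, f) for f in files)
    # += accumulates; a plain = would keep only the last directory's sum
    space += sum(os.stat(path).st_size for path in paths if os.path.isfile(path))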
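
The generator idea can be taken one step further so that a single `sum()` call lazily consumes every path under the root. This is a sketch written for Python 3, not the answerer's code; `iter_file_paths` and `total_size_lazy` are hypothetical names.

```python
import os


def iter_file_paths(root):
    """Yield the path of every non-directory entry os.walk reports under root."""
    for subdir, dirs, files in os.walk(root):
        for f in files:
            yield os.path.join(subdir, f)


def total_size_lazy(root):
    """Sum the sizes of regular files without building an intermediate list."""
    return sum(
        os.stat(path).st_size
        for path in iter_file_paths(root)
        if os.path.isfile(path)  # False for dangling symlinks, so they are skipped
    )
```

Splitting path generation from the size calculation also makes it easy to reuse `iter_file_paths` for other per-file checks.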

