
Python os.walk() function vs. find command

I am writing a program that walks the filesystem and collects file information to put into a database. I am trying to learn Python after a lifetime of shell scripting, and I am seeing a difference between what find returns and what os.walk returns:

find THIS_PATH -print

import os

for dirpath, dirs, files in os.walk(THIS_PATH):
    print(dirpath)
    for fname in files:
        print(os.path.join(dirpath, fname))

The issue I have is that the OS find returns symlinks to directories, but the Python version does not, and I have no idea how to make it do that. I don't want it to follow them (i.e. followlinks=True), since that would also produce a different result from find. But I want to be able to print the entries that are symlinks to directories.

Thanks, c

If you want to get the same output (sorting may vary), you need to print both the directories and the files for each path. find returns directories as well as links (to anything). A minimal change to your code would be:

print(THIS_PATH)
for dirpath, dirs, files in os.walk(THIS_PATH):
    for fname in dirs + files:  # iterate over items from both lists
        print(os.path.join(dirpath, fname))
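
To single out the entries that are symlinks to directories, which is what the question asks for, one option is to test each name in dirs with os.path.islink. The following is a minimal sketch, assuming the default followlinks=False; the names walk_like_find and print_entries and the " -> target" annotation are illustrative and not part of os.walk:

```python
import os


def walk_like_find(top):
    """Yield (path, is_symlink_to_dir) for top and everything below it.

    Hypothetical helper. It relies on os.walk's default followlinks=False,
    where symlinks to directories are listed in `dirs` but not descended into.
    """
    yield top, os.path.islink(top) and os.path.isdir(top)
    for dirpath, dirs, files in os.walk(top):
        for name in dirs:
            path = os.path.join(dirpath, name)
            # Anything in `dirs` resolves to a directory, so a symlink
            # found here is a symlink to a directory.
            yield path, os.path.islink(path)
        for name in files:
            # Symlinks to files (and broken symlinks) end up in `files`.
            yield os.path.join(dirpath, name), False


def print_entries(top):
    """Print every entry, annotating directory symlinks with their target."""
    for path, is_dir_link in walk_like_find(top):
        suffix = " -> " + os.readlink(path) if is_dir_link else ""
        print(path + suffix)
```

Calling print_entries('.') on the tree shown further down would print the l1 entry as "./l1 -> d1", while the other entries are printed as plain paths.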

This may be a bit easier to do with pathlib:

from pathlib import Path

mypath = Path(THIS_PATH)
for found_item in mypath.rglob('*'):
    # rglob already yields paths prefixed with mypath, so print them directly
    print(found_item)

For instance, I've created the following tree:

.
├── d1
│   ├── d2
│   │   └── f2
│   └── f1
├── f2 -> d1/d2/f2
└── l1 -> d1

Running find yields the following (note that directories and links to directories appear the same way; the listing also includes a hidden file, d1/.h, which the tree above does not show):

$ find .
.
./f2
./l1
./d1
./d1/.h
./d1/d2
./d1/d2/f2
./d1/f1

Running the first snippet with THIS_PATH='.' yields the same items in a slightly different order: find prints a directory's contents right after the directory itself, while the snippet prints all entries of a directory before os.walk descends into its subdirectories. For the pathlib example, just be aware that if THIS_PATH is '.', the leading ./ is dropped from the output as written.
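
If the order matters, a pre-order traversal that emits each directory and then immediately descends into it comes closer to what find prints. Here is a rough sketch using os.scandir. The function name find_like is hypothetical, and sorting by name is an assumption made only to get reproducible output (find itself does not sort):

```python
import os


def find_like(top):
    """Yield top, then every entry below it, each parent before its children.

    Hypothetical helper; symlinks are reported but never followed.
    """
    yield top
    with os.scandir(top) as entries:
        # Sorting is an assumption for stable output, not something find does.
        children = sorted(entries, key=lambda e: e.name)
    for entry in children:
        if entry.is_dir(follow_symlinks=False):
            # A real directory: emit it and its contents before moving on.
            yield from find_like(entry.path)
        else:
            # Files and symlinks (including symlinks to directories).
            yield entry.path
```

Because os.scandir builds entry.path by joining the directory it was given with each name, calling find_like('.') keeps the leading ./ on every entry below the top directory.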

