
Get directories only with glob pattern using pathlib

I want to use pathlib.glob() to find directories with a specific name pattern ( *data ) in the current working directory. I don't want to explicitly check via .is_dir() or something else.

Input data

This is the relevant listing, with three folders as the expected result and one file that matches the same pattern but should not be part of the result.

ls -ld *data
drwxr-xr-x 2 user user 4,0K  9. Sep 10:22 2021-02-11_68923_data/
drwxr-xr-x 2 user user 4,0K  9. Sep 10:22 2021-04-03_38923_data/
drwxr-xr-x 2 user user 4,0K  9. Sep 10:22 2022-01-03_38923_data/
-rw-r--r-- 1 user user    0  9. Sep 10:24 2011-12-43_3423_data

Expected result

[
    '2021-02-11_68923_data/', 
    '2021-04-03_38923_data/',
    '2022-01-03_38923_data/'
]

Minimal working example

from pathlib import Path
cwd = Path.cwd()

result = cwd.glob('*_data/')
result = list(result)

That gives me the 3 folders but also the file.

I also tried the variant cwd.glob('**/*_data/'), with the same result.

glob is insufficient here. From the filesystem's perspective, the directory's name really is "2021-02-11_68923_data", not "2021-02-11_68923_data/". Since glob only looks at names, it cannot differentiate between "regular" files and directories, so you have to add an additional check, such as the is_dir() call you mentioned.
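A minimal sketch of that extra check, assuming the goal is simply to drop every match that is not a directory. The helper name matching_dirs is made up for illustration and is not part of pathlib:

```python
from pathlib import Path


def matching_dirs(base: Path, pattern: str) -> list[Path]:
    """Return the entries under `base` that match `pattern` and are directories.

    `matching_dirs` is a hypothetical helper name, not a pathlib API.
    """
    # Path.is_dir() follows symlinks, so a symlink to a directory is kept.
    return sorted(p for p in base.glob(pattern) if p.is_dir())
```

Calling matching_dirs(Path.cwd(), '*_data') on the listing above should return the three folders and leave out the file.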

The trailing path separator certainly should be respected in pathlib glob patterns. This is the expected behaviour in shells on all platforms, and is also how the glob module works:

If the pattern is followed by an os.sep or os.altsep then files will not match.

So, as a work-around, you can use the glob module to get the behaviour you want:

>>> import glob
>>> glob.glob('*')
['html', 'images', 'test.py']
>>> glob.glob('*/')
['html/', 'images/']
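If Path objects are still wanted, the glob module results can be wrapped again. The following is only a sketch: glob_dirs is a hypothetical helper name, and it assumes the pattern is relative to a known root directory:

```python
import glob
import os
from pathlib import Path


def glob_dirs(root: Path, pattern: str) -> list[Path]:
    """Match `pattern` under `root` with the glob module, directories only.

    `glob_dirs` is a hypothetical helper name used for illustration.
    """
    # Escape the root so glob characters in its name are taken literally.
    full_pattern = os.path.join(glob.escape(str(root)), pattern)
    # The trailing separator tells glob.glob() to skip regular files.
    hits = glob.glob(full_pattern + os.sep)
    # Path() drops the trailing separator from each result.
    return sorted(Path(hit) for hit in hits)
```

For the question's listing, glob_dirs(Path.cwd(), '*_data') should give the three folders as Path objects.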

The issue with pathlib was fixed in bpo-22276, and merged in Python-3.11.0rc1 (see what's new: pathlib). So if you want to stick with pathlib, please test it out and report any issues.
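For code that has to run on interpreters both older and newer than that fix, one possible approach is to branch on the version. This is a sketch under that assumption; data_dirs is a made-up name, and the '*_data' pattern is taken from the question:

```python
import sys
from pathlib import Path


def data_dirs(base: Path) -> list[Path]:
    """Return directories under `base` whose names end in '_data'.

    `data_dirs` is a hypothetical helper name used for illustration.
    """
    if sys.version_info >= (3, 11):
        # From Python 3.11 on, a trailing separator restricts matches to directories.
        return sorted(base.glob('*_data/'))
    # Older versions ignore the trailing separator, so filter explicitly.
    return sorted(p for p in base.glob('*_data') if p.is_dir())
```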
