Simplify/Enhance Python Filtering Algorithm
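I'm looking for a way to identify the dist.xml files that sit in the topmost directories, i.e. those whose folder has no ancestor folder that also contains a dist.xml.
For example, I have this list of paths: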
/opt/pictures/dist.xml
/opt/docs_old/dist.xml
/opt/public/dist.xml
/opt/documents/server/dist.xml
/opt/documents/dist.xml
/opt/documents/web/dist.xml
/opt/documents/class/dist.xml
/opt/documents/lessons/1/dist.xml
/opt/documents/lessons/2/dist.xml
/opt/documents/lessons/3/dist.xml
/opt/documents/lessons/4/dist.xml
/opt/documents/lessons/5/dist.xml
/opt/music/service/day/dist.xml
/opt/music/service/week/dist.xml
/opt/music/service/month/dist.xml
/opt/music/service/month/1/dist.xml
/opt/music/service/month/2/dist.xml
I'm looking for this output instead:
/opt/pictures/dist.xml
/opt/docs_old/dist.xml
/opt/public/dist.xml
/opt/documents/dist.xml
/opt/music/service/day/dist.xml
/opt/music/service/week/dist.xml
/opt/music/service/month/dist.xml
I have the following code, which seems to do the job. Is there any way to speed it up or clean it up?
from pathlib import Path
paths = ['/opt/pictures/dist.xml', '/opt/docs_old/dist.xml', '/opt/public/dist.xml', '/opt/documents/server/dist.xml', '/opt/documents/dist.xml', '/opt/documents/web/dist.xml', '/opt/documents/class/dist.xml', '/opt/documents/lessons/1/dist.xml', '/opt/documents/lessons/2/dist.xml', '/opt/documents/lessons/3/dist.xml', '/opt/documents/lessons/4/dist.xml', '/opt/documents/lessons/5/dist.xml', '/opt/music/service/day/dist.xml', '/opt/music/service/week/dist.xml', '/opt/music/service/month/dist.xml', '/opt/music/service/month/1/dist.xml', '/opt/music/service/month/2/dist.xml']
paths = list(set(paths))
paths_folder = [str(Path(p).parent) for p in paths]
to_remove = []
for idx, val in enumerate(paths_folder):
    for b in Path(val).parents:
        if str(b) in paths_folder:
            to_remove.append(idx)
paths_folder = [i for j, i in enumerate(paths_folder) if j not in to_remove]
paths_folder = [p + '/dist.xml' for p in paths_folder]
print(paths_folder)
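Here is a possibly cleaner approach, since it avoids index tracking and the like:
First, sort all the path_folders so that the topmost (shallowest) folders come first. Then, for each folder, check whether any of its parent folders is already in the list of "top folders", much as you did, but using the all() built-in, which is true only when the condition holds for every item. If none of the parents is there, add the folder to the final list right away: because of the earlier sort, anything that comes after it is either a different folder or a subfolder of one already seen.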
from pathlib import Path
paths = ['/opt/pictures/dist.xml', '/opt/docs_old/dist.xml', ...] # as above
path_folders = sorted(set(Path(p).parent for p in paths),
                      key=lambda x: (len(x.parts), x))  # is now sorted tops-first
top_folders = []
for folder in path_folders:
    if all(parent not in top_folders for parent in folder.parents):
        top_folders.append(folder)
top_dists = [f / 'dist.xml' for f in top_folders] # can use '/' with Path objs!
print(top_dists)
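Since the question also asks about speed, here is one possible variation. It is only a sketch and not part of the answer above. It assumes the input is a plain list of POSIX-style path strings. The helper name topmost_dists and its name parameter are made up for illustration. It keeps all folders in a set, so each membership test is a hash lookup rather than a list scan, and no depth sort is needed: a folder is topmost exactly when none of its ancestors is in the set.

```python
from pathlib import PurePosixPath


def topmost_dists(paths, name="dist.xml"):
    """Return the paths whose folder has no ancestor folder in the input.

    Hypothetical helper for illustration; `name` is the file name that is
    re-appended to each surviving folder.
    """
    # Unique folders that contain a dist.xml; a set gives O(1) lookups.
    folders = {PurePosixPath(p).parent for p in paths}
    # Keep a folder only if none of its ancestors is itself in the set.
    top = [f for f in folders
           if not any(parent in folders for parent in f.parents)]
    # Sort the resulting strings so the output order is deterministic.
    return sorted(str(f / name) for f in top)


paths = [
    '/opt/pictures/dist.xml',
    '/opt/documents/server/dist.xml',
    '/opt/documents/dist.xml',
    '/opt/music/service/month/dist.xml',
    '/opt/music/service/month/1/dist.xml',
]
print(topmost_dists(paths))
```

PurePosixPath is used here so the example behaves the same on any operating system. With real files on disk, Path works the same way. The result is sorted alphabetically, so it will not follow the order of the input list.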