
Simplify/Enhance Python Filtering Algorithm

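I'm looking for a way to identify the dist.xml files in the topmost directories, i.e. to drop any dist.xml whose directory is nested inside another directory that also contains a dist.xml.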

For example, I have this list of paths:

/opt/pictures/dist.xml
/opt/docs_old/dist.xml
/opt/public/dist.xml
/opt/documents/server/dist.xml
/opt/documents/dist.xml
/opt/documents/web/dist.xml
/opt/documents/class/dist.xml
/opt/documents/lessons/1/dist.xml
/opt/documents/lessons/2/dist.xml
/opt/documents/lessons/3/dist.xml
/opt/documents/lessons/4/dist.xml
/opt/documents/lessons/5/dist.xml
/opt/music/service/day/dist.xml
/opt/music/service/week/dist.xml
/opt/music/service/month/dist.xml
/opt/music/service/month/1/dist.xml
/opt/music/service/month/2/dist.xml

I'm looking for this output instead:

/opt/pictures/dist.xml
/opt/docs_old/dist.xml
/opt/public/dist.xml
/opt/documents/dist.xml
/opt/music/service/day/dist.xml
/opt/music/service/week/dist.xml
/opt/music/service/month/dist.xml
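Reading the two listings together, the rule appears to be: keep a dist.xml only if no ancestor of its folder also contains a dist.xml. A minimal sketch of that test, assuming POSIX-style path strings; `is_top_level` is a hypothetical helper name, not something from the original code:

```python
from pathlib import PurePosixPath

def is_top_level(folder, all_folders):
    """True if no proper ancestor of `folder` is itself in `all_folders`."""
    # folder.parents yields every ancestor, e.g. /opt/documents, /opt, /
    return not any(parent in all_folders for parent in folder.parents)

# A few folders taken from the listing above
folders = {PurePosixPath(p) for p in ['/opt/documents',
                                      '/opt/documents/server',
                                      '/opt/music/service/month',
                                      '/opt/music/service/month/1']}

print(is_top_level(PurePosixPath('/opt/documents/server'), folders))     # False
print(is_top_level(PurePosixPath('/opt/music/service/month'), folders))  # True
```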

I have the following code, which seems to do the job, and I'm wondering whether there is any way to speed it up or clean it up:

from pathlib import Path

paths = ['/opt/pictures/dist.xml', '/opt/docs_old/dist.xml', '/opt/public/dist.xml', '/opt/documents/server/dist.xml', '/opt/documents/dist.xml', '/opt/documents/web/dist.xml', '/opt/documents/class/dist.xml', '/opt/documents/lessons/1/dist.xml', '/opt/documents/lessons/2/dist.xml', '/opt/documents/lessons/3/dist.xml', '/opt/documents/lessons/4/dist.xml', '/opt/documents/lessons/5/dist.xml', '/opt/music/service/day/dist.xml', '/opt/music/service/week/dist.xml', '/opt/music/service/month/dist.xml', '/opt/music/service/month/1/dist.xml', '/opt/music/service/month/2/dist.xml']

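# Deduplicate (set() does not keep the original order), then take each file's folder
paths = list(set(paths))
paths_folder = [str(Path(p).parent) for p in paths]

# Record the index of every folder that has an ancestor which also holds a dist.xml
to_remove = []
for idx, val in enumerate(paths_folder):
    for b in Path(val).parents:
        if str(b) in paths_folder:
            to_remove.append(idx)

# Keep only the remaining (top-level) folders
paths_folder = [i for j, i in enumerate(paths_folder) if j not in to_remove]

# Turn the folders back into file paths
paths_folder = [p + '/dist.xml' for p in paths_folder]

print(paths_folder)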

Here is a possibly cleaner approach, since it avoids tracking indices and the like:

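First, sort all the path_folders so that the topmost folders come first. Then, as you did, check whether each parent folder is present in the list of "top folders", but use the built-in all(), which is True only if the condition holds for every item. If it does, add the folder to the final list straight away: because of the earlier sort, anything after it in the list is either an unrelated folder or a subfolder of the current one, so no later folder can turn out to be one of its ancestors.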
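The sort step relies on the key `(len(x.parts), x)`: a folder always has fewer parts than any of its subfolders, so every ancestor is ordered before its descendants, and folders of equal depth are ordered by path. A small check of that ordering, using `PurePosixPath` instead of `Path` (an assumption, so it behaves the same on any OS):

```python
from pathlib import PurePosixPath

folders = [PurePosixPath(p) for p in ['/opt/music/service/month/1',
                                      '/opt/documents/server',
                                      '/opt/music/service/month',
                                      '/opt/documents']]

# Shallower paths first; equal depths are ordered by the path itself
ordered = sorted(folders, key=lambda x: (len(x.parts), x))
print([str(f) for f in ordered])
# ['/opt/documents', '/opt/documents/server', '/opt/music/service/month', '/opt/music/service/month/1']
```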

from pathlib import Path

paths = ['/opt/pictures/dist.xml', '/opt/docs_old/dist.xml', ...]  # as above
path_folders = sorted(set(Path(p).parent for p in paths),
                      key=lambda x: (len(x.parts), x))    # is now sorted tops-first

top_folders = []
for folder in path_folders:
    if all(parent not in top_folders for parent in folder.parents):
        top_folders.append(folder)

top_dists = [f / 'dist.xml' for f in top_folders]  # can use '/' with Path objs!
print(top_dists)
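A possible variant of the same idea, sketched under the assumption that all paths are POSIX-style strings: a set of all folders answers "does an ancestor also have a dist.xml?" directly, so no sort is needed, and `dict.fromkeys` deduplicates while keeping the input order. `top_level_files` is a hypothetical helper name:

```python
from pathlib import PurePosixPath

def top_level_files(paths, filename='dist.xml'):
    # Containing folders, deduplicated in first-seen order (dicts keep insertion order)
    folders = list(dict.fromkeys(PurePosixPath(p).parent for p in paths))
    # A set gives O(1) membership tests instead of scanning a list
    folder_set = set(folders)
    # Keep a folder only if none of its ancestors also holds the file
    return [str(f / filename) for f in folders
            if not any(parent in folder_set for parent in f.parents)]

paths = ['/opt/pictures/dist.xml', '/opt/docs_old/dist.xml', '/opt/public/dist.xml',
         '/opt/documents/server/dist.xml', '/opt/documents/dist.xml',
         '/opt/documents/web/dist.xml', '/opt/documents/class/dist.xml',
         '/opt/documents/lessons/1/dist.xml', '/opt/documents/lessons/2/dist.xml',
         '/opt/documents/lessons/3/dist.xml', '/opt/documents/lessons/4/dist.xml',
         '/opt/documents/lessons/5/dist.xml', '/opt/music/service/day/dist.xml',
         '/opt/music/service/week/dist.xml', '/opt/music/service/month/dist.xml',
         '/opt/music/service/month/1/dist.xml', '/opt/music/service/month/2/dist.xml']

for p in top_level_files(paths):
    print(p)
```

With the list from the question this prints the seven paths of the expected output, in the same order. Checking against all folders rather than only the top ones gives the same result, because a folder with any dist.xml-holding ancestor is not top-level either way; the cost grows roughly with the number of paths times their depth.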


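Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.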

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM