
Python iterate through subdirectories finding file pairs

I have a deep subfolder structure like this:

a/b/file1.txt
a/b/file1.doc
a/b/file2.txt
a/b/file2.doc
a/c/file3.txt
a/c/file3.doc
a/c/d/file4.txt
a/c/d/file4.doc

I want to extract all the .txt and .doc file pairs, e.g. into a list of tuples. The file names are identical; only the file types differ.

The best I have come up with so far is the following, which doesn't look very efficient:

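import os

pairs = []
for root, dirs, files in os.walk(path):
    # os.walk already descends into every subdirectory, so no isdir() check is needed
    file_list = set(files)
    for filename in files:
        stem, ext = os.path.splitext(filename)
        # for each file of type .txt, find the .doc of the same name in this directory
        if ext == ".txt" and stem + ".doc" in file_list:
            # add the two paths to a tuple and append it to the list
            pairs.append((os.path.join(root, filename),
                          os.path.join(root, stem + ".doc")))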

This may not be the most efficient approach, but it works:

I used a shell command to move each file type into its own folder (run once for the txt extension and once for the doc extension, creating two folders):

find /path-to-files-root/ -type f -name '*.txt' -exec mv -i {} /new-path-to-files/txt/ \;
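
For readers who would rather stay in Python, the move step above can be approximated with os.walk and shutil.move. This is only a sketch under assumptions: move_by_extension and its parameter names are hypothetical, and name collisions are skipped rather than prompted for as mv -i would do.

```python
import os
import shutil

def move_by_extension(src_root, dest_dir, ext):
    """Move every file under src_root whose name ends with ext into dest_dir.

    Hypothetical helper; returns the list of destination paths that were written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for root, dirs, files in os.walk(src_root):
        for name in files:
            if not name.endswith(ext):
                continue
            target = os.path.join(dest_dir, name)
            if os.path.exists(target):
                # Flattening the tree can produce duplicate names;
                # skip instead of silently overwriting.
                continue
            shutil.move(os.path.join(root, name), target)
            moved.append(target)
    return moved
```

Calling it once with ".txt" and once with ".doc" mirrors running the find command for both extensions.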

Then I ran:

import fnmatch
import os
from os.path import isfile, join

def get_all_files(path, pattern):
    # see https://stackoverflow.com/questions/17282887/getting-files-with-same-name-irrespective-of-their-extension
    # collect the names of all files under path that match the glob pattern
    datafiles = []
    for root, dirs, files in os.walk(path):
        for file in fnmatch.filter(files, pattern):
            datafiles.append(file)
    return datafiles

txt_files = [f for f in os.listdir(txt_path) if isfile(join(txt_path, f))]
for txt_file in txt_files:
    filename = os.path.splitext(txt_file)[0]
    # look up the .doc file that shares this base name
    matches = get_all_files(doc_path, '{0}.doc'.format(filename))
    if len(matches) == 1:
        doc_file = matches[0]
        # do something with txt_file and doc_file
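
If moving the files is not desirable, the pairs can probably also be collected in place. The following is a minimal sketch, not the method used above: it assumes each .doc sits in the same directory as its .txt, and find_pairs is a hypothetical name.

```python
from pathlib import Path

def find_pairs(path):
    """Return (txt_path, doc_path) tuples for files that share a directory and base name."""
    pairs = []
    # rglob descends into every subdirectory; sorting keeps the order deterministic
    for txt in sorted(Path(path).rglob("*.txt")):
        doc = txt.with_suffix(".doc")  # same directory, same stem, different extension
        if doc.is_file():
            pairs.append((txt, doc))
    return pairs
```

With the tree from the question, find_pairs("a") would yield one tuple of Path objects per matching .txt/.doc pair; a .txt file without a sibling .doc is simply left out.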
