Iterate over files in subdirectories and load them

Question

I've got the following situation to solve: my parent directory has ~25 subfolders. Each of the subfolders contains a file that ends with 'RRS.csv'. Additionally, some of the subfolders contain a file ending with 'ROH.csv'. From each of the subfolders, I need to import the 'ROH.csv' file if it exists, and if not, the 'RRS.csv' file. I tried this by iterating through all subfolders and all files in the subfolders, using os.path.exists to check if the 'ROH.csv' file exists. Another idea was to first list all files in each subfolder, then check whether one element has the 'ROH.csv' ending, and then load it.

for filename in sorted(os.listdir(parent_dir)):
    for subdir, dirs, files in os.walk(parent_dir):
        if not os.path.exists(filename.endswith('ROH.csv')):
            data = np.genfromtxt(filename.endswith('RRS.csv'), delimiter=',', skip_header=1)
            # some calculations
        else:
            data = np.genfromtxt(filename.endswith('ROH.csv'), delimiter=',', skip_header=1)
            # more funny calculations

This code has multiple problems: (i) it has to check if one file in the subfolder ends with 'ROH.csv', and not if each file ends with it; (ii) I haven't figured out a way yet to specify which file to load, since endswith does not work here (it returns a bool); (iii) it contains double for-loops.

I hope someone has an idea for how to solve this.

Answer 1

I'm not really sure I understand the problems you mention, but here is a snippet that gives you the list of files to process. You can change the filtering in more clever or readable ways, but I have tested this with one level of nested folders and it does the job. Starting from here, you can certainly write something clearer or better suited to your needs.

import os

# parent_dir = os.path.join(os.getcwd(), "temp")
files_to_read = []
for subdir, dirs, files in os.walk(parent_dir):
    if subdir == parent_dir:  # Skip the root directory
        continue
    # Explicitly keep only files ending with one of the two suffixes
    candidates = [f for f in files if f.endswith(("ROH.csv", "RRS.csv"))]
    if not candidates:
        continue  # Neither file is present in this subfolder
    # Pick the ROH file if there is one, otherwise fall back to the RRS file
    roh = [f for f in candidates if f.endswith("ROH.csv")]
    file_to_read = roh[0] if roh else candidates[0]
    files_to_read.append(os.path.join(subdir, file_to_read))
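
The selection rule used above (take the 'ROH.csv' file if present, otherwise the 'RRS.csv' file) can also be isolated in a small pure function, which makes it easy to check without touching the disk. This is only a sketch: `choose_csv` and the sample file names are hypothetical and not part of the original answer.

```python
def choose_csv(filenames, preferred="ROH.csv", fallback="RRS.csv"):
    """Return the name to load from one subfolder's file list.

    `choose_csv` is a hypothetical helper: it returns the first name
    (alphabetically) ending in `preferred`, else the first ending in
    `fallback`, else None when neither suffix is present.
    """
    for suffix in (preferred, fallback):
        matches = sorted(name for name in filenames if name.endswith(suffix))
        if matches:
            return matches[0]
    return None


# Made-up file lists, as os.walk would yield them in `files`
print(choose_csv(["a_RRS.csv", "a_ROH.csv", "notes.txt"]))  # a_ROH.csv
print(choose_csv(["b_RRS.csv"]))                            # b_RRS.csv
print(choose_csv(["readme.md"]))                            # None
```

Returning None for a folder with neither file leaves it to the caller to decide whether to skip the folder or raise an error.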

Answer 2

This assumes you need to do different things for ROH files and RRS files:

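import os
import numpy as np

def get_files_with_suffix(root, files, suffix):
    # Full paths of the files in this directory whose names end with suffix
    return [os.path.join(root, filename) for filename in files if filename.endswith(suffix)]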

for root, dirs, files in os.walk(parent_dir):
    roh_files = get_files_with_suffix(root, files, 'ROH.csv')
    if roh_files:
        for roh_file in roh_files:
            data = np.genfromtxt(roh_file, delimiter=',', skip_header=1)
            # more funny calculations
    else:
        rrs_files = get_files_with_suffix(root, files, 'RRS.csv')
        for rrs_file in rrs_files:
            data = np.genfromtxt(rrs_file, delimiter=',', skip_header=1)
            # some calculations

If the calculations are the same, I would extract the duplicated code into separate functions:

def give_this_a_better_name_that_explains_the_specific_calculations(filename):
    data = np.genfromtxt(filename, delimiter=',', skip_header=1)
    # some calculations

def get_files_with_suffix(root, files, suffix):
    return [os.path.join(root, filename) for filename in files if filename.endswith(suffix)]

def process_files_with_suffix_but_give_this_a_better_name_too(root, files, suffix):
    files = get_files_with_suffix(root, files, suffix)
    for file in files:
        give_this_a_better_name_that_explains_the_specific_calculations(file)
    return files

for root, dirs, files in os.walk(parent_dir):
    # Process the ROH files; fall back to RRS only if no ROH file was found
    if not process_files_with_suffix_but_give_this_a_better_name_too(root, files, 'ROH.csv'):
        process_files_with_suffix_but_give_this_a_better_name_too(root, files, 'RRS.csv')
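
As an alternative to os.walk, here is a minimal sketch using pathlib that looks only at the immediate subfolders of the parent directory, which matches the layout described in the question. The function name `select_files` and the folder and file names in the demo are made up for illustration, and the actual loading step (for example with np.genfromtxt) is left out.

```python
import tempfile
from pathlib import Path


def select_files(parent_dir):
    """Map each immediate subfolder name to the CSV path that should be loaded.

    Hypothetical helper: prefers a file matching *ROH.csv and falls back to
    *RRS.csv. Subfolders containing neither are left out of the result.
    """
    selected = {}
    for subdir in sorted(p for p in Path(parent_dir).iterdir() if p.is_dir()):
        # Try the preferred pattern first, then the fallback
        for pattern in ("*ROH.csv", "*RRS.csv"):
            matches = sorted(subdir.glob(pattern))
            if matches:
                selected[subdir.name] = matches[0]
                break
    return selected


# Demo on a throwaway directory tree with made-up names
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    layout = {"s1": ["x_RRS.csv", "x_ROH.csv"], "s2": ["y_RRS.csv"]}
    for folder, names in layout.items():
        (root / folder).mkdir()
        for name in names:
            (root / folder / name).write_text("header\n1\n")
    chosen = {name: path.name for name, path in select_files(root).items()}

print(chosen)  # {'s1': 'x_ROH.csv', 's2': 'y_RRS.csv'}
```

Each returned path can then be passed to whatever loading function is used. Unlike os.walk, this does not descend into deeper levels of nesting.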

2 answers

Answer 1: 0, 2020-10-15 11:16:20

Answer 2: 0, accepted, 2020-10-15 11:16:28
