
Concatenating multiple files through multiple folders

I'm trying to create a single file out of multiple text files that are spread across multiple folders. This is my code for concatenating. It works only if the program file is placed in each folder:

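    import os

    # List the .txt files in the folder the script is run from
    cur_folder = os.listdir(".")
    file_list = [each for each in cur_folder if each.endswith(".txt")]
    print(file_list)

    align_file = open("all_the_files.txt", "w")

    for each_file in file_list:
        f_o = open(each_file, "r")
        # Join the file's contents onto a single line
        seq = f_o.read().replace("\n", "")
        f_o.close()
        lnth = len(seq)
        # Header line: >filename | length nt
        wholeseq = ">" + each_file + " | " + str(lnth) + " nt\n" + seq + "\n"
        align_file.write(wholeseq)
        print("done")

    align_file.close()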

I then tried to edit it so that it automatically runs through the entire Data folder, enters each subdirectory, and concatenates all the files, without my having to paste the program file into each folder. This is the edit:

    import os

    dir_folder = os.listdir("C:\\Users\\GAMER\\Desktop\\Data")

    for each in dir_folder:
        cur_folder = os.listdir("C:\\Users\\GAMER\\Desktop\\Data\\" + each)

        file_list = [each for each in cur_folder if each.endswith(".txt")]
        print(file_list)

        align_file = open("all_the_files.txt", "w")

        for each_file in file_list:
            f_o = open(each_file, "r")
            seq = f_o.read().replace("\n", "")
            lnth = len(seq)
            wholeseq = ">" + each_file + " | " + str(lnth) + " nt\n" + seq + "\n"
            align_file.write(wholeseq)
            print("done", cur_folder)

However, when I run this, I get an error on the first file of the folder saying that no such file exists. I can't seem to understand why, especially since the file it names is not hardcoded anywhere. Any help will be appreciated.

If the code looks ugly to you, feel free to suggest better ways to do it.

Jamie is correct - os.walk is most likely the function you need.

An example based on your use case:

    import os

    # Visit the Data folder and every subfolder beneath it
    for root, dirs, files in os.walk(r"C:\Users\GAMER\Desktop\Data"):
        for f in files:
            if f.endswith('.txt'):
                print(f)
This prints the name of every file in every folder under the root directory passed to os.walk, as long as the filename ends in .txt. Note that f is only the file name, not a full path; to open the file, join it with root using os.path.join(root, f).
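
Putting the two pieces together, the concatenation from the question could be sketched roughly as follows. This is only a sketch under some assumptions: the function name `concatenate_txt_files` and its parameters `data_dir` and `out_path` are made up for illustration, the sorted traversal order is a choice rather than something the question asks for, and the check that skips the output file is an added safeguard in case the output lives inside the data folder.

```python
import os


def concatenate_txt_files(data_dir, out_path):
    """Write every .txt file under data_dir into out_path, one record per file.

    Returns the number of files written. (Hypothetical helper, not from the question.)
    """
    out_abs = os.path.abspath(out_path)
    count = 0
    # Open the output once, outside the loops, so earlier folders are not overwritten.
    with open(out_path, "w") as align_file:
        for root, dirs, files in os.walk(data_dir):
            dirs.sort()  # sorting in place makes the traversal order predictable
            for name in sorted(files):
                if not name.endswith(".txt"):
                    continue
                # os.walk yields bare file names; join with root to get an openable path.
                file_path = os.path.join(root, name)
                if os.path.abspath(file_path) == out_abs:
                    continue  # do not feed the output file back into itself
                with open(file_path, "r") as f_o:
                    seq = f_o.read().replace("\n", "")
                # Same record layout as in the question: >filename | length nt
                align_file.write(">" + name + " | " + str(len(seq)) + " nt\n" + seq + "\n")
                count += 1
    return count
```

For the folder in the question, a call might look like `concatenate_txt_files(r"C:\Users\GAMER\Desktop\Data", "all_the_files.txt")`, where the output file name is taken from the question and is written to the current working directory. The "no such file" error in the original edit most likely comes from opening the bare file name, which Python resolves against the current working directory rather than the subfolder being listed.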

Python's documentation is here: https://docs.python.org/2/library/os.html#os.walk
