
Concatenating fasta files in folder into single file in python

I have multiple fasta sequence files stored in a folder within my current working directory (called "Sequences") and am trying to combine all the sequences into a single file to run a MUSCLE multiple sequence alignment on.

This is what I have so far. It works up until output_fas.close(), where I get the error message FileNotFoundError: [Errno 2] No such file or directory: './Sequences'

Here is the code:

import os
os.getcwd() #current directory
DIR = input("\nInput folder path containing FASTA files to combine into one FASTA file: ")
os.chdir(DIR)
FILE_NAME = input("\nWhat would you like to name your output file (e.g. combo.fas)? Note: "
                  "Please add the .fas extension: ")
output_fas = open(FILE_NAME, 'w')
file_count = 0

for f in os.listdir(DIR):
    if f.endswith(( ".fasta")):
        file_count += 1
        fh = open(os.path.join(DIR, f))
        for line in fh:
            output_fas.write(line)
        fh.close()

output_fas.close()
print(str(file_count) + " FASTA files were merged into one file, which can be found here: " + DIR)

When I input the directory, I enter it as './Sequences', which successfully changes the directory.

Not quite sure what to do. I adjusted the code before and it successfully created the new file with all the sequences concatenated together; however, it ran continuously, would not end, and had multiple repeats of each sequence.

Appreciate the help!

The error should occur before output_fas.close(); you should see it at the os.listdir(DIR) call. The problem is that DIR becomes meaningless as soon as you execute os.chdir(DIR). DIR was provided as a relative path, and os.chdir(DIR) changes into the new directory, so the old relative path is no longer correct relative to the new directory. For example, once the working directory is ./Sequences, the path ./Sequences now refers to ./Sequences/Sequences, which does not exist.
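To make that reasoning concrete, here is a small self-contained sketch (not part of the original answer) that recreates a hypothetical Sequences folder inside a temporary directory. It shows that the relative DIR stops resolving after os.chdir(DIR), and that converting DIR with os.path.abspath before changing directory is another way to keep using it. The function name and the folder contents are illustrative assumptions.

```python
import os
import tempfile


def demo_relative_path_after_chdir():
    """Return (relative DIR still valid?, listing via absolute path) after os.chdir(DIR)."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout mirroring the question: <root>/Sequences/a.fasta
        seq_dir = os.path.join(root, "Sequences")
        os.mkdir(seq_dir)
        with open(os.path.join(seq_dir, "a.fasta"), "w") as fh:
            fh.write(">seq1\nACGT\n")
        try:
            os.chdir(root)                  # start in the parent folder
            DIR = "./Sequences"             # relative path, as typed at the prompt
            ABS_DIR = os.path.abspath(DIR)  # resolved BEFORE changing directory
            os.chdir(DIR)                   # cwd is now <root>/Sequences

            # DIR is now interpreted as <root>/Sequences/Sequences, which doesn't exist
            relative_still_valid = os.path.isdir(DIR)
            # The absolute path still points at the real folder
            listing = os.listdir(ABS_DIR)
        finally:
            os.chdir(original_cwd)  # leave the temp dir so it can be cleaned up
    return relative_still_valid, listing


print(demo_relative_path_after_chdir())  # → (False, ['a.fasta'])
```

In the question's script, the equivalent change would be to run DIR = os.path.abspath(DIR) right after the input() call and before os.chdir(DIR).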

If you're going to use os.chdir(DIR), then never use DIR again, and just change your loop to:

# Use a with statement for a guaranteed, deterministic close at the end of the block
# and to avoid the need for an explicit close()
with open(FILE_NAME, 'w') as output_fas:
    file_count = 0
    for f in os.listdir():  # No DIR argument: list the current directory
        if f.endswith(".fasta"):
            file_count += 1
            # Use a with statement for the same reason as above
            with open(f) as fh:  # Don't join to DIR: f is already relative to the current directory
                output_fas.writelines(fh)  # writelines does the loop of write calls for you
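As an alternative sketch (not the original answer's approach), you can leave the working directory alone entirely and build paths with pathlib instead. The function name concatenate_fasta and its parameters are hypothetical. Sorting the matches is an added assumption so that the merge order is reproducible, since os.listdir and glob return names in arbitrary order. The loop also skips the output file if it lives in the input folder and matches the pattern. Reading a file while it is being written is one plausible way to get the never-ending run and repeated sequences mentioned in the question, although the question does not show that version of the code.

```python
from pathlib import Path


def concatenate_fasta(input_dir, output_file, pattern="*.fasta"):
    """Merge every file in input_dir matching pattern into output_file.

    Returns the number of files merged. The working directory is never
    changed, so relative paths stay valid for the whole run.
    """
    input_dir = Path(input_dir)
    output_path = Path(output_file).resolve()
    file_count = 0
    with open(output_path, "w") as out:
        # sorted() makes the merge order reproducible; glob order is arbitrary
        for fasta in sorted(input_dir.glob(pattern)):
            # Never read the output file back in if it sits in input_dir
            if fasta.resolve() == output_path:
                continue
            last_line = ""
            with open(fasta) as fh:
                for line in fh:
                    out.write(line)
                    last_line = line
            # Ensure the next file's header starts on a new line
            if last_line and not last_line.endswith("\n"):
                out.write("\n")
            file_count += 1
    return file_count
```

Usage, assuming the same folder and the .fas name suggested by the question's prompt: n = concatenate_fasta("./Sequences", "combo.fas"). Here the output is written relative to the current working directory rather than inside Sequences. Because combo.fas does not end in .fasta, the pattern would not pick it up anyway; the guard only matters if the output uses the same extension as the inputs. The trailing-newline check keeps a file that lacks a final newline from gluing its last sequence line onto the next record's header.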

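Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, please contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM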