
Modify Python script to run for multiple input files

I am very new to Python. I have a script that runs on one particular file (input1.txt) and generates an output (output1.fasta), but I would like to run it for multiple files, for example input2.txt, input3.txt, ..., and generate the corresponding outputs: output2.fasta, output3.fasta.

from Bio import SeqIO

fasta_file = "sequences.txt" 
wanted_file = "input1.txt" 
result_file = "output1.fasta" 

wanted = set()
with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)
fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
with open(result_file, "w") as f:
    for seq in fasta_sequences:
        if seq.id in wanted:
            SeqIO.write([seq], f, "fasta")

I tried to use the glob module, but I do not know how to deal with the output file name.

from Bio import SeqIO
import glob

fasta_file = "sequences.txt"

for filename in glob.glob('*.txt'):

    wanted = set()
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line != "":
                wanted.add(line)

    fasta_sequences = SeqIO.parse(open(fasta_file),'fasta')
    with open(result_file, "w") as f:
        for seq in fasta_sequences:
            if seq.id in wanted:
                SeqIO.write([seq], f, "fasta")

The error message is: NameError: name 'result_file' is not defined

Your glob currently picks up the "sequences" file as well as the inputs, because *.txt also matches sequences.txt. If the FASTA file is always the same and you only want to iterate over the input files, then you need:

for filename in glob.glob('input*.txt'):
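To see the difference between the two patterns without touching the disk, here is a minimal sketch using fnmatch.filter, which applies the same shell-style wildcard rules to a plain list of names. The list of names is taken from the question; the helper name match is only illustrative.

```python
import fnmatch

# File names from the question: one FASTA source and three ID lists.
names = ["sequences.txt", "input1.txt", "input2.txt", "input3.txt"]

def match(pattern, candidates=names):
    """Return the candidate names that match a shell-style pattern."""
    return sorted(fnmatch.filter(candidates, pattern))

all_txt = match("*.txt")            # also matches sequences.txt
inputs_only = match("input*.txt")   # matches only the ID lists

print(all_txt)
print(inputs_only)
```

With '*.txt' the loop would also treat sequences.txt as a list of wanted IDs, which is why the narrower pattern is the safer choice here.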

The NameError occurs because result_file is never assigned inside your loop. To repeat the whole process for each input file, you may want to put it in a function. And since the output filename always corresponds to the input filename, you can build it dynamically:

from Bio import SeqIO
import glob

def create_fasta_outputs(fasta_file, wanted_file):
    # Derive the output name from the input name: input1.txt -> output1.fasta
    result_file = wanted_file.replace("input","output").replace(".txt",".fasta")

    # Collect the wanted sequence IDs, skipping blank lines
    wanted = set()
    with open(wanted_file) as f:
        for line in f:
            line = line.strip()
            if line != "":
                wanted.add(line)

    # Write every record whose ID appears in the wanted set
    fasta_sequences = SeqIO.parse(fasta_file, 'fasta')
    with open(result_file, "w") as f:
        for seq in fasta_sequences:
            if seq.id in wanted:
                SeqIO.write([seq], f, "fasta")

fasta_file = "sequences.txt"
for wanted_file in glob.glob('input*.txt'):
    create_fasta_outputs(fasta_file, wanted_file)
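One caveat with chaining str.replace is that it rewrites every occurrence of "input" and ".txt", including any in a directory name. If the input files might live in a subfolder, a slightly more defensive variant could rename only the file name itself. The helper below is a hypothetical sketch (output_name_for is not part of the original answer) and assumes the inputs follow the inputN.txt naming from the question.

```python
from pathlib import Path

def output_name_for(wanted_file):
    """Map e.g. 'input1.txt' to 'output1.fasta', leaving directories untouched."""
    path = Path(wanted_file)
    stem = path.stem  # file name without its extension, e.g. 'input1'
    if stem.startswith("input"):
        # Swap only the leading 'input' prefix for 'output'.
        stem = "output" + stem[len("input"):]
    # Replace the file name but keep the parent directory as it was.
    return str(path.with_name(stem + ".fasta"))

print(output_name_for("input1.txt"))
```

The result could then be used in place of the result_file line inside create_fasta_outputs.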
