Open and parse multiple .fasta files from a folder with a for loop and print/extract the sequences (Python)

I want to print the IDs and sequences from multiple .fasta files and also store them in an array, but I can't get access to the sequences themselves. I tried SeqIO from Biopython to parse the .fasta files, and os and glob to reach the files in the folder. What am I doing wrong here? I'm struggling with the code since I don't have much programming experience. I don't get an error, but nothing is printed either. Any advice?

from Bio import SeqIO
import os,glob
folder_path = ('genome_nucseq_unique/data/')
for seq_record in SeqIO.parse(glob.glob(os.path.join(folder_path, '*.fasta')), "fasta"):
    print(seq_record.id)
    print(seq_record.id)

SeqIO.parse expects a single file, given as a str, bytes or os.PathLike path (or an open file handle), not a list like the one glob.glob() returns. Loop over the paths first and parse each file separately. Modify your code like this:

from Bio import SeqIO
import os, glob
folder_path = 'genome_nucseq_unique/data/'
fasta_paths = glob.glob(os.path.join(folder_path, '*.fasta'))
for fasta_path in fasta_paths:
    print(fasta_path)
    for seq_record in SeqIO.parse(fasta_path, "fasta"):
        print(seq_record.id)
        print(seq_record.seq)
        print()
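
The question also asks for the IDs and sequences to end up in an array. A minimal sketch of one way to do that, building on the loop above: the helper name collect_records and the tuple layout are illustrative choices, not part of Biopython. str(seq_record.seq) converts the Seq object into a plain Python string.

```python
import glob
import os

from Bio import SeqIO


def collect_records(folder_path, pattern="*.fasta"):
    """Return a list of (file name, record id, sequence string) tuples.

    collect_records is a hypothetical helper written for this example.
    """
    records = []
    # sorted() gives a stable file order; glob.glob() makes no ordering promise
    for fasta_path in sorted(glob.glob(os.path.join(folder_path, pattern))):
        for seq_record in SeqIO.parse(fasta_path, "fasta"):
            records.append(
                (os.path.basename(fasta_path), seq_record.id, str(seq_record.seq))
            )
    return records


records = collect_records("genome_nucseq_unique/data/")
print(len(records), "records found")  # 0 means no file matched the pattern
for file_name, record_id, sequence in records:
    print(file_name, record_id, sequence)
```

If the count is 0, the glob pattern matched no files. In that case the loops never run and nothing is printed, without any error, so it is worth checking that folder_path is correct relative to the directory the script is run from.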
