AttributeError: 'list' object has no attribute 'SeqRecord' - while trying to slice multiple sequences with Biopython SeqIO from a FASTA file
I want to parse sequences and sequence IDs from a FASTA file and assign them to a DataFrame. I am using the SeqIO module from Biopython.
This is what my code looks like. Assume the file path is stored in `file`.
seq_object = SeqIO.parse(file, "fasta")
sequences = []
for seq in seq_object:
    sequences.append(seq)
first_record = sequences[0]
first_record
The output looks like this:
SeqRecord(seq=Seq('mfptsiisvlllnalqshaapllpsspstlafvpsvhapssssskssvhttsts...fr*'), id='Thaps3a_25099', name='Thaps3a_25099', description='Thaps3a_25099', dbxrefs=[])
To assign them to a DataFrame, I tried this:
seq_ids = []
seqs = []
seq_lengths = []
for record in sequences:
    seq_id = record.id
    sequence = record.seq
    length = len(sequence)
    seq_ids.append(seq_id)
    seqs.append(sequence)
    seq_lengths.append(length)
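Now, in the DataFrame, the sequences show up as comma-separated letters, which I don't want. I want them as plain, continuous strings. Any suggestions?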
df = pd.DataFrame()
df["Seq_id"]= seq_ids
df["Sequences"] = seqs
df["Sequence_length"] = seq_lengths
The DataFrame looks like this:
       Seq_id          Sequences                                          Sequence_length
0      Thaps3a_25099   (m, f, p, t, s, i, i, s, v, l, l, l, n, a, l, ...  331
1      Thaps3a_10882   (m, v, k, q, i, a, v, a, t, c, m, t, l, a, s, ...  187
2      Thaps3a_255658  (f, g, g, e, g, f, l, l, f, f, l, g, l, g, f, ...  111
3      Thaps3a_21592   (m, k, a, s, i, l, t, a, l, s, i, l, s, v, a, ...  228
4      Thaps3a_261225  (m, l, t, i, l, s, l, l, e, w, m, a, s, r, w, ...  1317
...    ...             ...                                                ...
13339  Thaps3a_24736   (m, a, e, w, a, s, h, k, t, a, t, n, m, p, p, ...  567
13340  Thaps3a_9764    (m, s, t, h, n, d, f, r, q, g, t, a, y, l, f, ...  981
13341  Thaps3a_3869    (m, p, f, p, f, f, g, f, g, q, s, d, p, a, a, ...  181
13342  Thaps3a_1985    (m, n, s, d, e, q, p, l, v, t, n, d, d, q, d, ...  416
13343  Thaps3a_25099   (m, a, e, d, d, y, h, l, i, s, e, e, p, s, s, ...  445
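Just use str(record.seq). record.seq is a Seq object rather than a plain string, so pandas displays it letter by letter; converting it to str first stores ordinary text in the column: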
from Bio import SeqIO
import pandas as pd
file = 'fasta.faa'
seq_object = SeqIO.parse(file, "fasta")
sequences = []
for seq in seq_object:
    sequences.append(seq)
first_record = sequences[0]
print(first_record)
seq_ids = []
seqs = []
seq_lengths = []
for record in sequences:
    seq_id = record.id
    sequence = str(record.seq)  # convert the Seq object to a plain string
    length = len(sequence)
    seq_ids.append(seq_id)
    seqs.append(sequence)
    seq_lengths.append(length)
df = pd.DataFrame()
df["Seq_id"]= seq_ids
df["Sequences"] = seqs
df["Sequence_length"] = seq_lengths
print(df)
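The same idea can be written more compactly by building one dictionary per record and letting pandas assemble the columns. This is only a sketch: the helper name `fasta_to_dataframe` and the two short records in `fasta_text` are made up for illustration, and the text is read from memory with `StringIO` so the example runs without a file on disk. With a real file, the path can be passed to `SeqIO.parse` directly, as in the code above.

```python
from io import StringIO

import pandas as pd
from Bio import SeqIO

# Hypothetical two-record FASTA text standing in for the real file.
fasta_text = """>seq1 example protein
mfptsiisvl
lnalq
>seq2
mvkqiava
"""


def fasta_to_dataframe(handle):
    """Return one row per FASTA record, with the sequence stored as a plain str."""
    rows = [
        # str() turns the Seq object into ordinary text before pandas sees it.
        {"Seq_id": record.id, "Sequences": str(record.seq)}
        for record in SeqIO.parse(handle, "fasta")
    ]
    df = pd.DataFrame(rows, columns=["Seq_id", "Sequences"])
    # Lengths are computed from the string column itself.
    df["Sequence_length"] = df["Sequences"].str.len()
    return df


df = fasta_to_dataframe(StringIO(fasta_text))
print(df)
```

Because the column now holds plain strings, the usual pandas string methods (for example `df["Sequences"].str.upper()`) can be applied to it directly.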