Code:
file = open(files, 'r')
seqs = []
title = []
f = file.readlines()
for line in f:
    if line[0] == '>':
        title.append(line[1:-1])
    if line[0] != '>':
        seqs.append(line.rstrip())
final = []
for t, s in zip(title, seqs):
    final.append([t, s])
return final
I want each title paired with its complete sequence, with the multiple lines of a sequence joined together.
But the output I get is misaligned, because a sequence can span multiple lines.
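A minimal sketch of why the pairing goes wrong, using a made-up two-record sample (the names `id1` and `id2` and the sequences are hypothetical, not from the real file). Titles and sequence lines are collected separately, so `zip` pairs them purely by position:

```python
# Hypothetical sample: the first record spans two lines.
lines = [">id1", "aaaa", "tttt", ">id2", "cccc"]

titles = [l[1:] for l in lines if l.startswith(">")]
seqs = [l for l in lines if not l.startswith(">")]

# zip pairs by position and stops at the shorter list, so "id2"
# receives the continuation line of "id1" and "cccc" is dropped.
pairs = [[t, s] for t, s in zip(titles, seqs)]
print(pairs)
```

Two titles are matched against three sequence lines, so every record after the first multi-line sequence is shifted.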
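You could read the input line by line. If a line starts with a >, you know it begins a new record, so append a new list to the output list containing this line's data and an empty string to hold the lines that follow.
Any other line is a continuation, so add it to the string in the most recent record.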
data = """>A18178 1
caccaataaaaaaacaagcttaacctaattc
>A21196 1
cggccagatcta
>A21197 1
agcttagatctggccgggg
>AX557348 1
gcggatttactcaggggagagcccagataaatggagtctgtgcgtccaca
gaattcgcacca
>AX557349 1
gcggatttactcaggggagagcccagataaatggagtctgtgcgtccaca
gaattcgcacca
>AX557350 1
tccgtgaaacaaagcggatgtaccggatttttattccggctatggggcaa
ttccccgtcgcggagcca"""
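output = []
for line in data.splitlines():
    if line.startswith('>'):
        # a header line starts a new [title, sequence] record
        output.append([line[1:], ''])
    else:
        # a continuation line extends the sequence of the latest record
        output[-1][-1] += line
print(output)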
OUTPUT
[
['A18178 1', 'caccaataaaaaaacaagcttaacctaattc'],
['A21196 1', 'cggccagatcta'],
['A21197 1', 'agcttagatctggccgggg'],
['AX557348 1', 'gcggatttactcaggggagagcccagataaatggagtctgtgcgtccacagaattcgcacca'],
['AX557349 1', 'gcggatttactcaggggagagcccagataaatggagtctgtgcgtccacagaattcgcacca'],
['AX557350 1', 'tccgtgaaacaaagcggatgtaccggatttttattccggctatggggcaattccccgtcgcggagcca']
]
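The same idea can be applied when reading from a file, as in the question. This is a sketch under assumptions: the function name `read_fasta` is made up for illustration, and `io.StringIO` stands in for a real file opened with `open(path)` so the example runs on its own.

```python
import io


def read_fasta(handle):
    """Return [[title, sequence], ...] from an open text handle."""
    records = []
    for line in handle:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # header line: start a new record with an empty sequence
            records.append([line[1:], ""])
        elif records:
            # continuation line: extend the latest record's sequence
            records[-1][1] += line
    return records


# io.StringIO stands in for a file here; two records from the data above.
sample = io.StringIO(
    ">A21196 1\n"
    "cggccagatcta\n"
    ">AX557348 1\n"
    "gcggatttactcaggggagagcccagataaatggagtctgtgcgtccaca\n"
    "gaattcgcacca\n"
)
records = read_fasta(sample)
print(records)
```

With a real file, something like `with open(path) as handle: records = read_fasta(handle)` would also close the file automatically. The `elif records` guard ignores any sequence lines that appear before the first header instead of raising an IndexError.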