简体   繁体   中英

How to extract short sequence based on step size?

The code below extract short sequence in every sequence with the window size 100. The window will shift by step size one and extract the sequence. I would like to extract the short sequence with every step size 50. Can anyone help me?

 from Bio import SeqIO

 with open("B.fasta","w") as f:
         for seq_record in SeqIO.parse("A.fasta", "fasta"):
             for i in range(len(seq_record.seq) - 99) :
                f.write(str(">"+seq_record.id) + "\n")
                f.write(str(seq_record.seq[i:i+100]) + "\n")

Example of fasta file:

>hg17_ct_ER_ER_142
CTAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGGGG

Example output:

>hg17_ct_ER_ER_142
CTAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGG
>hg17_ct_ER_ER_142
TAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGGG
>hg17_ct_ER_ER_142
AAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACAAGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGGGG

Expected output:

>hg17_ct_ER_ER_142
CTAAAAAAGTAAAAAAGAAAAAAAGAGAAAGAAAGAATATAGAAGCAACA
>hg17_ct_ER_ER_142
AGTGTAGATTTACATTCTATTAGACAGTGACCCATTAGACCCGGACAAGG

只需对范围函数使用步长选项:

for i in range(0, len(seq_record.seq) - 99, 50) :

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM