Match nucleotide position to sequence from fasta file

I have a list of positions:

chr1 1000
chr2 2000
chr3 4000

and would like to look up the nucleotide at each of those positions in a custom FASTA file, producing output such as:

chr1 1000 A
chr2 2000 T
chr3 4000 G

Is there an existing Python tool that can do this?

Given the FASTA file chromosomes.fasta:

>chr1
GATTACA
>chr2
ATTACGA
>chr3
GCCAACG

And the positions file positions.txt:

chr1 3

chr2 4

chr3 5

You can use the following code:

from Bio import SeqIO

# Load every FASTA record into a dict keyed by record ID (e.g. 'chr1').
record_dict = SeqIO.to_dict(SeqIO.parse('chromosomes.fasta', 'fasta'))

# Read (chromosome, position) pairs, skipping blank lines.
# A list keeps the file order and allows several positions per chromosome.
chromosome_positions = []
with open('positions.txt') as f:
    for line in f:
        if line.strip():
            chromosome, position = line.split()
            chromosome_positions.append((chromosome, int(position)))

for chromosome, position in chromosome_positions:
    seq = record_dict[chromosome].seq
    base = seq[position]  # zero-based index into the sequence
    print(chromosome, position, base)

Which will output:

chr1 3 T
chr2 4 C
chr3 5 C
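If Biopython is not available, the same lookup can be sketched with the standard library alone. This is a minimal illustration rather than a replacement for SeqIO: the helper names parse_fasta and lookup_bases are made up for this example, the input is passed as in-memory lists of lines instead of files, and positions are treated as zero-based, as in the code above.

```python
def parse_fasta(lines):
    """Return {record_id: sequence} from an iterable of FASTA lines."""
    sequences = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # The record ID is the first word after '>'.
            current_id = line[1:].split()[0]
            sequences[current_id] = []
        elif current_id is not None:
            sequences[current_id].append(line)
    # Join multi-line sequences into one string per record.
    return {name: ''.join(parts) for name, parts in sequences.items()}


def lookup_bases(sequences, position_lines):
    """Yield (chromosome, position, base) using zero-based positions."""
    for line in position_lines:
        if line.strip():  # skip blank lines
            chromosome, position = line.split()
            position = int(position)
            yield chromosome, position, sequences[chromosome][position]


# Same sample data as chromosomes.fasta and positions.txt.
fasta_lines = ['>chr1', 'GATTACA', '>chr2', 'ATTACGA', '>chr3', 'GCCAACG']
position_lines = ['chr1 3', '', 'chr2 4', '', 'chr3 5']

results = list(lookup_bases(parse_fasta(fasta_lines), position_lines))
for chromosome, position, base in results:
    print(chromosome, position, base)
```

With real files, you could pass the open file objects directly, since both helpers only iterate over lines. Note that this sketch holds every sequence in memory, which may not suit a full genome.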

Note that Python uses zero-based indexing, so position 5 in positions.txt gives you the sixth base of the corresponding sequence. Genomic coordinates are often 1-based; if yours are, index with seq[position - 1] instead.
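The difference between the two conventions can be made explicit with a small helper. This is only a sketch: base_at and its one_based flag are hypothetical names, not part of Biopython. The range check is there because a plain negative index in Python silently wraps around to the end of the sequence, so a 1-based position of 0 would otherwise return the last base instead of failing.

```python
def base_at(sequence, position, one_based=True):
    """Return the base at `position`, raising IndexError when out of range."""
    index = position - 1 if one_based else position
    if not 0 <= index < len(sequence):
        raise IndexError('position %d is outside a sequence of length %d'
                         % (position, len(sequence)))
    return sequence[index]


chr3 = 'GCCAACG'
print(base_at(chr3, 5))                   # fifth base (1-based position 5)
print(base_at(chr3, 5, one_based=False))  # sixth base (zero-based index 5)
```

The helper works on plain strings and should also work on a Biopython Seq object, since it only relies on len() and integer indexing.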
