Match nucleotide position to sequence from fasta file

Question

I have a list of positions:

chr1 1000
chr2 2000
chr3 4000

and would like to look up the nucleotide at each of those positions in a custom FASTA file, for example:

chr1 1000 A
chr2 2000 T
chr3 4000 G

Is there an existing Python tool that can do this?

Answer 1

Given the FASTA file chromosomes.fasta:

>chr1
GATTACA
>chr2
ATTACGA
>chr3
GCCAACG

And the positions file positions.txt:

chr1 3

chr2 4

chr3 5

You can use the following code:

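from Bio import SeqIO

# Load every FASTA record into a dict keyed by record ID (e.g. 'chr1').
record_dict = SeqIO.to_dict(SeqIO.parse('chromosomes.fasta', 'fasta'))

# Read one position per chromosome, skipping blank lines.
# A chromosome listed more than once keeps only its last position.
chromosome_positions = {}
with open('positions.txt') as f:
    for line in f.read().splitlines():
        if line:
            chromosome, position = line.split()
            chromosome_positions[chromosome] = int(position)

for chromosome in chromosome_positions:
    seq = record_dict[chromosome].seq
    position = chromosome_positions[chromosome]
    base = seq[position]  # zero-based index into the sequence
    print(chromosome, position, base)

Which will output (on Python 3.7+, where dicts keep insertion order):

chr1 3 T
chr2 4 C
chr3 5 C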

Note that Python uses zero-based indexing, so position 5 in positions.txt will give you the sixth base in the corresponding sequence. If your positions are 1-based, as genomic coordinates often are, use seq[position - 1] instead.
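If your positions are 1-based, or the same chromosome appears on several lines, something along these lines may help. It is only a minimal sketch using the standard library, not part of Biopython: the helper names read_fasta and lookup_bases and the one_based flag are made up for illustration, and the data is kept in memory instead of being read from chromosomes.fasta and positions.txt.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {name: sequence} dict."""
    sequences = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            sequences[name] = []
        elif name is not None:
            sequences[name].append(line)
    return {name: "".join(parts) for name, parts in sequences.items()}


def lookup_bases(sequences, position_lines, one_based=True):
    """Yield (chromosome, position, base) for each non-empty position line."""
    offset = 1 if one_based else 0
    for line in position_lines:
        if not line.strip():
            continue
        chromosome, position = line.split()
        position = int(position)
        index = position - offset
        sequence = sequences[chromosome]
        # Reject out-of-range positions instead of silently wrapping around
        # (a negative index would otherwise count from the end of the string).
        if not 0 <= index < len(sequence):
            raise IndexError(f"{chromosome}:{position} is outside the sequence")
        yield chromosome, position, sequence[index]


# Same sequences as chromosomes.fasta above; chr1 is queried twice.
fasta_lines = [">chr1", "GATTACA", ">chr2", "ATTACGA", ">chr3", "GCCAACG"]
position_lines = ["chr1 3", "", "chr2 4", "chr1 7", "chr3 5"]

sequences = read_fasta(fasta_lines)
results = list(lookup_bases(sequences, position_lines))
for chromosome, position, base in results:
    print(chromosome, position, base)
```

Because every line is processed as it is read, instead of being stored in a dict keyed by chromosome, repeated chromosomes are all reported and the output follows the order of the positions file. Passing one_based=False gives the same zero-based behaviour as the Biopython code above.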

solution1
3 ACCEPTED 2017-08-21 10:13:12
