I have a list of positions:
chr1 1000
chr2 2000
chr3 4000
and would like to look up the nucleotide at each of those positions in a custom FASTA file, for example:
chr1 1000 A
chr2 2000 T
chr3 4000 G
Is there an existing Python tool that can do this?
Given the FASTA file chromosomes.fasta:
>chr1
GATTACA
>chr2
ATTACGA
>chr3
GCCAACG
And the positions file positions.txt:
chr1 3
chr2 4
chr3 5
You can use the following code:
    from Bio import SeqIO

    # Load every FASTA record into a dict keyed by record ID (e.g. 'chr1').
    record_dict = SeqIO.to_dict(SeqIO.parse('chromosomes.fasta', 'fasta'))

    # Read (chromosome, position) pairs. A list keeps the file order and
    # allows several positions on the same chromosome.
    positions = []
    with open('positions.txt') as f:
        for line in f:
            if line.strip():
                chromosome, position = line.split()
                positions.append((chromosome, int(position)))

    # Look up the base at each position (zero-based index).
    for chromosome, position in positions:
        base = record_dict[chromosome].seq[position]
        print(chromosome, position, base)

Which will output:

    chr1 3 T
    chr2 4 C
    chr3 5 C
Note that Python uses zero-based indexing, so position 5 in positions.txt gives you the sixth base of the corresponding sequence. If your positions are 1-based, as is common for genomic coordinates, index with position - 1 instead.
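If your positions are 1-based, or Biopython is not available, the same lookup can be sketched with the standard library alone. This is only a minimal sketch and not part of the original answer: the helper names read_fasta and lookup_bases and the one_based parameter are made up for illustration, and it assumes the whole FASTA file fits in memory.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping record ID to sequence string."""
    sequences = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # The record ID is the first word after '>'.
            name = line[1:].split()[0]
            sequences[name] = []
        elif name is not None:
            # Sequences may be wrapped over several lines; collect the pieces.
            sequences[name].append(line)
    return {name: ''.join(parts) for name, parts in sequences.items()}


def lookup_bases(sequences, position_lines, one_based=True):
    """Return (chromosome, position, base) tuples in input order."""
    offset = 1 if one_based else 0
    results = []
    for line in position_lines:
        if not line.strip():
            continue
        chromosome, position = line.split()
        position = int(position)
        index = position - offset
        sequence = sequences[chromosome]
        # Reject out-of-range positions instead of letting a negative
        # index silently wrap around to the end of the sequence.
        if not 0 <= index < len(sequence):
            raise IndexError(f'{chromosome}:{position} is outside the sequence')
        results.append((chromosome, position, sequence[index]))
    return results


# Same example data as above, held in memory instead of files.
fasta_lines = ['>chr1', 'GATTACA', '>chr2', 'ATTACGA', '>chr3', 'GCCAACG']
position_lines = ['chr1 3', 'chr2 4', 'chr3 5']

sequences = read_fasta(fasta_lines)
for chromosome, position, base in lookup_bases(sequences, position_lines):
    print(chromosome, position, base)
```

Treating the positions as 1-based, this prints chr1 3 T, chr2 4 A and chr3 5 A. Passing one_based=False reproduces the zero-based result (T, C, C). With real files, pass the open file objects in place of the two lists.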