I have pairs of coding DNA sequences on which I wish to perform pairwise codon alignments via Python. I have "half completed" the process.
So far I have used the Biopython package and the EMBOSS Needle program.
Question
I would appreciate suggestions for programs/code (called from Python) that can transfer gaps from aligned peptide sequence pairs onto codons of the corresponding nucleotide sequence pairs. Or programs/code that can carry out the pairwise codon alignment from scratch.
You can make a mapping of peptides to nucleotides with the addition of your missing character:
codons = str.maketrans({'M': 'ATG',
                        'R': 'CGT',
                        ...,
                        '-': '---'})  # Your missing character
peptide = 'M-R'
result = peptide.translate(codons)
and then translate the full sequence.
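To make the idea concrete, here is a minimal runnable sketch with a deliberately tiny, illustrative table (only M, R and the gap character; a real table would need an entry for every amino acid). The helper name back_translate is made up for this example. Be aware that this approach back-translates using one fixed codon per amino acid, so, because several codons can encode the same amino acid, the result will not necessarily match the codons in your original nucleotide sequence.

```python
# Illustrative table only: one arbitrarily chosen codon per amino acid.
codon_for = {'M': 'ATG', 'R': 'CGT', '-': '---'}
codons = str.maketrans(codon_for)


def back_translate(peptide):
    """Replace each residue or gap with its triplet from the table."""
    # Fail loudly instead of silently passing unknown residues through.
    missing = set(peptide) - set(codon_for)
    if missing:
        raise KeyError(f"no codon defined for: {sorted(missing)}")
    return peptide.translate(codons)


print(back_translate('M-R'))  # ATG---CGT
```

If you need the original codons preserved, you have to read them from the nucleotide sequence rather than from a lookup table.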
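All you need to do is split the nucleotide sequence into triplets. Each amino acid is a triplet, each gap is three gaps. So, in Python:
codon_index = 0  # counts codons consumed from the un-gapped nucleotide sequence
for residue in aminoacid:
    if residue != "-":
        print(nucleotide[3 * codon_index:3 * codon_index + 3])
        codon_index += 1
    else:
        print("---")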
I understand you asked this question three years ago, but this post was the first thing I found when I searched Google for 'codon alignment python'. I therefore wanted to respond for anyone else who stumbles upon this while still looking for a library to do this.
You can use the library PyCogent for this.
They explain it well on their website: http://pycogent.org/examples/align_codons_to_protein.html
In the end I made my own Python function and thought I may as well share it.
It takes an aligned peptide sequence with gaps and the corresponding un-aligned nucleotide sequence, and gives an aligned nucleotide sequence:
Function
def gapsFromPeptide(peptide_seq, nucleotide_seq):
    """Transfers gaps from aligned peptide seq into codon partitioned nucleotide seq (codon alignment)
    - peptide_seq is an aligned peptide sequence with gaps that need to be transferred to nucleotide seq
    - nucleotide_seq is an un-aligned dna sequence whose codons translate to peptide seq"""
    def chunks(l, n):
        """Yield successive n-sized chunks from l."""
        for i in range(0, len(l), n):  # range rather than xrange, so this runs on Python 3
            yield l[i:i+n]
    codons = list(chunks(nucleotide_seq, 3))  # splits nucleotides into codons (triplets)
    gappedCodons = []
    codonCount = 0
    for aa in peptide_seq:  # adds '---' gaps to nucleotide seq corresponding to peptide
        if aa != '-':
            gappedCodons.append(codons[codonCount])
            codonCount += 1
        else:
            gappedCodons.append('---')
    return ''.join(gappedCodons)
Usage
>>> unaligned_dna_seq = 'ATGATGATG'
>>> aligned_peptide_seq = 'M-MM'
>>> aligned_dna_seq = gapsFromPeptide(aligned_peptide_seq, unaligned_dna_seq)
>>> print(aligned_dna_seq)
ATG---ATGATG
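For anyone applying this to sequence pairs, as in the question, a slightly more defensive variant might look like the sketch below. The names transfer_gaps and align_codon_pair are made up for illustration. It assumes '-' is the only gap character and that each nucleotide sequence contains exactly three bases per non-gap residue, so a trailing stop codon that is not represented in the peptide would have to be trimmed first.

```python
def transfer_gaps(aligned_peptide, dna):
    """Insert '---' into dna wherever aligned_peptide has a gap."""
    n_residues = len(aligned_peptide) - aligned_peptide.count('-')
    # Assumption: exactly one codon per non-gap residue, no extra stop codon.
    if len(dna) != 3 * n_residues:
        raise ValueError(
            f"expected {3 * n_residues} nucleotides for "
            f"{n_residues} residues, got {len(dna)}"
        )
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    return ''.join('---' if aa == '-' else next(codons)
                   for aa in aligned_peptide)


def align_codon_pair(aligned_peptides, dnas):
    """Apply transfer_gaps to both members of an aligned peptide pair."""
    if len(aligned_peptides[0]) != len(aligned_peptides[1]):
        raise ValueError("aligned peptides must have equal length")
    return tuple(transfer_gaps(p, d) for p, d in zip(aligned_peptides, dnas))


pair = align_codon_pair(('M-MM', 'MRM-'), ('ATGATGATG', 'ATGCGTATG'))
print(pair)  # ('ATG---ATGATG', 'ATGCGTATG---')
```

The length check is there because a mismatch between peptide and nucleotide lengths would otherwise produce a silently misaligned result or an unhelpful IndexError.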