Splitting input code into 3 letters and using if statements to return DNA->Amino Acid Letter
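It reads the first 3 letters as expected, but how do I get it to read the next 3 letters as a separate string so that it can run through the if, elif and else statements again?
The else branch that prints X is there because I do not need the rest of the codons.
The table is only there because it is useful for checking whether the correct letter is returned.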
dna = input("Enter the DNA sequence to translate: ")
if dna == "ATA" or dna == "ATC" or dna == "ATT":
    print("I")
elif dna == "CTA" or dna == "CTC" or dna == "CTG" or dna == "CTT" or dna == "TAA" or dna == "TTG":
    print("L")
elif dna == "GTA" or dna == "GTC" or dna == "GTG" or dna == "GTT":
    print("V")
elif dna == "TTC" or dna == "TTT":
    print("F")
elif dna == "ATG":
    print("M")
else:
    print("X")
CodonDict = {
'ATT':'I', 'ATC':'I', 'ATA':'I', 'CTT':'L', 'CTC':'L',
'CTA':'L', 'CTG':'L', 'TTA':'L', 'TTG':'L', 'GTT':'V', 'GTC':'V',
'GTA':'V', 'GTG':'V', 'TTT':'F', 'TTC':'F', 'ATG':'M', 'TGT':'C',
'TGC':'C', 'GCT':'A', 'GCC':'A', 'GCA':'A', 'GCG':'A', 'GGT':'G',
'GGC':'G', 'GGA':'G', 'GGG':'G', 'CCT':'P', 'CCC':'P', 'CCA':'P',
'CCG':'P', 'ACT':'T', 'ACC':'T', 'ACA':'T', 'ACG':'T', 'TCT':'S',
'TCC':'S', 'TCA':'S', 'TCG':'S', 'AGT':'S', 'AGC':'S', 'TAT':'Y',
'TAC':'Y', 'TGG':'W', 'CAA':'Q', 'CAG':'Q', 'AAT':'N', 'AAC':'N',
'CAT':'H', 'CAC':'H', 'GAA':'E', 'GAG':'E', 'GAT':'D', 'GAC':'D',
'AAA':'K', 'AAG':'K', 'CGT':'R', 'CGC':'R', 'CGA':'R', 'CGG':'R',
'AGA':'R', 'AGG':'R', 'TAA':'X', 'TAG':'X', 'TGA':'X'}
For example I input: ATT
Returns I as expected.
I input: ATTATT
Returns X, which is what the code does as written, but how do I treat each group of 3 letters separately?
It should return II.
Iterate over the string:
for i in range(len(dna) // 3):
    # Take the i-th group of three letters
    tmp = dna[i*3: (i+1)*3]
    if tmp == "ATA" or tmp == "ATC" or tmp == "ATT":
        print("I")
    # TTA, not the stop codon TAA, maps to L in CodonDict
    elif tmp == "CTA" or tmp == "CTC" or tmp == "CTG" or tmp == "CTT" or tmp == "TTA" or tmp == "TTG":
        print("L")
    elif tmp == "GTA" or tmp == "GTC" or tmp == "GTG" or tmp == "GTT":
        print("V")
    elif tmp == "TTC" or tmp == "TTT":
        print("F")
    elif tmp == "ATG":
        print("M")
    else:
        print("X")
You can use a sliding window built from a for loop and the step argument of range, walking through the DNA sequence at most 3 characters at a time:
dna = input("Enter the DNA sequence to translate: ")
amino_str = ''
for i in range(0, len(dna), 3):
    dna_part = dna[i:i+3]
    if dna_part == "ATA" or dna_part == "ATC" or dna_part == "ATT":
        amino_str += "I"
    # TTA, not the stop codon TAA, maps to L in CodonDict
    elif dna_part == "CTA" or dna_part == "CTC" or dna_part == "CTG" or dna_part == "CTT" or dna_part == "TTA" or dna_part == "TTG":
        amino_str += "L"
    elif dna_part == "GTA" or dna_part == "GTC" or dna_part == "GTG" or dna_part == "GTT":
        amino_str += "V"
    elif dna_part == "TTC" or dna_part == "TTT":
        amino_str += "F"
    elif dna_part == "ATG":
        amino_str += "M"
    else:
        amino_str += "X"
print(amino_str)
Usage example 1:
Enter the DNA sequence to translate: ATAATA
II
You can simplify the code above by using CodonDict:
amino_str = ''
for i in range(0, len(dna), 3):
    dna_part = dna[i:i+3]
    # .get() returns None for an unknown triplet instead of raising KeyError
    amino = CodonDict.get(dna_part)
    if amino is not None:
        amino_str += amino
    else:
        amino_str = "ERROR: PARSING DNA SEQUENCE"
        break
print(amino_str)
Usage example 2:
Enter the DNA sequence to translate: AGAATACGC
RIR
Example:
# Named seq rather than input, so the built-in input() is not shadowed
seq = 'ATTATTTTAGGG'
for i in range(0, len(seq), 3):
    print(seq[i:i+3])
Output:
ATT
ATT
TTA
GGG
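Accessing values in strings: Python does not have a character type. Single characters are treated as strings of length one, and so are also considered substrings.
To access a substring, use square brackets with one or more indices to slice the string. For example:
var1 = 'Hello World!'
var2 = "Python Programming"
print("var1[0]: ", var1[0])
print("var2[1:5]: ", var2[1:5])
Output:
var1[0]:  H
var2[1:5]:  ytho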
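As a small sketch of why slicing suits this problem: a slice that runs past the end of the string does not raise an error, it simply returns whatever is left. The helper name `chunks` is made up here for illustration and is not part of the question's code.

```python
def chunks(text, size=3):
    """Return consecutive slices of `text`, each at most `size` characters long."""
    return [text[i:i + size] for i in range(0, len(text), size)]

print(chunks("ATTATTTTAGGG"))  # four full triplets
print(chunks("ATTAT"))         # the last slice is shorter; no IndexError is raised
```

A leftover chunk of one or two letters therefore still reaches the lookup step, so the code that maps triplets to letters has to decide what to do with it.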
Solution 1:
dna_ = input("Enter the DNA sequence to translate: ")
for i in range(0, len(dna_), 3):
    # print(dna_[i:i+3])
    dna = dna_[i:i+3]
    if dna == "ATA" or dna == "ATC" or dna == "ATT":
        print("I")
    # TTA, not the stop codon TAA, maps to L in CodonDict
    elif dna == "CTA" or dna == "CTC" or dna == "CTG" or dna == "CTT" or dna == "TTA" or dna == "TTG":
        print("L")
    elif dna == "GTA" or dna == "GTC" or dna == "GTG" or dna == "GTT":
        print("V")
    elif dna == "TTC" or dna == "TTT":
        print("F")
    elif dna == "ATG":
        print("M")
    else:
        print("X")
Solution 2 (if you want to use the dictionary you already have):
dna_ = input("Enter the DNA sequence to translate: ")
for i in range(0, len(dna_), 3):
    # print(dna_[i:i+3])
    dna = dna_[i:i+3]
    # Print the letter whose key matches the current triplet
    for key, val in CodonDict.items():
        if key == dna:
            print(val)
Output:
Enter the DNA sequence to translate: TGCCTG
C
L
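Looping over `CodonDict.items()` compares every entry against each triplet. A direct dictionary lookup should give the same letters. The sketch below assumes a small subset of the question's table (`codon_subset`) and a helper called `lookup_letters`; both names are made up for illustration. Like Solution 2, it skips triplets that are not in the table.

```python
# Small subset of the question's CodonDict, for illustration only.
codon_subset = {'TGC': 'C', 'CTG': 'L', 'ATT': 'I'}

def lookup_letters(dna, table):
    """Return one letter per triplet found in `table`, skipping unknown triplets."""
    letters = []
    for i in range(0, len(dna), 3):
        triplet = dna[i:i + 3]
        if triplet in table:  # direct key test, no scan over items()
            letters.append(table[triplet])
    return letters

for letter in lookup_letters("TGCCTG", codon_subset):
    print(letter)
```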
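I would change your CodonDict into a more compact structure: a list of tuples, where the first element of each tuple is a list of DNA triplets and the second element is the expected amino acid letter, as follows.
codon_list = [
    (["ATA", "ATC", "ATT"], "I"),
    # TTA, not the stop codon TAA, maps to L in CodonDict
    (["CTA", "CTC", "CTG", "CTT", "TTA", "TTG"], "L"),
    (["GTA", "GTC", "GTG", "GTT"], "V"),
    (["TTC", "TTT"], "F"),
    (["ATG"], "M"),
]
I am assuming that the code should exit if the length of the DNA sequence is not a multiple of 3; I am not sure whether that is the required behavior.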
import sys

result = ''
dna_str = input("Enter the DNA sequence to translate!")
idx = 0

# If the DNA sequence is not a multiple of 3, exit the code!
if len(dna_str) % 3 != 0:
    print("DNA sequence is not a multiple of 3! Exiting")
    sys.exit()

codon_list = [
    (["ATA", "ATC", "ATT"], "I"),
    # TTA, not the stop codon TAA, maps to L in CodonDict
    (["CTA", "CTC", "CTG", "CTT", "TTA", "TTG"], "L"),
    (["GTA", "GTC", "GTG", "GTT"], "V"),
    (["TTC", "TTT"], "F"),
    (["ATG"], "M"),
]

# Iterate through all DNA sequence triplets
while idx < len(dna_str):
    # Get the DNA triplet
    dna = dna_str[idx:idx+3]
    # Get the amino acid
    amino_acid = [t[1] for t in codon_list if dna in t[0]]
    # If amino acid is not present, default to X, else get amino acid
    if not amino_acid:
        amino_acid = 'X'
    else:
        amino_acid = amino_acid[0]
    # Append to final result
    result += amino_acid
    # Increment the index
    idx += 3
print(result)
Possible outputs are:
Enter the DNA sequence to translate!ATT
I
Enter the DNA sequence to translate!ATTATT
II
Enter the DNA sequence to translate!AX
DNA sequence is not a multiple of 3! Exiting
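One possible variation, sketched under a few assumptions: the grouped list can be flattened into a plain `{triplet: letter}` dictionary so that each lookup is a single access, and the length check can raise an exception instead of exiting the interpreter. The names `codon_groups`, `flat_table` and `translate_strict` are invented for this sketch, and the leucine group uses TTA, following the question's CodonDict.

```python
# Grouped structure as described above (TTA per the question's CodonDict).
codon_groups = [
    (["ATA", "ATC", "ATT"], "I"),
    (["CTA", "CTC", "CTG", "CTT", "TTA", "TTG"], "L"),
    (["GTA", "GTC", "GTG", "GTT"], "V"),
    (["TTC", "TTT"], "F"),
    (["ATG"], "M"),
]

# Flatten to {triplet: letter} so each lookup is one dictionary access.
flat_table = {codon: letter for codons, letter in codon_groups for codon in codons}

def translate_strict(dna, table, unknown="X"):
    """Translate `dna` triplet by triplet; reject lengths that are not a multiple of 3."""
    if len(dna) % 3 != 0:
        raise ValueError("DNA sequence length is not a multiple of 3")
    return "".join(table.get(dna[i:i + 3], unknown) for i in range(0, len(dna), 3))

print(translate_strict("ATTATT", flat_table))
```

Raising `ValueError` leaves it to the caller to decide whether to exit, ask again, or pad the input, which may or may not be what the asker wants.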
You already have CodonDict, so you should use it:
dna = input("Enter the DNA sequence to translate: ").strip().upper()
rslt= [ CodonDict.get(dna[i:i+3],"?") for i in range(0,len(dna),3) ]
print(rslt,"\n", "".join(rslt))
This yields a question mark when a triplet does not match. Alternatively, you can declare a function:
def triplet(d3):
    if d3 in ("ATA", "ATC", "ATT"):
        return "I"
    # TTA, not the stop codon TAA, maps to L in CodonDict
    if d3 in ("CTA", "CTC", "CTG", "CTT", "TTA", "TTG"):
        return "L"
    if d3 in ("GTA", "GTC", "GTG", "GTT"):
        return "V"
    if d3 in ("TTC", "TTT"):
        return "F"
    if d3 == "ATG":
        return "M"
    return "X"

r = ""
for i in range(0, len(dna), 3):
    r += triplet(dna[i:i+3])
print(r)
For long input strings, consider using the grouper function from the itertools recipes.
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
Processing example:
>>> input_ = 'ATTATTTAA'
>>> groups = grouper(input_, n=3)
>>> for g in groups:
...     print(''.join(g))
...
ATT
ATT
TAA
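Your code might look like:
groups = grouper(dna, n=3)
results = []
for g in groups:
    code = ''.join(g)
    results.append(CodonDict[code])
print(''.join(results))
If the input may have a length that is not a multiple of 3, or may contain values that are not in CodonDict, you should pass a string as the fillvalue argument of grouper (perhaps a lowercase letter) so that join does not fail, and handle the resulting missing keys in the CodonDict lookup.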
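The advice above could be sketched as follows. The table `codon_subset`, the helper `translate_grouped`, the padding letter `'n'` and the placeholder `'?'` are all arbitrary choices made for this illustration; only `grouper` comes from the itertools recipes.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# Small subset of the question's CodonDict, for illustration only.
codon_subset = {'ATT': 'I', 'TTA': 'L', 'ATG': 'M'}

def translate_grouped(dna, table, unknown='?'):
    """Translate `dna` in groups of 3, padding a short final group with 'n'."""
    letters = []
    for group in grouper(dna, 3, fillvalue='n'):
        code = ''.join(group)                     # a string fillvalue keeps join() working
        letters.append(table.get(code, unknown))  # .get() covers keys missing from the table
    return ''.join(letters)

print(translate_grouped('ATTATGTT', codon_subset))  # last group is padded to 'TTn'
```

Because the padded group `'TTn'` can never be a key of an uppercase table, it always falls through to the placeholder, which makes an incomplete final codon visible in the output.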