
Splitting input code into 3 letters and using if statements to return DNA->Amino Acid Letter

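It reads the first 3 letters as expected, but how do I get it to read the next 3 letters as a separate string so that it can run the if, elif, and else statements again?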

The else X branch is there because I do not need the rest of the codons.

The table is only there because it is useful for checking whether the correct letter is returned.

dna = input("Enter the DNA sequence to translate: ")

if dna == "ATA" or dna == "ATC" or dna == "ATT":
  print ("I")
elif dna == "CTA" or dna == "CTC" or dna == "CTG" or dna == "CTT" or dna == "TTA" or dna == "TTG":
  print ("L")
elif dna == "GTA" or dna == "GTC" or dna == "GTG" or dna == "GTT":
  print ("V")
elif dna == "TTC" or dna == "TTT":
  print ("F")
elif dna == "ATG":
  print ("M")
else:
  print ("X")


CodonDict = {
'ATT':'I',   'ATC':'I',  'ATA':'I',  'CTT':'L',  'CTC':'L',  
'CTA':'L',  'CTG':'L',  'TTA':'L',  'TTG':'L',  'GTT':'V',  'GTC':'V',  
'GTA':'V',  'GTG':'V',  'TTT':'F',  'TTC':'F',  'ATG':'M',  'TGT':'C',  
'TGC':'C',  'GCT':'A',  'GCC':'A',  'GCA':'A',  'GCG':'A',  'GGT':'G',  
'GGC':'G',  'GGA':'G',  'GGG':'G',  'CCT':'P',  'CCC':'P',  'CCA':'P',  
'CCG':'P',  'ACT':'T',  'ACC':'T',  'ACA':'T',  'ACG':'T',  'TCT':'S',  
'TCC':'S',  'TCA':'S',  'TCG':'S',  'AGT':'S',  'AGC':'S',  'TAT':'Y',  
'TAC':'Y',  'TGG':'W',  'CAA':'Q',  'CAG':'Q',  'AAT':'N',  'AAC':'N',  
'CAT':'H',  'CAC':'H',  'GAA':'E',  'GAG':'E',  'GAT':'D',  'GAC':'D',  
'AAA':'K',  'AAG':'K',  'CGT':'R',  'CGC':'R',  'CGA':'R',  'CGG':'R',  
'AGA':'R',  'AGG':'R',  'TAA':'X',  'TAG':'X',  'TGA':'X'} 

For example I input: ATT
Returns I as expected.

I input: ATTATT
Returns X as expected, but how do I treat each group of 3 letters separately?
It should return II.

Iterate over the string:

for i in range(len(dna) // 3):
    tmp = dna[i*3: (i+1)*3]
    if tmp == "ATA" or tmp == "ATC" or tmp == "ATT":
        print ("I")
    elif tmp == "CTA" or tmp == "CTC" or tmp == "CTG" or tmp == "CTT" or tmp == "TTA" or tmp == "TTG":
        print ("L")
    elif tmp == "GTA" or tmp == "GTC" or tmp == "GTG" or tmp == "GTT":
        print ("V")
    elif tmp == "TTC" or tmp == "TTT":
        print ("F")
    elif tmp == "ATG":
        print ("M")
    else:
        print ("X")

You can use a sliding window built from a for loop and the step parameter of range, walking through the DNA sequence up to 3 characters at a time:

dna = input("Enter the DNA sequence to translate: ")

amino_str = ''
for i in range(0, len(dna), 3):
  dna_part = dna[i:i+3]
  if dna_part == "ATA" or dna_part == "ATC" or dna_part == "ATT":
    amino_str += "I"
  elif dna_part == "CTA" or dna_part == "CTC" or dna_part == "CTG" or dna_part == "CTT" or dna_part == "TTA" or dna_part == "TTG":
    amino_str += "L"
  elif dna_part == "GTA" or dna_part == "GTC" or dna_part == "GTG" or dna_part == "GTT":
    amino_str += "V"
  elif dna_part == "TTC" or dna_part == "TTT":
    amino_str += "F"
  elif dna_part == "ATG":
    amino_str += "M"
  else:
    amino_str += "X"
print(amino_str)

Usage example 1:

Enter the DNA sequence to translate: ATAATA
II

You can simplify the code above by using CodonDict:

amino_str = ''
for i in range(0, len(dna), 3):
  dna_part = dna[i:i+3]
  if dna_part in CodonDict:  # membership test; indexing a missing key would raise KeyError
    amino_str += CodonDict[dna_part]
  else:
    amino_str = "ERROR: PARSING DNA SEQUENCE"
    break
print(amino_str)

Usage example 2:

Enter the DNA sequence to translate: AGAATACGC
RIR
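
One detail worth illustrating for the dictionary approach: indexing a dict with a key it does not contain raises KeyError rather than returning None, so the lookup needs a membership test (or dict.get). The following is a minimal sketch, assuming a small stand-in table named codon_subset and a helper named lookup, both of which are hypothetical and not part of the original answer.

```python
# Hypothetical three-entry stand-in for the question's CodonDict.
codon_subset = {"AGA": "R", "ATA": "I", "CGC": "R"}


def lookup(codon):
    """Return the amino-acid letter, or None when the codon is not in the table."""
    if codon in codon_subset:  # membership test never raises
        return codon_subset[codon]
    return None


try:
    codon_subset["ZZZ"]  # plain indexing on a missing key
except KeyError:
    print("KeyError raised for ZZZ")

print(lookup("AGA"))  # R
print(lookup("ZZZ"))  # None
```
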

Example:

dna = 'ATTATTTTAGGG'  # avoid naming a variable "input", which shadows the built-in
for i in range(0, len(dna), 3):
    print(dna[i:i+3])

Output:

ATT
ATT
TTA
GGG

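Accessing values in strings: Python does not have a separate character type. Characters are treated as strings of length one, and so are also considered substrings.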

To access a substring, use square brackets with one or more indices to slice it out. For example:

var1 = 'Hello World!'
var2 = "Python Programming"
print("var1[0]: ", var1[0])
print("var2[1:5]: ", var2[1:5])

Output:

var1[0]:  H
var2[1:5]:  ytho
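
Slicing also explains what happens when the sequence length is not a multiple of 3. A small sketch, using a made-up 7-character sequence: a slice whose end runs past the string is clamped instead of raising IndexError, so the last chunk is simply shorter.

```python
dna = "ATTATTT"  # 7 characters, deliberately not a multiple of 3

# Each slice takes up to 3 characters; the final slice is clamped to the
# end of the string, so no IndexError is raised.
chunks = [dna[i:i + 3] for i in range(0, len(dna), 3)]

print(chunks)  # ['ATT', 'ATT', 'T']
```

A trailing chunk such as 'T' will not match any codon, so it falls through to whatever default the translation code uses.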

Solution 1:

dna_ = input("Enter the DNA sequence to translate: ")
for i in range(0,len(dna_),3):
    #print (dna_[i:i+3])
    dna = dna_[i:i+3]
    if dna == "ATA" or dna == "ATC" or dna == "ATT":
      print ("I")
    elif dna == "CTA" or dna == "CTC" or dna == "CTG" or dna == "CTT" or dna == "TTA" or dna == "TTG":
      print ("L")
    elif dna == "GTA" or dna == "GTC" or dna == "GTG" or dna == "GTT":
      print ("V")
    elif dna == "TTC" or dna == "TTT":
      print ("F")
    elif dna == "ATG":
      print ("M")
    else:
      print ("X")

Solution 2 (if you want to use the dictionary you already have):

dna_ = input("Enter the DNA sequence to translate: ")
for i in range(0,len(dna_),3):
    #print (dna_[i:i+3])
    dna = dna_[i:i+3]
    if dna in CodonDict:  # direct lookup instead of scanning every item
        print(CodonDict[dna])

Output:

Enter the DNA sequence to translate: TGCCTG
C
L

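I would change your CodonDict to a more compact structure: a list of tuples, where the first element of each tuple is a list of DNA triplets and the second element is the expected amino acid letter, as follows.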

codon_list = [ (["ATA","ATC" ,"ATT"],"I") ,
                (["CTA" , "CTC" , "CTG", "CTT", "TTA", "TTG"], "L"),
                (["GTA" , "GTC" , "GTG",  "GTT"] , "V"),
                (["TTC", "TTT"], "F"),
                (["ATG"], "M")
                ]

I am assuming the code should exit if the DNA sequence length is not a multiple of 3; I am not sure whether that is the required behavior.

import sys

result = ''
dna_str = input("Enter the DNA sequence to translate!")

#Get the number of Amino Acids length
dna_len = int(len(dna_str)/3)
idx = 0

#If the DNA sequence is not a multiple of 3, exit the code!
if len(dna_str)%3 != 0:
    print("DNA sequence is not a multiple of 3! Exiting")
    sys.exit()

codon_list = [ (["ATA","ATC" ,"ATT"],"I") ,
                (["CTA" , "CTC" , "CTG", "CTT", "TTA", "TTG"], "L"),
                (["GTA" , "GTC" , "GTG",  "GTT"] , "V"),
                (["TTC", "TTT"], "F"),
                (["ATG"], "M")
                ]

#Iterate through all DNA sequence triplets
while idx < len(dna_str):

    #Get the DNA triplet
    dna = dna_str[idx:idx+3]

    #Get the amino acid
    amino_acid = [t[1] for t in codon_list if dna in t[0]]

    #If amino acid is not present, default to X, else get amino acid
    if not amino_acid:
        amino_acid = 'X'
    else:
        amino_acid = amino_acid[0]

    #Append to final result
    result += amino_acid

    #Increment the index
    idx+=3

print(result)

Possible outputs:

Enter the DNA sequence to translate!ATT
I

Enter the DNA sequence to translate!ATTATT
II

Enter the DNA sequence to translate!AX
DNA sequence is not a multiple of 3! Exiting
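
The list-of-tuples structure above is scanned once per triplet. As an alternative, it can be flattened into a plain dictionary so each lookup is direct. This is a sketch only: the shortened codon_list and the name codon_to_amino are assumptions made for illustration, not part of the answer.

```python
# Shortened copy of the list-of-tuples structure, for illustration only.
codon_list = [
    (["ATA", "ATC", "ATT"], "I"),
    (["TTC", "TTT"], "F"),
    (["ATG"], "M"),
]

# Flatten to {triplet: letter}: one entry per triplet in every tuple.
codon_to_amino = {
    codon: amino
    for codons, amino in codon_list
    for codon in codons
}

print(codon_to_amino["TTT"])            # F
print(codon_to_amino.get("GGG", "X"))   # X (not in the shortened table)
```
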

You already have CodonDict, so you should use it:

dna = input("Enter the DNA sequence to translate: ").strip().upper()
rslt= [ CodonDict.get(dna[i:i+3],"?") for i in range(0,len(dna),3) ]
print(rslt,"\n", "".join(rslt))

This yields a question mark when a triplet does not match. Alternatively, you can declare a function:

def triplet(d3):

    if d3 in ("ATA","ATC", "ATT"):
        return "I"
    if d3 in ("CTA","CTC","CTG","CTT","TTA","TTG"):
        return "L"
    if d3 in ( "GTA","GTC","GTG","GTT"):
        return "V"
    if d3 in ("TTC","TTT"):
        return "F"
    if d3 == "ATG":
        return "M"

    return "X"

r=""
for i in range(0,len(dna),3):

    r+=triplet(dna[i:i+3])

print(r)  

For long input strings, consider using the grouper function from the itertools recipes.

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

Example run:

>>> input_ = 'ATTATTTAA'
>>> groups = grouper(input_, n=3)
>>> for g in groups:
...     print(''.join(g))
... 
ATT
ATT
TAA

Your code could look like:

groups = grouper(dna, n=3)
results = []
for g in groups:
    code = ''.join(g)
    results.append(CodonDict[code])
print(''.join(results))

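If the input may contain values that are not in CodonDict, you should pass a string as the fillvalue argument to grouper (perhaps a lowercase letter) so that join does not fail on a padded final group, and handle the missing key when looking the result up in CodonDict.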
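
A minimal sketch of that suggestion, assuming a two-entry stand-in table (codon_subset), a hypothetical translate helper, and "X" as the placeholder for anything that cannot be translated:

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Hypothetical two-entry stand-in for the question's CodonDict.
codon_subset = {"ATT": "I", "TTA": "L"}


def translate(dna, unknown="X"):
    # fillvalue="n" pads a short final group with a lowercase letter, so
    # "".join works and the padded codon simply misses the table.
    codes = ("".join(group) for group in grouper(dna, 3, fillvalue="n"))
    # dict.get supplies the placeholder instead of raising KeyError.
    return "".join(codon_subset.get(code, unknown) for code in codes)


print(translate("ATTTTA"))  # IL
print(translate("ATTTT"))   # IX (the final group "TTn" is not in the table)
```
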
