简体   繁体   English

如何比较列表中的元素并比较Python列表中的键?

[英]How to compare elements in a list of lists and compare keys in a list of lists in Python?

I have the following sequence: 我有以下顺序:

seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]

Here is a dictionary key that stores the value of amino acid for each of the codons (Triplet bases like ATG, GCT etc). 这是一个字典键,用于存储每个密码子(三联体碱基如ATG, GCT等)的氨基酸值。

aminoacid = {'TTT' : 'F','TTC' : 'F','TTA' : 'L','TTG' : 'L','CTT' : 'L','CTC' : 'L','CTA' : 'L','CTG' : 'L','ATT' : 'I','ATC' : 'I','ATA' : 'I','ATG' : 'M','GTT' : 'V','GTC' : 'V','GTA' : 'V','GTG' : 'V','TCT' : 'S','TCC' : 'S','TCA' : 'S','TCG' : 'S','CCT' : 'P','CCC' : 'P','CCA' : 'P','CCG' : 'P','ACT' : 'T','ACC' : 'T','ACA' : 'T','ACG' : 'T','GCT' : 'A','GCC' : 'A','GCA' : 'A','GCG' : 'A','TAT' : 'Y','TAC' : 'Y','TAA' : 'STOP','TAG' : 'STOP','CAT' : 'H','CAC' : 'H','CAA' : 'Q','CAG' : 'Q','AAT' : 'N','AAC' : 'N','AAA' : 'K','AAG' : 'K','GAT' : 'D','GAC' : 'D','GAA' : 'E','GAG' : 'E','TGT' : 'C','TGC' : 'C','TGA' : 'STOP','TGG' : 'W','CGT' : 'R','CGC' : 'R','CGA' : 'R','CGG' : 'R','AGT' : 'S','AGC' : 'S','AGA' : 'R','AGC' : 'R','GGT' : 'G','GGC' : 'G','GGA' : 'G','GGG' : 'G'}

As one can see several codons can code for the same aminoacid (eg. GGT,GGC,GGA, GGG etc all code for Glycine (G) ). 可以看出,几个密码子可以编码相同的氨基酸(例如GGT,GGC,GGA, GGG etc all code for Glycine (G) )。 These are Synonymous (PSyn) and if codons code for different amino acids they are Non-Synonymous (PNonsyn) 这些是同义词(PSyn),如果密码子编码不同的氨基酸,它们是非同义词(PNonsyn)

In this code, I need to do the following: 在此代码中,我需要执行以下操作:

  1. For each element in the list of lists, if there is a change in the bases AND they all code for the same amino acid, then increase count of PSyn by 1 and if it codes for different amino acids increment count PNonsyn by 1 对于列表列表中的每个元素,如果碱基发生变化并且它们都编码相同的氨基酸,则将PSyn的计数增加1,如果它编码不同的氨基酸,则增加计数PNonsyn为1

    Here, 这里,

     ATG all code for M #However, all are ATG's no change in bases. So no increment in count GAC, GAT for D; GAA for E; and CCT for P #Codes for three different amino acids, increment count by 1 GGT,GGC,GGA, GGG for G #Different bases but all code for same amino acids, increment count by 1 

    OutPut: CountPsyn = 1 CountPNonsyn = 1 OutPut: CountPsyn = 1 CountPNonsyn = 1

  2. Generate a list of amino acids that corresponds to the above seq. 生成与上述序列对应的氨基酸列表。 such that: 这样:

    Output : ['ATG','nonsyn','G'] #For sites with different aminoacids, the list should say nonsyn and for sites which had identical bases it should list the bases

I need help modifying the following code to get the program to work. 我需要帮助修改以下代码以使程序工作。 I am not confident on how to call values from dictionary and check them against all the elements. 我对如何从字典中调用值并对所有元素进行检查没有信心。 Code Attempted: 代码尝试:

countPsyn = 0
countPnonsyn = 0
listofaa =[]

for i in seq:
    for base, value in enumerate(i):        
        if value[i] == value[i+1]: #eg. ['ATG','ATG','ATG','ATG'] 
            listofaa.append(value)

        if value[i] != value[i+1]: 
            if aminoacid[value][i] ==  aminoacid[value][i+1]: #eg.['GCC','GCG','GCA','GCT']
                countPsyn =+ 1
                listofaa.append(aminoacid)
            else: #eg. ['GAC','GAT','GAA','CCT']
                countPnonsyn =+ 1
                listofaa.append('nonsyn')

File Output can be found [here][1] https://eval.in/669107

Here is my stab at the solution. 这是我对解决方案的抨击。

aminoacid = {'GCC': 'A' ,'TTT' : 'F','TTC' : 'F','TTA' : 'L','TTG' : 'L','CTT' : 'L','CTC' : 'L','CTA' : 'L','CTG' : 'L','ATT' : 'I','ATC' : 'I','ATA' : 'I','ATG' : 'M','GTT' : 'V','GTC' : 'V','GTA' : 'V','GTG' : 'V','TCT' : 'S','TCC' : 'S','TCA' : 'S','TCG' : 'S','CCT' : 'P','CCC' : 'P','CCA' : 'P','CCG' : 'P','ACT' : 'T','ACC' : 'T','ACA' : 'T','ACG' : 'T','GCT' : 'A','GCG' : 'A','GCA' : 'A','GCG' : 'A','TAT' : 'Y','TAC' : 'Y','TAA' : 'STOP','TAG' : 'STOP','CAT' : 'H','CAC' : 'H','CAA' : 'Q','CAG' : 'Q','AAT' : 'N','AAC' : 'N','AAA' : 'K','AAG' : 'K','GAT' : 'D','GAC' : 'D','GAA' : 'E','GAG' : 'E','TGT' : 'C','TGC' : 'C','TGA' : 'STOP','TGG' : 'W','CGT' : 'R','CGC' : 'R','CGA' : 'R','CGG' : 'R','AGT' : 'S','AGC' : 'S','AGA' : 'R','AGC' : 'R','CGT' : 'G','GGC' : 'G','GGA' : 'G','GGG' : 'G',}

seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]

Psyn = 0;
PNonsyn = 0;
output = [];

#loop through each list in your list of list
for sublist in seq:
    acids = [aminoacid[base] for base in sublist]
    if len(set(acids)) != 1: #if there are different amino acids, then nonsync
        output.append('nonsync')
        PNonsyn += 1
    else: #if same amino acid
        if len(set(sublist)) == 1: #if same base
            output.append(sublist[0]);
        else: #if not same base
            output.append(acids[0]);
            Psyn += 1

print "Psyn = "+ str(Psyn)
print "PNonsyn = "+ str(PNonsyn)
print output

Admittedly it's not a modification of your code, but there is a neat trick here to void the double for loop. 不可否认,这不是对代码的修改,但是这里有一个巧妙的技巧来消除double for循环。 Given a list mylist , you could find all uniques elements in a list by calling set(mylist) . 给定列表mylist ,您可以通过调用set(mylist)找到列表中的所有唯一元素。 Eg 例如

>>> a = ['AGT','AGT','ACG']
>>> set(a)
set(['AGT', 'ACG'])
>>> len(set(a))
2

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM