
How do I use a for loop inside another for loop in this script?

I am trying to write a script that creates a list of dictionaries from a file containing protein IDs. This is what I have written so far:

# import packages
import sys

# get the file name from the command line
map_file = sys.argv[1]


# create dictionaries containing the different protein IDs
def get_mapping(map_file):
    file = open(map_file)
    result = list()
    column_count = file.readline().split('\t')
    n = len(column_count)
    for i in range(n-1):
        result.append({})
    for line in file:
        word = line.split('\t')
        for w in range(n):
            if word[n-1] != word[0]:
                result[n-2][word[n-1]] = word[0]
            n = n-1
    return result

print(get_mapping(map_file))

So the input file contains many lines, and each line contains 2 to 4 different IDs for a specific protein. I want to create a list of dictionaries in which the first ID of a line is the value and one of the other IDs is the key. When I run this script, it does exactly what I want, but only for the first line of the input file. What do I need to change so that it does this for every line in the input file?

The protein file looks like this:

Ensembl_Protein_ID UniProt/SwissProt_Accession UniProt/TrEMBL_Accession RGD_ID 
ENSRNOP00000000008 P18088 C9E895 2652 
ENSRNOP00000000008 P18088 B3VQJ0 2652 
ENSRNOP00000000009 D3ZEM1 1310201 
ENSRNOP00000000025 B4F7C7 
ENSRNOP00000000029 Q9ES39 620038 
ENSRNOP00000000037 Q7TQM3 735156 
ENSRNOP00000000052 O70352 Q6IN14 69070 
ENSRNOP00000000053 Q9JLM2 68400 
ENSRNOP00000000064 P97874 621589 
ENSRNOP00000000072 P29419 621377 
ENSRNOP00000000074 B2RZ28 1304584 
ENSRNOP00000000078 D3ZDI7 1308022 
ENSRNOP00000000080 Q5XI68 1305201 
ENSRNOP00000000085 D3ZDH7
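For concreteness, here is the structure described above, written out by hand for the first two data lines of the sample. This is only an illustration and assumes the columns are tab-separated in header order (Ensembl, SwissProt, TrEMBL, RGD) and that dictionary k collects the IDs from column k + 1, which is what the index arithmetic result[n-2][word[n-1]] in the script implies.

```python
# Target result for the first two data lines of the sample file, assuming
# tab-separated columns in header order (Ensembl, SwissProt, TrEMBL, RGD).
# Dict k maps IDs from column k + 1 to the Ensembl ID in column 0.
expected = [
    {"P18088": "ENSRNOP00000000008"},                  # SwissProt -> Ensembl
    {"C9E895": "ENSRNOP00000000008",
     "B3VQJ0": "ENSRNOP00000000008"},                  # TrEMBL -> Ensembl
    {"2652": "ENSRNOP00000000008"},                    # RGD -> Ensembl
]

# Lookup example: which Ensembl protein does TrEMBL ID B3VQJ0 belong to?
print(expected[1]["B3VQJ0"])  # ENSRNOP00000000008
```

Note that P18088 appears on both lines but ends up as a single key, since a dictionary keeps only one entry per key.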

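You decrease n in your inner for loop but never reset it to its original value, so once the first line has been processed n is 0 and range(n) is empty for every following line. Just add n = len(column_count) inside the outer loop, either just before or just after your for w in range(n): loop, and it should work. Or, even better, use the w variable directly instead of decreasing n: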

for w in range(1, len(word)):
    if word[w] != word[0]:
        result[w-1][word[w]] = word[0]  # column w goes into dict w-1
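For comparison, the first suggestion (resetting n before the inner loop) might look like the sketch below. To keep it easy to try out, the hypothetical helper get_mapping_reset_n takes a list of lines (header first) instead of a file name. It also strips the trailing newline from each line, which the original code does not do, so that the last ID on a line does not end in '\n'. It assumes every data line has as many tab-separated fields as the header; otherwise word[n-1] raises an IndexError.

```python
def get_mapping_reset_n(lines):
    # Hypothetical variant of get_mapping that takes an iterable of lines
    # (header first) instead of a file name.
    lines = iter(lines)
    column_count = next(lines).split('\t')  # header line
    result = [{} for _ in range(len(column_count) - 1)]
    for line in lines:
        word = line.rstrip('\n').split('\t')  # drop the newline from the last ID
        n = len(column_count)  # the fix: reset n for every data line
        for w in range(n):
            if word[n-1] != word[0]:
                result[n-2][word[n-1]] = word[0]
            n = n - 1
    return result


rows = [
    "Ensembl_Protein_ID\tUniProt/SwissProt_Accession\tUniProt/TrEMBL_Accession\tRGD_ID\n",
    "ENSRNOP00000000008\tP18088\tC9E895\t2652\n",
    "ENSRNOP00000000052\tO70352\tQ6IN14\t69070\n",
]
print(get_mapping_reset_n(rows)[2])  # {'2652': 'ENSRNOP00000000008', '69070': 'ENSRNOP00000000052'}
```

Both data lines now end up in the dictionaries, not just the first one.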

Also, note that column_count = file.readline().split('\t') could be a problem. First, judging from your question, it is not clear whether the first line holds the maximum number of words per line. Second, this line will not be read again by the following for line in file loop, so unless it is a header line, some IDs will be lost. Update: it is a header listing all the columns, so this is perfectly okay.
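A quick way to see that readline() consumes the header, so the for loop starts at the second line, is the sketch below. It uses io.StringIO as an in-memory stand-in for the real file; the two-column contents are made up for illustration.

```python
import io

# A tiny in-memory stand-in for the protein file: header plus one data line.
f = io.StringIO("Ensembl_Protein_ID\tRGD_ID\nENSRNOP00000000008\t2652\n")

column_count = f.readline().split('\t')  # consumes the header line
data_lines = list(f)                     # iteration continues from line 2

print(column_count)  # ['Ensembl_Protein_ID', 'RGD_ID\n']
print(data_lines)    # ['ENSRNOP00000000008\t2652\n']
```

The last header name keeps its newline, which does not matter here because only the number of columns is used.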

Finally, you should take care of closing the file (add file.close() at the end of your function), or use the with statement, which does this for you: when the with block ends, the file is closed automatically.

with open(map_file) as f:
    # your code, reading from f
    pass
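Putting the suggestions of this answer together (reading the header once, the w-based loop, and the with statement), a complete function might look roughly like the sketch below. Two details go beyond the code above and are assumptions about the file: each line is stripped of its trailing newline so the last ID does not end in '\n', and empty fields (which a tab-separated file with missing IDs would contain) are skipped. Like the snippet above, it assumes no line has more fields than the header.

```python
def get_mapping(map_file):
    """Return a list with one dict per ID column after the first.

    Dict k maps each ID found in column k + 1 to the first-column ID
    (the Ensembl protein ID) of the same line.
    """
    with open(map_file) as f:  # f is closed automatically when the block ends
        header = f.readline().rstrip('\n').split('\t')
        result = [{} for _ in range(len(header) - 1)]
        for line in f:
            word = line.rstrip('\n').split('\t')
            for w in range(1, len(word)):
                # Assumption: an empty field means "no ID in this column".
                if word[w] and word[w] != word[0]:
                    result[w - 1][word[w]] = word[0]
    return result
```

From the command line you would call it as in the question, with import sys and print(get_mapping(sys.argv[1])).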
