Python: Create a Dictionary with key and values from different input

I have a script that fetches a text page from a different URL for each ID in a list. I want to extract a specific piece of information from each of these pages (the string that follows C: after a whitespace character, up to the next semicolon). So far I can store the extracted strings in a list, but a list makes it hard to keep track of which ID produced which output. I would like to build a dictionary instead, with each searched ID as a key and the corresponding output as its value.

Here is my script so far:

import urllib2
import sys
import re

IDlist = ['C9JVZ1', 'C9JLN0', 'C9J872']

URLlist = ["http://www.uniprot.org/uniprot/"+x+".txt" for x in IDlist]
function_list = []
for item in URLlist:
    textfile = urllib2.urlopen(item)
    myfile = textfile.readlines()
    for line in myfile:
        print "line:", line
        found = re.search(r'\s[C]:(.+?);', line)
        if found:
            function = found.group(1)
            function_list.append(function)

The output I get is:

['cytosol', 'nucleus', 'transcription factor complex']

where nothing is found in http://www.uniprot.org/uniprot/C9JVZ1.txt,

cytosol is found in http://www.uniprot.org/uniprot/C9JLN0.txt,

and nucleus and transcription factor complex are found in http://www.uniprot.org/uniprot/C9J872.txt

The output I'm looking for is something like:

{'C9JVZ1':[], 'C9JLN0':['cytosol'], 'C9J872':['nucleus', 'transcription factor complex']}

I have tried:

if found:
    function = found.group(1)
    function_dic = {item:[function]}

but I get this output:

>>> function_dic
{'http://www.uniprot.org/uniprot/C9J872.txt': ['transcription factor complex']}

function_dic = {item:[function]}

This line creates a brand-new dictionary on every match, so after the loop only the last entry is left. To fix that, assign into an existing dictionary instead:

function_dic[item] = [function]

But this would still replace the value on every match, so each key would keep only one function. To fix this, append to a list instead:

function_dic[item].append(function)

However, the list has to exist before you can append to it, so initialise it outside the inner for loop:

function_dic[item] = []
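
The difference between the three variants can be seen without touching the network. The following is only a sketch: the (key, match) pairs are hard-coded from the output described in the question, the IDs are used as keys for brevity, and the names matches, rebound, assigned and appended are made up for this illustration.

```python
# Hypothetical (key, match) pairs, in the order the loops would produce them.
matches = [
    ("C9JLN0", "cytosol"),
    ("C9J872", "nucleus"),
    ("C9J872", "transcription factor complex"),
]

# Variant 1: rebinding the name builds a new dictionary on every match.
rebound = {}
for key, function in matches:
    rebound = {key: [function]}

# Variant 2: assigning keeps earlier keys but replaces the list each time.
assigned = {}
for key, function in matches:
    assigned[key] = [function]

# Variant 3: give every key an empty list first, then append to it.
appended = {}
for key in ["C9JVZ1", "C9JLN0", "C9J872"]:
    appended[key] = []
for key, function in matches:
    appended[key].append(function)

print(rebound)
print(assigned)
print(appended)
```

Only the third variant keeps both matches for C9J872 and also keeps C9JVZ1 with an empty list. A collections.defaultdict(list) would remove the need for the explicit initialisation, but an ID with no matches would then never appear as a key, which is not what the question asks for.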

You mentioned that you want the ID, not the URL, as the key. You can change the outer for loop to iterate over the IDs as well and use them to build the dictionary. Putting it all together:

function_dic = {}  # maps each ID to the list of matches found on its page
for uid, item in zip(IDlist, URLlist):
    function_dic[uid] = []  # start every ID with an empty list
    ...
    for line in myfile:
        ...
        if found:
            function = found.group(1)
            function_dic[uid].append(function)
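
For reference, here is one way the whole thing might look as a self-contained script. It is a sketch under several assumptions: it is written for Python 3 (urllib.request instead of urllib2), the helper names fetch_lines, build_function_dict and fake_fetch are made up, and the sample lines in SAMPLE_PAGES are illustrative stand-ins, not real downloaded content. The URL pattern is the one from the question and may no longer be served as-is. The fetch function is passed in as a parameter so that the dictionary logic can be tried offline.

```python
import re
from urllib.request import urlopen

# Same pattern as in the question, written as a raw string and compiled once.
COMPONENT_RE = re.compile(r"\s[C]:(.+?);")


def fetch_lines(uniprot_id):
    """Download one entry and return its lines as text (needs network access)."""
    url = "http://www.uniprot.org/uniprot/" + uniprot_id + ".txt"
    with urlopen(url) as response:
        return response.read().decode("utf-8").splitlines()


def build_function_dict(ids, fetch=fetch_lines):
    """Map each ID to the list of strings captured after 'C:' on its page."""
    function_dic = {}
    for uniprot_id in ids:
        # Create the empty list first so IDs without matches still get a key.
        function_dic[uniprot_id] = []
        for line in fetch(uniprot_id):
            found = COMPONENT_RE.search(line)
            if found:
                function_dic[uniprot_id].append(found.group(1))
    return function_dic


# Offline stand-in for the download step; these lines are invented examples.
SAMPLE_PAGES = {
    "C9JVZ1": ["ID   EXAMPLE_ENTRY"],
    "C9JLN0": ["DR   GO; GO:0005829; C:cytosol; IEA:Ensembl."],
    "C9J872": [
        "DR   GO; GO:0005634; C:nucleus; IEA:Ensembl.",
        "DR   GO; GO:0005667; C:transcription factor complex; IEA:Ensembl.",
    ],
}


def fake_fetch(uniprot_id):
    return SAMPLE_PAGES[uniprot_id]


result = build_function_dict(["C9JVZ1", "C9JLN0", "C9J872"], fetch=fake_fetch)
print(result)
```

Calling build_function_dict(IDlist) without the fetch argument would use the real download helper instead of the sample data.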

