
Lookup dictionary in Python

I have a space-delimited file with multiple lines that look like this:

A1BG      P04217     VAR_018369  p.His52Arg     Polymorphism  rs893184    -
A1BG      P04217     VAR_018370  p.His395Arg    Polymorphism  rs2241788   -
AAAS      Q9NRG9     VAR_012804  p.Gln15Lys     Disease       -           Achalasia

How do I build a dictionary keyed on the ID in the second column that stores the number embedded between the letters in the fourth column (for example, 52 from p.His52Arg)?

I tried this, but it gives me an "index out of range" error:

lookup = defaultdict(list)
with open ('humsavar.txt', 'r') as humsavarTxt:
    for line in csv.reader(humsavarTxt):
        code = re.match('[a-z](\d+)[a-z]', line[1], re.I)
        if code: 
            lookup[line[-2]].append(code.group(1))

print lookup['P04217']

Here's a variant of the original code:

import csv, re
from collections import defaultdict

lookup = defaultdict(list)
with open('humsavar.txt', 'rb') as humsavarTxt:
    reader = csv.reader(humsavarTxt, delimiter=" ", skipinitialspace=True)
    for line in reader:
        code = re.search(r'(\d+)', line[3])
        lookup[line[1]].append(int(code.group(1)))

which produces

>>> lookup
defaultdict(<type 'list'>, {'P04217': [52, 395], 'Q9NRG9': [15]})
>>> lookup['P04217']
[52, 395]
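
The original attempt most likely fails because csv.reader splits on commas by default, so every row comes back as a one-element list and line[1] does not exist. The sketch below, written for Python 3 and using an in-memory copy of the three sample lines (the names SAMPLE, default_rows and spaced_rows are only for illustration), shows the difference that the delimiter and skipinitialspace arguments make.

```python
import csv
import io

# Three sample lines from the question, held in memory for illustration.
SAMPLE = (
    "A1BG      P04217     VAR_018369  p.His52Arg     Polymorphism  rs893184    -\n"
    "A1BG      P04217     VAR_018370  p.His395Arg    Polymorphism  rs2241788   -\n"
    "AAAS      Q9NRG9     VAR_012804  p.Gln15Lys     Disease       -           Achalasia\n"
)

# Default dialect: the delimiter is a comma, so each row is a single field.
default_rows = list(csv.reader(io.StringIO(SAMPLE)))

# Space delimiter, with the padding that follows each delimiter skipped.
spaced_rows = list(
    csv.reader(io.StringIO(SAMPLE), delimiter=" ", skipinitialspace=True)
)

print(len(default_rows[0]))                  # one field, so index 1 is out of range
print(spaced_rows[0][1], spaced_rows[0][3])  # ID column and variant column
```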

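If the ID and the number are always in the second and fourth columns, and the file is always whitespace-delimited, you don't need a regular expression to pick out the columns. You can split each line on whitespace instead. Note that this stores the whole fourth field (such as p.His52Arg), so you still have to extract the digits if you only want the number:

from collections import defaultdict

lookup = defaultdict(list)
with open('humsavar.txt', 'r') as humsavarTxt:
    for line in humsavarTxt:
        # split() with no argument splits on runs of whitespace
        fields = line.split()
        lookup[fields[1]].append(fields[3])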
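
One detail to watch when splitting by hand: the columns in the sample are padded with several spaces. split(' ') treats every single space as a separator, while split() with no argument splits on runs of whitespace. A short illustration on the first sample line (Python 3; the variable names are only for illustration):

```python
line = "A1BG      P04217     VAR_018369  p.His52Arg     Polymorphism  rs893184    -\n"

# One split per space: the padding between columns yields empty strings.
by_single_space = line.split(' ')

# Runs of whitespace collapse into one separator; the trailing newline is dropped.
by_whitespace = line.split()

print(repr(by_single_space[1]))            # an empty string, not the ID
print(by_whitespace[1], by_whitespace[3])  # the ID and the variant field
```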

If you want a plain dict rather than a defaultdict, this works:

import re

d = {}
with open(your_file, 'r') as f:
    for line in f:
        l = line.split()
        # first run of digits in the fourth column
        num = int(re.search(r'(\d+)', l[3]).group(1))
        d.setdefault(l[1], []).append(num)

Printing d then gives:

{'P04217': [52, 395], 'Q9NRG9': [15]}

For a non regex solution, you can also do this:

d = {}
with open(your_file, 'r') as f:
    for line in f:
        els = line.split()
        # keep only the digit characters of the fourth column
        num = int(''.join(c for c in els[3] if c.isdigit()))
        d.setdefault(els[1], []).append(num)
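
These snippets assume that every line has at least four fields and a number in the fourth. As a hedged sketch for Python 3, the function below wraps the same idea and skips lines that do not fit that shape, such as blank lines. The name build_lookup and the decision to skip non-matching lines are assumptions made for illustration, not part of the original answers.

```python
import io
import re
from collections import defaultdict


def build_lookup(lines):
    """Map the ID in column 2 to the list of numbers found in column 4.

    `lines` is any iterable of text lines, e.g. an open file.
    Lines with fewer than four fields, or with no digits in the
    fourth field, are skipped (an assumption made for this sketch).
    """
    lookup = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        match = re.search(r'\d+', fields[3])
        if match:
            lookup[fields[1]].append(int(match.group()))
    return lookup


# In-memory sample, including a blank line that should be ignored.
sample = io.StringIO(
    "A1BG      P04217     VAR_018369  p.His52Arg     Polymorphism  rs893184    -\n"
    "\n"
    "A1BG      P04217     VAR_018370  p.His395Arg    Polymorphism  rs2241788   -\n"
    "AAAS      Q9NRG9     VAR_012804  p.Gln15Lys     Disease       -           Achalasia\n"
)
result = build_lookup(sample)
print(result['P04217'])
print(result['Q9NRG9'])
```

With a real file, the same function can be given the open file object directly, for example inside `with open('humsavar.txt') as f:`.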


 