Converting a string into a dictionary - Pythonic way

Experts,

I have written a program to convert a string into a dictionary. It produces the desired result, but I doubt that it is Pythonic. I would like to hear suggestions.

txt = '''
name         : xxxx
desgination  : yyyy
cities       : 
    LA       : Los Angeles
    NY       : New York
HeadQuarters : 
    LA       :  LA
    NY       :  NY    
Country      : USA 
'''

I split each line on the colon (:) and store the pairs in a dictionary. Here cities and HeadQuarters each contain another dictionary, for which I have written code like this:

if k == 'cities' : 
    D[k] = {}
    continue
elif k == 'HeadQuarters':
    D[k] = {}
    continue
elif k ==  'LA' :
    if D.has_key('cities'):
        if D['cities'].get(k) is None:
            D['cities'][k] = v
    if D.has_key('HeadQuarters'):
        if D['HeadQuarters'].get(k) is None:
            D['HeadQuarters'][k] = v
elif k ==  'NY' :
    if D.has_key('cities'):
        if D['cities'].get(k) is None:
            D['cities'][k] = v
    if D.has_key('HeadQuarters'):
        if D['HeadQuarters'].get(k) is None:
            D['HeadQuarters'][k] = v
else: 
    D[k]= v 

Not sure if this is Pythonic:

import re

x = re.split(r':|\n',txt)[1:-1]
x = list(map(lambda x: x.rstrip(),x))
x = (zip(x[::2], x[1::2]))
d = {}
for i in range(len(x)):
    if not x[i][0].startswith('    '):
        if x[i][1] != '':
            d[x[i][0]] = x[i][1]
        else:
            t = x[i][0]
            tmp = {}
            i+=1
            while x[i][0].startswith('    '):
                tmp[x[i][0].strip()] = x[i][1]
                i+=1
            d[t] = tmp
print d

output

{'Country': ' USA', 'cities': {'NY': ' New York', 'LA': ' Los Angeles'}, 'name': ' xxxx', 'desgination': ' yyyy', 'HeadQuarters': {'NY': '  NY', 'LA': '  LA'}}

You can use the split method here, a little recursion for your sub-dictionaries, and an assumption that your sub-dictionaries start with a tab (\t) or four spaces:

def txt_to_dict(txt):
    data = {}
    lines = txt.split('\n')
    i = 0
    while i < len(lines):
        try:
            key, val = lines[i].split(':')
        except ValueError:
            # print "Invalid row format"
            i += 1
            continue
        key = key.strip()
        val = val.strip()
        if len(val) == 0:
            i += 1
            sub = ""
            while i < len(lines) and (lines[i].startswith('\t') or lines[i].startswith('    ')):
                sub += lines[i] + '\n'
                i += 1
            data[key] = txt_to_dict(sub[:-1])  # remove last newline character
        else:
            data[key] = val
            i += 1
    return data

And then you would just call it on your variable txt as:

>>> print txt_to_dict(txt)
{'Country': 'USA', 'cities': {'NY': 'New York', 'LA': 'Los Angeles'}, 'name': 'xxxx', 'desgination': 'yyyy', 'HeadQuarters': {'NY': 'NY', 'LA': 'LA'}}

The sample output above shows that the sub-dictionaries are created properly.

Rows that do not split into exactly one key and one value are skipped by the error handling.
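The recursive idea can also be written as a single pass that keeps a stack of open blocks, which handles more than one level of nesting. This is only a sketch under assumptions: the name parse_indented is made up for illustration, every useful line contains a colon, a key with an empty value opens a nested block, and indentation is consistent (tabs and spaces are not mixed).

```python
def parse_indented(text):
    """Parse 'key : value' lines into nested dicts, using indentation for depth."""
    root = {}
    # Stack of (indent, dict) pairs; the top entry is the dict being filled.
    stack = [(-1, root)]
    for raw in text.splitlines():
        if ':' not in raw:
            continue  # skip blank or malformed lines
        indent = len(raw) - len(raw.lstrip())
        key, _, value = (part.strip() for part in raw.partition(':'))
        # Leave every block that is indented at least as far as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if value:
            parent[key] = value
        else:
            # An empty value opens a nested block.
            parent[key] = {}
            stack.append((indent, parent[key]))
    return root


sample = '''
name         : xxxx
cities       :
    LA       : Los Angeles
    NY       : New York
Country      : USA
'''
print(parse_indented(sample))
```

Note that a key with an empty value and no indented lines below it ends up as an empty dictionary rather than an empty string.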

This produces the same output as your code. It was arrived at primarily by refactoring what you had and applying a few common Python idioms.

txt = '''
name         : xxxx
desgination  : yyyy
cities       :
    LA       : Los Angeles
    NY       : New York
HeadQuarters :
    LA       :  LA
    NY       :  NY
Country      : USA
'''

D = {}                                                    # added to test code
for line in (line for line in txt.splitlines() if line):  #        "
    k, _, v = [s.strip() for s in line.partition(':')]    #        "

    if k in {'cities', 'HeadQuarters'}:
        D[k] = {}
        continue
    elif k in {'LA', 'NY'}:
        for k2 in (x for x in ('cities', 'HeadQuarters') if x in D):
            if k not in D[k2]:
                D[k2][k] = v
    else:
        D[k]= v

import pprint
pprint.pprint(D)

Output:

{'Country': 'USA',
 'HeadQuarters': {'LA': 'LA', 'NY': 'NY'},
 'cities': {'LA': 'Los Angeles', 'NY': 'New York'},
 'desgination': 'yyyy',
 'name': 'xxxx'}
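The refactored loop still names 'cities', 'HeadQuarters', 'LA' and 'NY' explicitly. If the only rule is that indented lines belong to the most recent key with an empty value, the key names need not be hard-coded. A sketch under that assumption (parse_two_levels is a hypothetical name, and only one level of nesting is handled):

```python
def parse_two_levels(text):
    """Collect 'key : value' lines; indented lines go into the open section."""
    result = {}
    section = None  # dict of the most recent key that had an empty value
    for line in text.splitlines():
        if ':' not in line:
            continue  # skip blank or malformed lines
        key, _, value = (s.strip() for s in line.partition(':'))
        if line[0] in ' \t' and section is not None:
            section[key] = value        # indented line: belongs to the open section
        elif value:
            result[key] = value         # ordinary top-level pair
            section = None
        else:
            section = result[key] = {}  # empty value opens a new section
    return result


sample = '''
name         : xxxx
desgination  : yyyy
cities       :
    LA       : Los Angeles
    NY       : New York
HeadQuarters :
    LA       :  LA
    NY       :  NY
Country      : USA
'''

import pprint
pprint.pprint(parse_two_levels(sample))
```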

You could use an existing YAML parser (the PyYAML package):

import yaml # $ pip install pyyaml

data = yaml.safe_load(txt)

Result

{'Country': 'USA',
 'HeadQuarters': {'LA': 'LA', 'NY': 'NY'},
 'cities': {'LA': 'Los Angeles', 'NY': 'New York'},
 'desgination': 'yyyy',
 'name': 'xxxx'}

The parser accepts your input as is, but to make it more conformant YAML, small modifications are required:

--- 
Country: USA
HeadQuarters: 
  LA: LA
  NY: NY
cities: 
  LA: "Los Angeles"
  NY: "New York"
desgination: yyyy
name: xxxx

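This works for the key/value pairs, although the nested entries end up flattened into the top level (see the printed result):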

txt = '''
name         : xxxx
desgination  : yyyy
cities       : 
    LA       : Los Angeles
    NY       : New York
HeadQuarters : 
    LA       :  LA
    NY       :  NY    
Country      : USA 
'''
di = {}
for line in txt.split('\n'):
   if len(line)> 1: di[line.split(':')[0].strip()]= line.split(':')[1].strip()

print di # {'name': 'xxxx', 'desgination': 'yyyy', 'LA': 'LA', 'Country': 'USA', 'HeadQuarters': '', 'NY': 'NY', 'cities': ''}
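One caveat with indexing the result of split(':') is that a value containing a colon gets cut off. A small sketch of the difference between split and partition, using a made-up line (the URL is purely illustrative):

```python
line = "site : http://example.com:8080"

# split(':') breaks the line at every colon, so index 1 holds only part of the value.
parts = line.split(':')
print(parts[1].strip())

# partition(':') splits only at the first colon, so the value stays whole.
key, _, value = line.partition(':')
print(key.strip(), value.strip())
```

line.split(':', 1) limits the split to the first colon as well, but partition always returns three parts, so the unpacking does not fail on a line without a colon.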
