
Pattern matching: get lists and dicts from a string

I have the string below, and I want to extract the lists, dicts, and plain variables from it. How can I split this string into that format?

s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'

import re
m1 = re.findall (r'(?=.*,)(.*?=\[.+?\],?)',s)
for i in m1 :
    print('m1:',i)

Only the first result is correct. Does anyone know how to do this?

m1: list_c=[1,2],
m1: a=3,b=1.3,c=abch,list_a=[1,2],

Split on '=' instead. Each piece then holds a value followed by the next variable name, so you can separate the names from their values.

You still need to handle type casting for the values (a regex, split, or try-and-cast approach may help).

Also, as others have commented, a dict may be easier to work with.

s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
al = s.split('=')
var_l = [al[0]]
value_l = []

for a in al[1:-1]:
  var_l.append(a.split(',')[-1])
  value_l.append(','.join(a.split(',')[:-1]))
value_l.append(al[-1])

output = dict(zip(var_l, value_l))
print(output)
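For the type-casting step, here is a minimal sketch of the try-and-cast idea. The helper name cast_value is hypothetical, and it assumes that anything ast.literal_eval cannot read (such as abch or {a:2,b:3}, whose keys are unquoted) should simply stay a string:

```python
import ast

def cast_value(text):
    """Return text as a Python literal when possible, else the original string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a valid literal (e.g. a bare identifier), so keep it as text
        return text

# String values as produced by the split-on-'=' approach
raw = {'list_c': '[1,2]', 'a': '3', 'b': '1.3', 'c': 'abch',
       'list_a': '[1,2]', 'dict_a': '{a:2,b:3}'}

typed = {name: cast_value(value) for name, value in raw.items()}
print(typed)
# {'list_c': [1, 2], 'a': 3, 'b': 1.3, 'c': 'abch', 'list_a': [1, 2], 'dict_a': '{a:2,b:3}'}
```

The dict value stays a string here because a and b are not quoted; turning it into a real dict would need its own parsing rule.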

You may have better luck if you more or less explicitly describe the right-hand side expressions: numbers, lists, dictionaries, and identifiers:

re.findall(r"([^=,]+)=" # LHS (commas excluded so separators are not captured) and assignment operator
                  +r"([+-]?\d+(?:\.\d+)?|" # Numbers
                  +r"[+-]?\d+\.|" # More numbers
                  +r"\[[^]]+\]|" # Lists
                  +r"{[^}]+}|" # Dictionaries
                  +r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
           s)
# [('list_c', '[1,2]'), ('a', '3'), ('b', '1.3'), ('c', 'abch'), 
#  ('list_a', '[1,2]'), ('dict_a', '{a:2,b:3}')]

The answer looks like this:

import re
from pprint import pprint
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1],Save,Record,dict_a={a:2,b:3}'
m1 = re.findall(r"([^=]+)=" # LHS and assignment operator
                  +r"([+-]?\d+(?:\.\d+)?|" # Numbers
                  +r"[+-]?\d+\.|" # More numbers
                  +r"\[[^]]+\]|" # Lists
                  +r"{[^}]+}|" # Dictionaries
                  +r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
           s)
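temp_d = {}
for lhs, value in m1:
    # The LHS may carry a leading comma and bare flags such as "Save,Record"
    names = lhs.strip(',').split(',')
    for flag in names[:-1]:
        temp_d[flag] = ''          # names without "=" get an empty value
    temp_d[names[-1]] = value      # the last name owns the matched value
pprint(temp_d)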

The output is:

{'Record': '',
 'Save': '',
 'a': '3',
 'b': '1.3',
 'c': 'abch',
 'dict_a': '{a:2,b:3}',
 'list_a': '[1]',
 'list_c': '[1,2]'}

Instead of picking out the types, you can start by capturing the identifiers. Here's a regex that captures all the identifiers in the string (for lowercase only, but see note):

regex = re.compile(r'[a-z_]+=')
# Note: to accept all valid variable names, use r'[A-Za-z_][A-Za-z0-9_]*='
cases = [x.group() for x in re.finditer(regex, s)]

This gives a list of all the identifiers in the string:

['list_c=', 'a=', 'b=', 'c=', 'list_a=', 'dict_a=']

We can now define a function that uses the list above to chop up s step by step:

def chop(mystr, mylist):
    temp = mystr.partition(mylist[0])[2]   # text after the first identifier
    cut = temp.find(mylist[1])             # strip leading bits
    return temp[cut:], mylist[1:]
mystr = s[:]
temp = [mystr]
mylist = cases[:]
while len(mylist) > 1:                     # stop once one identifier is left
    mystr, mylist = chop(mystr, mylist)
    temp.append(mystr)

This (convoluted) slicing operation gives this list of strings:

['list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',         
'b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'list_a=[1,2],dict_a={a:2,b:3}',
'dict_a={a:2,b:3}'] 

Now cut off the ends using each successive entry:

result = []
for x in range(len(temp) - 1):
    cut = temp[x].find(temp[x+1]) - 1    #-1 to remove commas
    result.append(temp[x][:cut])
result.append(temp.pop())                #get the last item

Now we have the full list:

['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']
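As a cross-check, the same list can probably be produced in one step by splitting only on commas that are directly followed by an identifier and '='. This is a sketch of a swapped-in technique (re.split with a lookahead), not the chopping method above, and it assumes lowercase identifiers as in the example string:

```python
import re

s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'

# Split on a comma only when "name=" follows it, so commas inside
# [1,2] and {a:2,b:3} are left alone
parts = re.split(r',(?=[a-z_]+=)', s)
print(parts)
# ['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']
```

This would break if a value itself contained ",name=", for example a list of keyword-style items.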

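Each element is easily parsed into a key:value pair. Elements whose right-hand side is a valid Python expression, such as a=3 or list_a=[1,2], can also be run through exec, but c=abch raises NameError unless abch is defined, and dict_a={a:2,b:3} treats a and b as variable names rather than string keys.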
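For the key:value step, a minimal sketch using str.partition could look like this. The values are kept as strings, and the name pairs is only illustrative:

```python
result = ['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch',
          'list_a=[1,2]', 'dict_a={a:2,b:3}']

pairs = {}
for item in result:
    # partition splits at the first '=' only, so any later '=' stays in the value
    name, _, value = item.partition('=')
    pairs[name] = value

print(pairs)
# {'list_c': '[1,2]', 'a': '3', 'b': '1.3', 'c': 'abch', 'list_a': '[1,2]', 'dict_a': '{a:2,b:3}'}
```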
