
Pattern matching: getting lists and dicts from a string

I have the string below, and I want to extract the lists, dicts, and plain variables from it. How can I split this string into that format?

s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'

import re
m1 = re.findall (r'(?=.*,)(.*?=\[.+?\],?)',s)
for i in m1 :
    print('m1:',i)

Only the first result comes out correctly. Does anyone know how to do this?

m1: list_c=[1,2],
m1: a=3,b=1.3,c=abch,list_a=[1,2],

Split on '=' instead; you can then work with each variable name and its value.

You still need to handle type casting for the values (a regex, split, or try/except around a cast may help).

Also, as other commenters noted, a dict may be easier to work with.

s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
al = s.split('=')
var_l = [al[0]]
value_l = []

for a in al[1:-1]:
  var_l.append(a.split(',')[-1])
  value_l.append(','.join(a.split(',')[:-1]))
value_l.append(al[-1])

output = dict(zip(var_l, value_l))
print(output)
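The type casting mentioned above could be sketched as follows. This is only an illustration: `cast_value` is a hypothetical helper, not part of the original answer, and it assumes dictionary values contain no nested commas or colons. It tries `ast.literal_eval` first and falls back to the raw string for bare words such as `abch`.

```python
import ast

def cast_value(text):
    """Best-effort conversion of a raw value string (hypothetical helper)."""
    try:
        # Handles ints, floats, and lists such as '[1,2]'
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        pass
    if text.startswith('{') and text.endswith('}'):
        # '{a:2,b:3}' has bare keys, so it is not a Python literal; split by hand.
        # Assumption: values contain no nested commas or colons.
        pairs = (item.split(':', 1) for item in text[1:-1].split(','))
        return {key.strip(): cast_value(value.strip()) for key, value in pairs}
    # Anything else (e.g. 'abch') stays a string
    return text

raw = {'list_c': '[1,2]', 'a': '3', 'b': '1.3', 'c': 'abch', 'dict_a': '{a:2,b:3}'}
typed = {name: cast_value(value) for name, value in raw.items()}
print(typed)
```

Keeping the fallback as a plain string means unknown formats pass through unchanged rather than raising.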

You may have better luck if you describe the right-hand side expressions more or less explicitly: numbers, lists, dictionaries, and identifiers:

re.findall(r"([^=,]+)=" # LHS (no commas, so separators are skipped) and assignment operator
                  +r"([+-]?\d+(?:\.\d+)?|" # Numbers
                  +r"[+-]?\d+\.|" # More numbers
                  +r"\[[^]]+\]|" # Lists
                  +r"{[^}]+}|" # Dictionaries
                  +r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
           s)
# [('list_c', '[1,2]'), ('a', '3'), ('b', '1.3'), ('c', 'abch'),
#  ('list_a', '[1,2]'), ('dict_a', '{a:2,b:3}')]
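The same idea can be written with `re.VERBOSE`, which allows whitespace and comments inside the pattern. The sketch below is one possible restatement, not the answerer's exact code: the name `ASSIGNMENT` is made up for illustration, the two number alternatives are merged into one, and the name group excludes commas so that names come back without a leading separator. Because `findall` returns (name, value) tuples, the result can be passed straight to `dict`.

```python
import re

s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'

# Hypothetical name; the pattern mirrors the alternatives described above
ASSIGNMENT = re.compile(r"""
    ([^=,]+) =                    # name, then the assignment operator
    (
        [+-]?\d+(?:\.\d*)?        # numbers such as 3, 1.3 or 3.
      | \[[^\]]+\]                # lists
      | \{[^}]+\}                 # dictionaries
      | [a-zA-Z_][a-zA-Z_\d]*     # identifiers
    )
""", re.VERBOSE)

pairs = dict(ASSIGNMENT.findall(s))
print(pairs)
```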

The answer is as follows:

import re
from pprint import pprint
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1],Save,Record,dict_a={a:2,b:3}'
m1 = re.findall(r"([^=]+)=" # LHS and assignment operator
                  +r"([+-]?\d+(?:\.\d+)?|" # Numbers
                  +r"[+-]?\d+\.|" # More numbers
                  +r"\[[^]]+\]|" # Lists
                  +r"{[^}]+}|" # Dictionaries
                  +r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
           s)
temp_d = {}
for i,j in m1:    
    temp = i.strip(',').split(',')       
    if len(temp)>1:
        for k in temp[:-1]:
            temp_d[k]=''
        temp_d[temp[-1]] = j
    else:
        temp_d[temp[0]] = j
pprint(temp_d)

The output is:

{'Record': '',
 'Save': '',
 'a': '3',
 'b': '1.3',
 'c': 'abch',
 'dict_a': '{a:2,b:3}',
 'list_a': '[1]',
 'list_c': '[1,2]'}

Instead of picking out the types, you can start by capturing the identifiers. Here's a regex that captures all the identifiers in the string (lowercase only, but see the note):

regex = re.compile(r'[a-z_]+=')
# Note: to match all valid variable names, use r'[A-Za-z_][A-Za-z0-9_]*='
cases = [x.group() for x in re.finditer(regex, s)]

This gives a list of all the identifiers in the string:

['list_c=', 'a=', 'b=', 'c=', 'list_a=', 'dict_a=']

We can now define a function that uses the above list to chop up s sequentially:

def chop(mystr, mylist):
    # Drop everything up to and including the first identifier
    temp = mystr.partition(mylist[0])[2]
    # Find where the next identifier starts and strip the leading bits
    cut = temp.find(mylist[1])
    return temp[cut:], mylist[1:]

mystr = s[:]
temp = [mystr]
mylist = cases[:]
while len(mylist) > 1:
    mystr, mylist = chop(mystr, mylist)
    temp.append(mystr)

This (convoluted) slicing operation gives this list of strings:

['list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',         
'b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'list_a=[1,2],dict_a={a:2,b:3}',
'dict_a={a:2,b:3}'] 

Now cut off the end of each string using the entry that follows it:

result = []
for x in range(len(temp) - 1):
    cut = temp[x].find(temp[x+1]) - 1    #-1 to remove commas
    result.append(temp[x][:cut])
result.append(temp.pop())                #get the last item

Now we have the full list:

['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']

Each element is easily parsable into key:value pairs. Each one can also be run via exec, but only if every bare name on the right-hand side (such as abch) is already defined; otherwise exec raises a NameError.
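As a minimal sketch of that last parsing step, each element can be split at its first '=' with `str.partition`. The variable names `items` and `pairs` are illustrative only, and the values are left as strings.

```python
items = ['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']

pairs = {}
for item in items:
    # partition splits at the first '=' only, so the value is kept intact
    name, _, value = item.partition('=')
    pairs[name] = value

print(pairs)
```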
