Translate key-value string with arrays into JSON object in Python
I have a (flat) text string that I want to translate into a Python dictionary / JSON.
Example string:
key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"] key5="value with a , or secial character#l" key6="text with a protected quotation \" inside" key7=1,101,42
The output should be a dict/JSON looking like:
{
"key1": "value",
"key2": "val ue",
"key3": ["entry1", "entry2"],
"key4": ["o ne", "[two]"],
"key5": "value with a , or secial character#l",
"key6":"text with a protected quotation \" inside",
"key7": [1,101,42]
}
I was using a lexer as described here https://www.debugcn.com/en/article/15212391.html but I am stuck on how to use it together with the brackets...
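import shlex

def parse_kv_pairs(text, value_sep="="):
    # Tokenize on spaces, keeping double-quoted values together
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace = " "
    lexer.wordchars += value_sep
    lexer.quotes = "\""
    lexer.wordchars += ".-_()/:+*^&%$#@!?|{}[]'`´,"
    # Split each token into key and value at the first separator
    return dict(word.split(value_sep, maxsplit=1) for word in lexer)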
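To make the problem with the brackets concrete, here is a minimal sketch that reuses the lexer settings above on a shortened input. The helper name `tokenize` and the shortened sample string are assumptions made for illustration only. Because the lexer splits on every unquoted space, the list after `key3=` is cut into two tokens, and the second one has no `=` to split on.

```python
import shlex

def tokenize(text):
    """Hypothetical helper: tokenize with the lexer settings shown above."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace = " "
    lexer.wordchars += "="
    lexer.quotes = "\""
    lexer.wordchars += ".-_()/:+*^&%$#@!?|{}[]'`´,"
    return list(lexer)

tokens = tokenize('key1=value key3=[entry1, entry2]')
print(tokens)  # the bracketed list is split at the space after the comma

try:
    dict(word.split("=", maxsplit=1) for word in tokens)
except ValueError as exc:
    # The token without "=" yields a one-element list, which dict() rejects
    print("dict() failed:", exc)
```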
Do you know a library that supports this, or do you have an algorithm that is able to translate this?
I'm happy for any hint :)
Using regexps, I tried to make sense of what you wanted. I stuck to all-lowercase keys as in the example and added a couple of extra gotcha keys of my own for testing.
I assumed that any commas in numbers could be stripped, so key7=1,101,42 becomes the single integer 110142 rather than a list. I also treated any whitespace character as equivalent to a space, which allows the long input to be split over several lines at spaces (this is optional and can be removed). The code runs, and the assertion at the end shows what it produces.
Lists cannot be nested.
# -*- coding: utf-8 -*-
"""
https://stackoverflow.com/questions/66491209/translate-key-value-string-with-arrays-into-json-object-in-python
Created on Fri Mar 5 18:52:01 2021
@author: paddy3118
"""
import re

data = r"""
key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"]
key5="value with a , or secial character#l"
key6="text with a protected quotation \" inside" key7=1,101,42
key8=key9 key10="not a key0=whatewver"
"""
data = data.strip()
space = '\t \n\r'
i = 0
state = 'KEY'
d = {}  # dict for parsed data
while data:
    if state == 'KEY':
        # Expect a lowercase alphanumeric key followed by '='
        if not (m := re.search(r'^([a-z0-9]+)=', data)):
            break  # d, data
        key = m.groups()[0]
        data = data[m.end():]
        state = 'VAL'
    if state in {'VAL', 'LISTVAL'}:
        if (m := re.search(r'^([a-z][a-z0-9]+)[\s,]*', data)):
            # Unquoted word
            val = m.groups()[0]
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        elif (m := re.search(r'^"(.*?[^\\])"[\s,]*', data)):
            # Double-quoted string; a backslash-escaped quote does not end it
            val = m.groups()[0]
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        elif (m := re.search(r'^([0-9][0-9.,]*[^,])[\s,]*', data)):
            # Number; commas are stripped before conversion
            val = m.groups()[0]
            val = float(val.replace(',', ''))
            val = int(val) if val.is_integer() else val
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        elif state == 'VAL' and data[0] == '[':
            # Start of a (non-nested) list
            listval = []
            state = 'LISTVAL'
            data = data[1:].lstrip()
        elif state == 'LISTVAL' and data[0] == ']':
            # End of the list
            d[key] = listval
            state = 'KEY'
            data = data[1:].lstrip()
        else:
            break

assert d == {'key1': 'value',
             'key2': 'val ue',
             'key3': ['entry1', 'entry2'],
             'key4': ['o ne', '[two]'],
             'key5': 'value with a , or secial character#l',
             'key6': 'text with a protected quotation \\" inside',
             'key7': 110142,
             'key8': 'key9',
             'key10': 'not a key0=whatewver'}
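The number handling in the code above can be looked at in isolation. The following is only a sketch of that one step; the helper name `to_number` is hypothetical and does not appear in the answer. It shows why key7 ends up as a single integer: the commas are removed before the text is converted.

```python
def to_number(text):
    """Hypothetical helper: strip commas, then return an int or a float."""
    value = float(text.replace(',', ''))
    # Whole numbers are returned as int, everything else stays a float
    return int(value) if value.is_integer() else value

print(to_number('1,101,42'))  # commas are dropped, giving one integer
print(to_number('3.5'))
```

If a list of numbers is wanted instead, as in the question, the commas would have to be treated as separators rather than stripped.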
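We could assume that the values (after the equal sign) are JSON compatible, with two exceptions: strings may appear without quotes (as in key1=value), and lists may appear without square brackets (as in key7=1,101,42).
So, if we can capture the part after the equal sign, we could wrap each unquoted word in double quotes and try to parse the result as JSON; if that fails, wrap it in square brackets and try again.
Here is the suggested code: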
import re
import json

def parse(s):
    d = {}
    key = value = ""
    # Tokens: quoted strings, words (optionally ending in "="), single symbols.
    # The extra "=" at the end flushes the last key/value pair.
    for m in re.findall(r'"(?:[^"\\]|\\.)*"|\w+=?|\S', s) + ["="]:
        if m[-1] == '=':  # Arrived at a new key/value pair
            if key:  # Process previous key/value pair
                try:
                    d[key] = json.loads(value)
                except Exception:  # Try with brackets, if that fails: input is bad
                    d[key] = json.loads("[{}]".format(value))
            key = m[:-1]  # New key
            value = ""
        elif m[0].isalpha():  # Wrap in quotes
            value += '"{}"'.format(m)
        else:  # Punctuation, digits, ...
            value += m
    return d
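To see what the loop in `parse` iterates over, here is a small sketch that applies the same regular expression to a shortened input. The names `TOKEN_RE`, `sample` and `tokens` are chosen here for illustration only. Each key arrives as one token ending in `=`, quoted strings stay whole, and brackets and commas become single-character tokens.

```python
import re

# Same pattern as in parse(): quoted string | word with optional "=" | any other symbol
TOKEN_RE = r'"(?:[^"\\]|\\.)*"|\w+=?|\S'

sample = 'key2="val ue" key3=[entry1, entry2] key7=1,101,42'
tokens = re.findall(TOKEN_RE, sample)
for token in tokens:
    print(repr(token))
```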
Here is how you would call that function for the example data you have given:
s = r'key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"] key5="value with a , or secial character#l" key6="text with a protected quotation \" inside" key7=1,101,42'
result = parse(s)
The result will be:
{
'key1': 'value',
'key2': 'val ue',
'key3': ['entry1', 'entry2'],
'key4': ['o ne', '[two]'],
'key5': 'value with a , or secial character#l',
'key6': 'text with a protected quotation " inside',
'key7': [1, 101, 42]
}
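Since the question also asks for JSON, the resulting dictionary can be serialized with the standard `json` module. This is a minimal sketch; the dictionary below is a shortened copy of the result above, written out by hand so the example runs on its own.

```python
import json

# Shortened copy of the parsed result, written out for illustration
result = {
    'key1': 'value',
    'key3': ['entry1', 'entry2'],
    'key6': 'text with a protected quotation " inside',
    'key7': [1, 101, 42],
}

# json.dumps escapes the inner quotation mark again when writing JSON text
json_text = json.dumps(result, indent=2)
print(json_text)
```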