Translate key-value string with arrays into json object in python
I have a (flat) text string that I want to translate into a Python dictionary/JSON.
Example string:
key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"] key5="value with a , or secial character#l" key6="text with a protected quotation \" inside" key7=1,101,42
The output should be a dict/JSON that looks like this:
{
"key1": "value",
"key2": "val ue",
"key3": ["entry1", "entry2"],
"key4": ["o ne", "[two]"],
"key5": "value with a , or secial character#l",
"key6":"text with a protected quotation \" inside",
"key7": [1,101,42]
}
I'm using the lexer described here https://www.debugcn.com/en/article/15212391.html but I'm stuck on how to make it work with the brackets...
import shlex

def parse_kv_pairs(text, value_sep="="):
    # value_sep was undefined in the snippet as posted; "=" is assumed here
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace = " "
    lexer.wordchars += "="
    lexer.quotes = "\""
    lexer.wordchars += ".-_()/:+*^&%$#@!?|{}[]'`´,"
    return dict(word.split(value_sep, maxsplit=1) for word in lexer)
Do you know a library that supports this, or do you have an algorithm that can translate it?
I'm glad for any hint :)
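Using regular expressions, I tried to follow what you want. I stuck to all-lowercase as in your example and added a few extra trap keys of my own for testing.
I assume that any commas inside numbers can be stripped, and I treat any whitespace character as equivalent to a space, so the input can be split over several lines at spaces instead of being one long line (this can be removed if it is not wanted). The code runs, and the assertion at the end shows what it produces.
Lists cannot be nested.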
# -*- coding: utf-8 -*-
"""
https://stackoverflow.com/questions/66491209/translate-key-value-string-with-arrays-into-json-object-in-python
Created on Fri Mar 5 18:52:01 2021
@author: paddy3118
"""
# Requires Python 3.8+ (uses the := assignment expression)
import re

data = r"""
key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"]
key5="value with a , or secial character#l"
key6="text with a protected quotation \" inside" key7=1,101,42
key8=key9 key10="not a key0=whatewver"
"""
data = data.strip()
space = '\t \n\r'
i = 0
state = 'KEY'
d = {}  # dict for parsed data
while data:
    if state == 'KEY':
        # Expect "name=" at the start of the remaining input
        if not (m := re.search(r'^([a-z0-9]+)=', data)):
            break  # d, data
        key = m.groups()[0]
        data = data[m.end():]
        state = 'VAL'
    if state in {'VAL', 'LISTVAL'}:
        # Bare word
        if (m := re.search(r'^([a-z][a-z0-9]+)[\s,]*', data)):
            val = m.groups()[0]
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        # Double-quoted string; a backslash-escaped quote does not end it
        elif (m := re.search(r'^"(.*?[^\\])"[\s,]*', data)):
            val = m.groups()[0]
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        # Number; commas are stripped before conversion
        elif (m := re.search(r'^([0-9][0-9.,]*[^,])[\s,]*', data)):
            val = m.groups()[0]
            val = float(val.replace(',', ''))
            val = int(val) if val.is_integer() else val
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        # Start of a list
        elif state == 'VAL' and data[0] == '[':
            listval = []
            state = 'LISTVAL'
            data = data[1:].lstrip()
        # End of a list
        elif state == 'LISTVAL' and data[0] == ']':
            d[key] = listval
            state = 'KEY'
            data = data[1:].lstrip()
        else:
            break

assert d == {'key1': 'value',
             'key2': 'val ue',
             'key3': ['entry1', 'entry2'],
             'key4': ['o ne', '[two]'],
             'key5': 'value with a , or secial character#l',
             'key6': 'text with a protected quotation \\" inside',
             'key7': 110142,
             'key8': 'key9',
             'key10': 'not a key0=whatewver'}
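Note that, because of the comma-stripping assumption, key7 comes out as the single integer 110142 rather than the list asked for in the question. A minimal sketch of just that numeric step in isolation may make this easier to see; the helper name to_number is hypothetical and is not part of the code above:

```python
def to_number(text):
    """Convert numeric text to int or float, dropping commas first.

    Assumption (taken from the answer): commas inside a number are
    separators that can simply be removed.
    """
    val = float(text.replace(',', ''))
    # Return an int when the value has no fractional part
    return int(val) if val.is_integer() else val


print(to_number('1,101,42'))
print(to_number('3.5'))
```

If a list is wanted for key7 instead, this is the step that would have to change, for example by splitting on the commas rather than removing them.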
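We can assume that the values (the part after the equals sign) are JSON-compatible, with two exceptions: strings may be unquoted, and a top-level list may appear without its enclosing square brackets.
So if we can capture the part after the equals sign, we can wrap any unquoted word in double quotes and try to parse the result as JSON; if that fails, wrap it in square brackets and try again.
Here is the suggested code: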
import re
import json

def parse(s):
    d = {}
    key = value = ""
    for m in re.findall(r'"(?:[^"\\]|\\.)*"|\w+=?|\S', s) + ["="]:
        if m[-1] == '=':  # Arrived at a new key/value pair
            if key:  # Process previous key/value pair
                try:
                    d[key] = json.loads(value)
                except Exception:  # Try with brackets, if that fails: input is bad
                    d[key] = json.loads("[{}]".format(value))
            key = m[:-1]  # New key
            value = ""
        elif m[0].isalpha():  # Wrap in quotes
            value += '"{}"'.format(m)
        else:  # Punctuation, digits, ...
            value += m
    return d
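To see what the loop actually iterates over, it can help to look at the tokens the regular expression produces. The sketch below reuses the same pattern; the names TOKEN_RE and tokenize are hypothetical helpers added only for illustration:

```python
import re

# Same pattern as in parse(): a double-quoted string (with backslash escapes),
# or a word optionally followed by '=', or any other single non-space character
TOKEN_RE = r'"(?:[^"\\]|\\.)*"|\w+=?|\S'


def tokenize(s):
    """Return the raw tokens, in order, without interpreting them."""
    return re.findall(TOKEN_RE, s)


print(tokenize('key3=[entry1, entry2] key7=1,101,42'))
# ['key3=', '[', 'entry1', ',', 'entry2', ']', 'key7=', '1', ',', '101', ',', '42']
```

A token ending in '=' marks a new key; everything between two such tokens is concatenated (without the whitespace) into the value text for the earlier key.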
For the sample data you provided, you can call the function like this:
s = r'key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"] key5="value with a , or secial character#l" key6="text with a protected quotation \" inside" key7=1,101,42'
result = parse(s)
The result will be:
{
'key1': 'value',
'key2': 'val ue',
'key3': ['entry1', 'entry2'],
'key4': ['o ne', '[two]'],
'key5': 'value with a , or secial character#l',
'key6': 'text with a protected quotation " inside',
'key7': [1, 101, 42]
}
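key7 ends up as a list because "1,101,42" is not valid JSON on its own, so the second attempt with added brackets is the one that succeeds. A minimal sketch of that fallback in isolation follows; the function name loads_with_list_fallback is hypothetical, and it catches json.JSONDecodeError where parse() catches the broader Exception:

```python
import json


def loads_with_list_fallback(value):
    """Parse value as JSON; if that fails, retry with surrounding brackets."""
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        # A second failure propagates to the caller: the input is not usable
        return json.loads("[{}]".format(value))


print(loads_with_list_fallback('1,101,42'))   # [1, 101, 42]
print(loads_with_list_fallback('"val ue"'))   # val ue
```

Since the question asks for JSON, the dict returned by parse() can be serialized with json.dumps(result) once parsing is done.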