
Translate key-value string with arrays into json object in python

I have a (flat) text string that I want to translate into a Python dictionary/JSON.

Example string:

key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"] key5="value with a , or secial character#l" key6="text with a protected quotation \" inside" key7=1,101,42

The output should be a dict/JSON that looks like this:

{
"key1": "value",
"key2": "val ue",
"key3": ["entry1", "entry2"],
"key4": ["o ne", "[two]"],
"key5": "value with a , or secial character#l",
"key6":"text with a protected quotation \" inside",
"key7": [1,101,42]
}

I'm using the lexer described here https://www.debugcn.com/en/article/15212391.html, but I'm stuck on how to make it work with the brackets...

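    import shlex

    def parse_kv_pairs(text, value_sep="="):
        lexer = shlex.shlex(text, posix=True)
        lexer.whitespace = " "       # split tokens on spaces only
        lexer.wordchars += "="       # keep key=value together in one token
        lexer.quotes = "\""          # only double quotes delimit strings
        lexer.wordchars += ".-_()/:+*^&%$#@!?|{}[]'`´,"
        # every token is expected to look like key=value
        return dict(word.split(value_sep, maxsplit=1) for word in lexer)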

Do you know a library that supports this, or do you have an algorithm that can translate it?

I'm grateful for any hint :)
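To make the bracket problem concrete, here is a small sketch that feeds two of the list-valued pairs through a lexer configured the same way as in `parse_kv_pairs` (the shortened input string is only for illustration). Because `[`, `]` and `,` are word characters and POSIX mode consumes the quotes, the list structure ends up glued onto neighbouring words:

```python
import shlex

# Same lexer settings as parse_kv_pairs, applied to a shortened sample input.
text = 'key3=[entry1, entry2] key4=["o ne", "[two]"]'
lexer = shlex.shlex(text, posix=True)
lexer.whitespace = " "
lexer.quotes = '"'
lexer.wordchars += "=.-_()/:+*^&%$#@!?|{}[]'`´,"

tokens = list(lexer)
print(tokens)
# ['key3=[entry1,', 'entry2]', 'key4=[o ne,', '[two]]']
```

Tokens such as `entry2]` and `[two]]` contain no `=`, so the `dict(...)` call in `parse_kv_pairs` raises `ValueError`, and the quotes that protected `[two]` are already gone at this point. A tokenizer that keeps brackets, commas and quoted strings as separate tokens avoids this.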

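Using regular expressions, I tried to work out what you want. I stuck to all-lowercase, as in the example, and added a few extra trap keys of my own for testing.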

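I assumed that any commas inside a number can be stripped, and I treated any whitespace character as equivalent to a space. That lets the input be broken over extra newlines at the spaces instead of being one long line (if you don't want that, it can be removed). The code runs, and the assert at the end shows the result it produces.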

Lists cannot be nested.

# -*- coding: utf-8 -*-
"""
https://stackoverflow.com/questions/66491209/translate-key-value-string-with-arrays-into-json-object-in-python

Created on Fri Mar  5 18:52:01 2021

@author: paddy3118
"""
import re

data = r"""
key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"]
key5="value with a , or secial character#l"
key6="text with a protected quotation \" inside" key7=1,101,42
key8=key9 key10="not a key0=whatewver"
"""
data = data.strip()
state = 'KEY'  # parser state: 'KEY', 'VAL' or 'LISTVAL'
d = {}  # dict for parsed data
while data:
    if state == 'KEY':
        if not (m := re.search(r'^([a-z0-9]+)=', data)):
            break  # remaining data does not start with key=; stop parsing
        key = m.groups()[0]
        data = data[m.end():]
        state = 'VAL'
    if state in {'VAL', 'LISTVAL'}:
        if (m := re.search(r'^([a-z][a-z0-9]+)[\s,]*', data)):  # unquoted lowercase word
            val = m.groups()[0]
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        elif (m := re.search(r'^"(.*?[^\\])"[\s,]*', data)):  # double-quoted string; the backslash of \" is kept
            val = m.groups()[0]
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        elif (m := re.search(r'^([0-9](?:[0-9.,]*[0-9])?)[\s,]*', data)):  # number; must end on a digit
            val = m.groups()[0]
            val = float(val.replace(',', ''))
            val = int(val) if val.is_integer() else val
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        elif state == 'VAL' and data[0] == '[':
            listval = []
            state = 'LISTVAL'
            data = data[1:].lstrip()
        elif state == 'LISTVAL' and data[0] == ']':
            d[key] = listval
            state = 'KEY'
            data = data[1:].lstrip()
        else:
            break

assert d == {'key1': 'value',
 'key2': 'val ue',
 'key3': ['entry1', 'entry2'],
 'key4': ['o ne', '[two]'],
 'key5': 'value with a , or secial character#l',
 'key6': 'text with a protected quotation \\" inside',
 'key7': 110142,
 'key8': 'key9',
 'key10': 'not a key0=whatewver'}
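The number branch of the loop can be tried on its own. This is a sketch with a hypothetical helper name, `take_number`; it keeps the answer's assumption that commas inside a number are thousands separators, and it uses a pattern that must end on a digit, so a single digit or a number directly followed by `]` is also accepted:

```python
import re

# Hypothetical helper: the number rule from the loop above, pulled out on its own.
# Assumption (as in the answer): commas inside a number are dropped, not list separators.
NUMBER = re.compile(r'([0-9](?:[0-9.,]*[0-9])?)[\s,]*')


def take_number(data):
    """Return (number, remaining_data), or None if data does not start with a number."""
    m = NUMBER.match(data)
    if m is None:
        return None
    val = float(m.group(1).replace(',', ''))
    val = int(val) if val.is_integer() else val  # prefer int when there is no fraction
    return val, data[m.end():]
```

With the pattern `[0-9][0-9.,]*[^,]`, the last captured character may be a space or `]`, so an input such as `key=[1, 2]` ends up calling `float("2]")`, which raises `ValueError`.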

We can assume that the values (after the equals sign) are JSON-compatible, with two exceptions:

  • words may appear without quotes
  • lists may appear without square brackets (they are recognized by the comma separator)

So if we can capture the part after the equals sign, we can:

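  1. Identify quoted strings.
  2. Wrap each unquoted word (one that starts with a letter) in double quotes.
  3. Parse the result as JSON.
  4. If the previous step fails, wrap it in square brackets and parse it as JSON again.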
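The four steps can be sketched for a single value, assuming that value (the text after one `=`) has already been cut out of the input. The function name `value_to_python` is hypothetical; splitting the whole line into key/value pairs is a separate job.

```python
import json
import re

# Tokens: a double-quoted JSON string (with backslash escapes), a run of word
# characters, or any other single non-space character. Whitespace is dropped.
VALUE_TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|\w+|\S')


def value_to_python(raw):
    """Apply steps 1-4 to the text found after one '=' sign."""
    parts = []
    for tok in VALUE_TOKEN.findall(raw):   # step 1: quoted strings stay whole
        if tok[0].isalpha():
            parts.append(json.dumps(tok))  # step 2: quote bare words
        else:
            parts.append(tok)
    text = "".join(parts)
    try:
        return json.loads(text)            # step 3: parse as JSON
    except json.JSONDecodeError:
        return json.loads("[" + text + "]")  # step 4: retry as a bracket-less list
```

For example, `value_to_python('1,101,42')` first fails with "Extra data" and then succeeds as `[1, 101, 42]`, while `value_to_python('[entry1, entry2]')` succeeds on the first attempt.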

Here is the proposed code:

import re 
import json

def parse(s):
    d = {}
    key = value = ""
    for m in re.findall(r'"(?:[^"\\]|\\.)*"|\w+=?|\S', s) + ["="]:
        if m[-1] == '=':  # Arrived at a new key/value pair
            if key:  # Process previous key/value pair
                try:
                    d[key] = json.loads(value)
                except json.JSONDecodeError:  # Try with brackets; if that fails too, the input is bad
                    d[key] = json.loads("[{}]".format(value))
            key = m[:-1]  # New key
            value = ""
        elif m[0].isalpha():  # Wrap in quotes
            value += '"{}"'.format(m)
        else:  # Punctuation, digits, ...
            value += m
    return d

For the example data you provided, you can call the function like this:

s = r'key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"] key5="value with a , or secial character#l" key6="text with a protected quotation \" inside" key7=1,101,42'

result = parse(s)

The result will be:

{
   'key1': 'value', 
   'key2': 'val ue', 
   'key3': ['entry1', 'entry2'], 
   'key4': ['o ne', '[two]'], 
   'key5': 'value with a , or secial character#l', 
   'key6': 'text with a protected quotation " inside', 
   'key7': [1, 101, 42]
}
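To see how `parse` splits the input, here is the token stream its regular expression produces for a shortened sample (the sample string is illustrative). Every token ending in `=` starts a new pair, so the function never needs to know in advance where a value ends:

```python
import re

# Same token pattern as in parse(): quoted string, word with an optional '=',
# or any other single non-space character. Spaces match no alternative.
TOKEN = r'"(?:[^"\\]|\\.)*"|\w+=?|\S'

sample = r'key3=[entry1, entry2] key6="a \" b" key7=1,101,42'
tokens = re.findall(TOKEN, sample)
print(tokens)
# ['key3=', '[', 'entry1', ',', 'entry2', ']', 'key6=', '"a \\" b"', 'key7=', '1', ',', '101', ',', '42']
```

Because `\w` also matches digits, `1`, `101` and `42` arrive as separate tokens. They do not start with a letter, so `parse` leaves them unquoted, `json.loads("1,101,42")` fails, and the value is retried as a list.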


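Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.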

 