
Translate key-value string with arrays into json object in python

I have a (flat) text string that I want to translate into a Python dictionary/JSON.

Example string:

key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"] key5="value with a , or secial character#l" key6="text with a protected quotation \" inside" key7=1,101,42

The output should be a dict/JSON that looks like this:

{
"key1": "value",
"key2": "val ue",
"key3": ["entry1", "entry2"],
"key4": ["o ne", "[two]"],
"key5": "value with a , or secial character#l",
"key6":"text with a protected quotation \" inside",
"key7": [1,101,42]
}

I am using the lexer described here https://www.debugcn.com/en/article/15212391.html but I am stuck on how to use it with brackets...

    import shlex

    def parse_kv_pairs(text, value_sep="="):
        lexer = shlex.shlex(text, posix=True)
        lexer.whitespace = " "
        lexer.wordchars += "="
        lexer.quotes = "\""
        lexer.wordchars += ".-_()/:+*^&%$#@!?|{}[]'`´,"
        return dict(word.split(value_sep, maxsplit=1) for word in lexer)
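
To make the problem with brackets concrete, here is a minimal sketch that runs the same lexer setup on two small inputs. It assumes the separator is "=" (the `value_sep="="` default is my assumption). The list value contains a space, so the lexer cuts it into two tokens, and the second token has no "=" to split on.

```python
import shlex

def parse_kv_pairs(text, value_sep="="):
    # Same lexer setup as the attempt above; value_sep="=" is an assumed default.
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace = " "
    lexer.quotes = '"'
    lexer.wordchars += "=.-_()/:+*^&%$#@!?|{}[]'`´,"
    return dict(word.split(value_sep, maxsplit=1) for word in lexer)

# Plain and quoted values work as expected.
print(parse_kv_pairs('key1=value key2="val ue"'))

# The space inside the brackets splits the list into "key3=[entry1," and
# "entry2]"; the second token has no "=", so building the dict fails.
try:
    parse_kv_pairs('key3=[entry1, entry2]')
except ValueError as err:
    print("ValueError:", err)
```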

Do you know a library that supports this, or do you have an algorithm that can translate this?

I am happy for any hints :)

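Using regexps, I tried to understand what you want. I stuck with the all-lowercase letters from the example and added some extra trap keys of my own for testing.

I assume that any commas inside numbers can be stripped, and I treat every whitespace character as equivalent to a space, so the input can be split over several lines at spaces instead of being one long line (this can be removed if it is not wanted). The code runs, and the final assert shows what it produces.

Lists cannot be nested.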

# -*- coding: utf-8 -*-
"""
https://stackoverflow.com/questions/66491209/translate-key-value-string-with-arrays-into-json-object-in-python

Created on Fri Mar  5 18:52:01 2021

@author: paddy3118
"""
import re

data = r"""
key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"]
key5="value with a , or secial character#l"
key6="text with a protected quotation \" inside" key7=1,101,42
key8=key9 key10="not a key0=whatewver"
"""
data = data.strip()
space = '\t \n\r'
i = 0
state = 'KEY'
d = {}  # dict for parsed data
while data:
    if state == 'KEY':
        if not (m := re.search(r'^([a-z0-9]+)=', data)):
            break  # d, data
        key = m.groups()[0]
        data = data[m.end():]
        state = 'VAL'
    if state in {'VAL', 'LISTVAL'}:
        if (m := re.search(r'^([a-z][a-z0-9]+)[\s,]*', data)):
            val = m.groups()[0]
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        elif (m:= re.search(r'^"(.*?[^\\])"[\s,]*', data)):
            val = m.groups()[0]
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        # number: must end in a digit or '.', so a following ']' or space is not captured
        elif (m:= re.search(r'^([0-9](?:[0-9.,]*[0-9.])?)[\s,]*', data)):
            val = m.groups()[0]
            val = float(val.replace(',', ''))
            val = int(val) if val.is_integer() else val
            if state == 'VAL':
                d[key] = val
                state = 'KEY'
            else:
                listval.append(val)
            data = data[m.end():]
        elif state == 'VAL' and data[0] == '[':
            listval = []
            state = 'LISTVAL'
            data = data[1:].lstrip()
        elif state == 'LISTVAL' and data[0] == ']':
            d[key] = listval
            state = 'KEY'
            data = data[1:].lstrip()
        else:
            break

assert d == {'key1': 'value',
 'key2': 'val ue',
 'key3': ['entry1', 'entry2'],
 'key4': ['o ne', '[two]'],
 'key5': 'value with a , or secial character#l',
 'key6': 'text with a protected quotation \\" inside',
 'key7': 110142,
 'key8': 'key9',
 'key10': 'not a key0=whatewver'}
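
The comma-stripping assumption is why `key7` comes out as the single number 110142 rather than a list. A minimal sketch of just that conversion step, pulled out into a helper (the name `to_number` is mine, not part of the code above):

```python
def to_number(token):
    """Strip commas from a numeric token, then return an int when the value is whole."""
    value = float(token.replace(',', ''))
    return int(value) if value.is_integer() else value

print(to_number('1,101,42'))  # 110142
print(to_number('3.5'))       # 3.5
```

If `1,101,42` should be a list of three numbers instead, the commas would have to be treated as separators rather than stripped.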

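We can assume that the values (after the equals sign) are JSON compatible, with two exceptions:

  • Words may appear without quotes
  • Lists may appear without square brackets (they are recognized by their comma separators)

So, if we can capture the part after the equals sign, we can:

  1. Identify the quoted strings
  2. Wrap each unquoted word (starting with a letter) in double quotes
  3. Parse the result as JSON
  4. If the previous step fails, wrap it in square brackets and parse as JSON again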
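
Steps 3 and 4 can be tried on their own. This is a minimal sketch that assumes the value text has already had its bare words quoted as in step 2; the helper name `load_value` is only for illustration.

```python
import json

def load_value(text):
    """Parse text as JSON; if that fails, retry with square brackets around it."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return json.loads("[{}]".format(text))

print(load_value('"value"'))              # value
print(load_value('["entry1","entry2"]'))  # ['entry1', 'entry2']
print(load_value('1,101,42'))             # [1, 101, 42]
```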

Here is the suggested code:

import re 
import json

def parse(s):
    d = {}
    key = value = ""
    for m in re.findall(r'"(?:[^"\\]|\\.)*"|\w+=?|\S', s) + ["="]:
        if m[-1] == '=':  # Arrived at a new key/value pair
            if key:  # Process previous key/value pair
                try:
                    d[key] = json.loads(value)
                except Exception: # Try with brackets, if that fails: input is bad
                    d[key] = json.loads("[{}]".format(value))
            key = m[:-1]  # New key
            value = ""
        elif m[0].isalpha():  # Wrap in quotes
            value += '"{}"'.format(m)
        else:  # Punctuation, digits, ...
            value += m
    return d
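
To see what the loop iterates over, here is a small sketch that applies the same regular expression to a shortened input (the sample string is made up for illustration). Quoted strings stay in one piece, keys keep their trailing "=", and every other non-space character becomes its own token.

```python
import re

# Same pattern as in parse(): quoted string | word with optional "=" | any other non-space char
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|\w+=?|\S')

sample = r'key3=[entry1, entry2] key6="a \" b" key7=1,101'
tokens = TOKEN.findall(sample)
print(tokens)
```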

For the sample data you provided, you could call the function like this:

s = r'key1=value key2="val ue" key3=[entry1, entry2] key4=["o ne", "[two]"] key5="value with a , or secial character#l" key6="text with a protected quotation \" inside" key7=1,101,42'

result = parse(s)

The result would be:

{
   'key1': 'value', 
   'key2': 'val ue', 
   'key3': ['entry1', 'entry2'], 
   'key4': ['o ne', '[two]'], 
   'key5': 'value with a , or secial character#l', 
   'key6': 'text with a protected quotation " inside', 
   'key7': [1, 101, 42]
}
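
Since the question asks for JSON, the dictionary can be serialised with the standard `json` module. A minimal sketch, using a few of the entries from the result above as a literal so that it runs on its own:

```python
import json

# A subset of the parsed result, written out literally for the example.
result = {
    'key1': 'value',
    'key4': ['o ne', '[two]'],
    'key6': 'text with a protected quotation " inside',
    'key7': [1, 101, 42],
}

encoded = json.dumps(result, indent=2)
print(encoded)  # the inner quotation mark is written as \" in the JSON text
```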

