Parse Key Value Pairs in Python

I have a key-value file whose format is similar to JSON, but different enough that the Python JSON parser won't accept it.

Example:

"Matt"
{
    "Location"    "New York"
    "Age"         "22"
    "Items"
    {
        "Banana"    "2"
        "Apple"     "5"
        "Cat"       "1"
    }
}

Is there an easy way to parse this text file and store the values in a structure such that I could access the data using a format similar to Matt[Items][Banana]? There is only one pair per line, and an opening brace denotes going down a level while a closing brace denotes going back up a level.

You could use re.sub to 'fix up' your string and then parse it. As long as each line always holds either a single quoted string or a pair of quoted strings, you can use that to determine where to place commas and colons.

import re
s = """"Matt"
{
    "Location"    "New York"
    "Age"         "22"
    "Items"
    {
        "Banana"    "2"
        "Apple"     "5"
        "Cat"       "1"
    }
}"""

# Put a colon after the first string in every line
s1 = re.sub(r'^\s*(".+?")', r'\1:', s, flags=re.MULTILINE)
# add a comma if the last non-whitespace character in a line is " or }
s2 = re.sub(r'(["}])\s*$', r'\1,', s1, flags=re.MULTILINE)
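To make the effect of the two substitutions easier to see, here is a minimal sketch that applies the same patterns to a smaller input and prints the intermediate text. The names sample, with_colons and with_commas are only illustrative and are not part of the answer above.

```python
import re

# A two-level snippet in the same format as the question's example
sample = '"Matt"\n{\n    "Age"    "22"\n}'

# Step 1: put a colon after the first quoted string on each line
# (the leading indentation of that line is dropped by the replacement)
with_colons = re.sub(r'^\s*(".+?")', r'\1:', sample, flags=re.MULTILINE)

# Step 2: add a comma when a line ends with " or }
with_commas = re.sub(r'(["}])\s*$', r'\1,', with_colons, flags=re.MULTILINE)

print(with_colons)
print(with_commas)
```

Keys that are followed by a block (such as "Matt" here) end in a colon after step 1, so step 2 leaves them alone and the opening brace on the next line becomes their value.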

Once you've done that, you can use ast.literal_eval to turn it into a Python dict. I prefer that over JSON parsing because it allows trailing commas; without them, deciding where to put the commas becomes a lot more complicated:

import ast
data = ast.literal_eval('{' + s2 + '}')
print(data['Matt']['Items']['Banana'])
# 2
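The trailing-comma difference can be checked in isolation. The following is a small sketch, assuming Python 3.5 or later (for json.JSONDecodeError); the variable names are illustrative only.

```python
import ast
import json

text = '{"Banana": "2", "Apple": "5",}'  # note the trailing comma

# ast.literal_eval follows Python literal syntax, which permits a trailing comma
parsed = ast.literal_eval(text)

# json.loads follows the JSON grammar, which does not
try:
    json.loads(text)
    json_accepts = True
except json.JSONDecodeError:
    json_accepts = False

print(parsed)
print(json_accepts)
```

This is why the regex approach can add a comma after every value and closing brace without tracking which entry is the last one in its block.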

I'm not sure how robust this approach is beyond the example you've posted, but it does support escaped characters and deeper levels of structured data. It probably won't be fast enough for large amounts of data.

The approach converts your custom data format to JSON using a (very) simple parser that adds the required colons, commas and outer braces; the JSON data can then be converted to a native Python dictionary.

import json

# Define the data that needs to be parsed
data = '''
"Matt"
{
    "Location"    "New \\"York"
    "Age"         "22"
    "Items"
    {
        "Banana"    "2"
        "Apple"     "5"
        "Cat"
        {
            "foo"   "bar"
        }
    }
}
'''

# Convert the data from custom format to JSON
json_data = ''

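# Define parser states
state = 'OUT'          # 'IN' while inside a quoted string, 'OUT' otherwise
key_or_value = 'KEY'   # whether the next string to close is a key or a value
escaped = False        # True when the previous character was a backslash inside a string

for c in data:
    # Copy the character that follows a backslash verbatim
    if escaped:
        json_data += c
        escaped = False

    # Handle escape characters inside a string
    elif c == '\\' and state == 'IN':
        escaped = True
        json_data += c

    # Handle quote characters
    elif c == '"':
        json_data += c

        if state == 'IN':
            state = 'OUT'
            if key_or_value == 'KEY':
                key_or_value = 'VALUE'
                json_data += ':'

            elif key_or_value == 'VALUE':
                key_or_value = 'KEY'
                json_data += ','

        else:
            state = 'IN'

    # Handle braces outside strings
    elif c == '{' and state == 'OUT':
        key_or_value = 'KEY'
        json_data += c

    elif c == '}' and state == 'OUT':
        # Strip trailing comma and add closing brace and comma
        json_data = json_data.rstrip().rstrip(',') + '},'

    else:
        json_data += c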

# Strip trailing comma
json_data = json_data.rstrip().rstrip(',')

# Wrap the data in braces to form a dictionary
json_data = '{' + json_data + '}'

# Convert from JSON to the native Python
converted_data = json.loads(json_data)

print(converted_data['Matt']['Items']['Banana'])
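As an alternative to going through JSON, the nested dictionary can also be built directly from a token stream. The sketch below is not taken from either answer: TOKEN and parse_pairs are hypothetical names, and it assumes that every key and value is a double-quoted string, that anything outside quotes other than braces can be ignored, and that backslash escapes may be left undecoded.

```python
import re

# Hypothetical helper: a token is either a quoted string (backslash escapes
# allowed) or a single brace. Text between tokens is skipped.
TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|([{}])')

def parse_pairs(text):
    """Build nested dicts from '"key" "value"' and '"key" { ... }' entries."""
    root = {}
    stack = [root]      # stack[-1] is the dict currently being filled
    pending_key = None  # key seen but not yet given a value

    for match in TOKEN.finditer(text):
        string, brace = match.groups()
        if brace == '{':
            if pending_key is None:
                raise ValueError('block without a key')
            child = {}
            stack[-1][pending_key] = child
            stack.append(child)
            pending_key = None
        elif brace == '}':
            if len(stack) == 1 or pending_key is not None:
                raise ValueError('unexpected closing brace')
            stack.pop()
        elif pending_key is None:
            pending_key = string
        else:
            stack[-1][pending_key] = string  # escapes are kept as written
            pending_key = None

    if len(stack) != 1 or pending_key is not None:
        raise ValueError('unbalanced braces or dangling key')
    return root

text = '''
"Matt"
{
    "Location"    "New York"
    "Items"
    {
        "Banana"    "2"
    }
}
'''
result = parse_pairs(text)
print(result['Matt']['Items']['Banana'])
```

Because the tokens are found with a single compiled pattern and the output is never re-parsed, this avoids the character-by-character string building, at the cost of not decoding escape sequences the way json.loads would.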
