Best way to parse a line in Python into a dictionary

Question

I have a file with lines like this:

account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"

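There is no special delimiter. Every key has a value, which is enclosed in double quotes if it is a string and left unquoted if it is a number. No key appears without a value, although blank strings can occur and are written as "". There is no escape character for quotes because none is needed.

What is a good way to parse this kind of line in Python and store the values as key-value pairs in a dictionary?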

Answer 1

We need a regular expression for this.

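import re, decimal

line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'

# a key, '=', then either a double-quoted string or a run of non-space characters
r = re.compile(r'([^ =]+) *= *("[^"]*"|[^ ]*)')

d = {}
for k, v in r.findall(line):
    if v[:1] == '"':
        d[k] = v[1:-1]              # quoted: strip the quotes, keep as a string
    else:
        d[k] = decimal.Decimal(v)   # unquoted: treat as a number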

>>> d
{'account': 'TEST1', 'subject': 'some value', 'values': '3=this, 4=that', 'price': Decimal('20.11'), 'Qty': Decimal('100')}

You can use float instead of Decimal if you prefer, but that is probably a bad idea if money is involved.
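To make the Decimal-versus-float trade-off concrete, here is a minimal sketch that wraps the regex above in a function. The names `parse_line` and `number_type` are illustrative and not part of the original answer, and the sketch assumes every unquoted value is a valid number.

```python
import re
from decimal import Decimal

# Same pattern as in the answer: a key, '=', then a quoted string or a bare token.
PAIR_RE = re.compile(r'([^ =]+) *= *("[^"]*"|[^ ]*)')

def parse_line(line, number_type=Decimal):
    """Parse key=value pairs; number_type converts the unquoted values."""
    result = {}
    for key, raw in PAIR_RE.findall(line):
        if raw[:1] == '"':
            result[key] = raw[1:-1]          # strip the surrounding quotes
        else:
            result[key] = number_type(raw)   # Decimal by default, float on request
    return result

line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'
exact = parse_line(line)
approx = parse_line(line, number_type=float)

# Decimal keeps the digits exactly as written in the file.
print(exact['price'] * 3)
# A float holds the nearest binary value, which is not exactly 20.11.
print(Decimal(approx['price']) == exact['price'])
```

The second print shows False: converting the float back to Decimal exposes the binary approximation, which is the kind of drift that matters for monetary amounts.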

Answer 2

The pyparsing rendition may be a bit simpler to follow:

from pyparsing import *

# define basic elements - use re's for numerics, faster and easier than
# composing from pyparsing objects
integer = Regex(r'[+-]?\d+')
real = Regex(r'[+-]?\d+\.\d*')
ident = Word(alphanums)
value = real | integer | quotedString.setParseAction(removeQuotes)

# define a key-value pair, and a configline as one or more of these
# wrap configline in a Dict so that results are accessible by given keys
kvpair = Group(ident + Suppress('=') + value)
configline = Dict(OneOrMore(kvpair))

src = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" ' \
        'values="3=this, 4=that"'

configitems = configline.parseString(src)

Now you can access the parsed pieces through the returned ParseResults object, configitems:

>>> print configitems.asList()
[['account', 'TEST1'], ['Qty', '100'], ['price', '20.11'], 
 ['subject', 'some value'], ['values', '3=this, 4=that']]

>>> print configitems.asDict()
{'account': 'TEST1', 'Qty': '100', 'values': '3=this, 4=that', 
  'price': '20.11', 'subject': 'some value'}

>>> print configitems.dump()
[['account', 'TEST1'], ['Qty', '100'], ['price', '20.11'], 
 ['subject', 'some value'], ['values', '3=this, 4=that']]
- Qty: 100
- account: TEST1
- price: 20.11
- subject: some value
- values: 3=this, 4=that

>>> print configitems.keys()
['account', 'subject', 'values', 'price', 'Qty']

>>> print configitems.subject
some value

Answer 3

A recursive variation of bobince's answer that parses values containing embedded equals signs into dictionaries:

>>> import re
>>> import pprint
>>>
>>> def parse_line(line):
...     d = {}
...     a = re.compile(r'\s*(\w+)\s*=\s*("[^"]*"|[^ ,]*),?')
...     float_re = re.compile(r'^\d+\.\d+$')
...     int_re = re.compile(r'^\d+$')
...     for k,v in a.findall(line):
...             if int_re.match(k):
...                     k = int(k)
...             if v.endswith('"'):
...                     v = v[1:-1]
...             if '=' in v:
...                     d[k] = parse_line(v)
...             elif int_re.match(v):
...                     d[k] = int(v)
...             elif float_re.match(v):
...                     d[k] = float(v)
...             else:
...                     d[k] = v
...     return d
...
>>> line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'
>>> pprint.pprint(parse_line(line))
{'Qty': 100,
 'account': 'TEST1',
 'price': 20.109999999999999,
 'subject': 'some value',
 'values': {3: 'this', 4: 'that'}}
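The number detection in this answer relies on regular expressions. As a hedged alternative sketch (not part of the original answer), the same int, then float, then string fallback can be written with try/except. `convert` is a hypothetical helper name. Note that `float` also accepts tokens such as "1e5", "inf" and "nan", so this is looser than a digits-only regex check.

```python
def convert(token):
    """Return token as an int if possible, else a float, else the string itself."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass          # not valid for this type, try the next one
    return token          # neither cast worked: keep the original string

for token in ('100', '20.11', 'this', ''):
    print(repr(token), '->', repr(convert(token)))
```

A helper like this could replace the `int_re` / `float_re` branches inside the loop, at the cost of accepting a wider range of numeric spellings.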

Answer 4

If you don't want to use a regular expression, another option is to read the string one character at a time:

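line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'

inside_quotes = False
quoted = False   # True once the current value has been opened with a quote
key = None
value = ''
result = {}      # avoid shadowing the built-in dict

for c in line:
    if c == '"':
        inside_quotes = not inside_quotes
        quoted = True
    elif c == '=' and not inside_quotes:
        key = value
        value = ''
    elif c == ' ':
        if inside_quotes:
            value += ' '
        elif key and (value or quoted):
            # a pair is complete; "quoted" lets an empty "" value through
            result[key] = value
            key = None
            value = ''
            quoted = False
    else:
        value += c

# store the last pair, which has no trailing space to terminate it
if key:
    result[key] = value
print(result)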
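The character-by-character loop stores every value as a string. As a rough sketch of how the same scan could keep the distinction the question describes (quoted values are strings, unquoted values are numbers), the version below is wrapped in a generator. `scan_pairs` is a hypothetical name, and the sketch assumes unquoted values are always valid numbers; anything else would make `Decimal` raise.

```python
from decimal import Decimal

def scan_pairs(line):
    """Yield (key, value) pairs by walking the line one character at a time."""
    key, buf = None, []
    inside_quotes = quoted = False
    for c in line + ' ':               # the extra space flushes the last pair
        if c == '"':
            inside_quotes = not inside_quotes
            quoted = True              # remember that this value is a string
        elif inside_quotes:
            buf.append(c)              # everything inside quotes is literal
        elif c == '=':
            key, buf = ''.join(buf), []
        elif c == ' ':
            if key and (buf or quoted):
                text = ''.join(buf)
                yield key, (text if quoted else Decimal(text))
                key, buf, quoted = None, [], False
        else:
            buf.append(c)

line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'
print(dict(scan_pairs(line)))
```

Collecting characters in a list and joining once per token avoids repeated string concatenation, though for lines this short the difference is negligible.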

4 solutions

Solution 1
11 Accepted 2009-10-29 15:20:08

Solution 2
5 2009-10-30 16:09:33

Solution 3
0

Solution 4
0 2009-10-29 17:05:58
