Python: parsing a list of strings

Question

I have a list of strings, and I'm looking for lines like this:

Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10

After finding a line like this, I want to store it as a dictionary {'key' : af12d9, 'index' : 0, 'field 1' : .... }, then add that dictionary to a list, so that I end up with a list of dictionaries.

I was able to get it working like this:

from re import findall

listconfig = []
for line in list_of_strings:
    matched = findall("(Key:[\s]*[0-9A-Fa-f]+[\s]*)|(Index:[\s]*[0-9]+[\s]*)|(Field 1:[\s]*[0-9]+[\s]*)|(Field 2:[\s]*[0-9]+[\s]*)|(Field 3:[\s]*[-+]?[0-9]+[\s]*)", line)
    if matched:
        # each group is a 5-tuple with one non-empty slot; keep that slot and split it at ':'
        listconfig += [dict(map(lambda pair: (pair[0].strip().lower(), pair[1].strip().lower()),
                                map(lambda line: line[0].split(':'),
                                    [list(filter(lambda x: x, group)) for group in matched])))]

I'm just wondering whether there is a better way (short and efficient) to do this, because I think findall will do 5 searches per string. (Correct? It returns a list of 5 tuples.)
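A quick way to check that guess is to print what findall() returns for one line. The sketch below uses the question's alternation rewritten as a raw string and the sample line shown above. findall() makes a single pass over the string (at each position the engine tries the five alternatives in turn), and every match yields a tuple with one slot per group, the unused slots being empty strings. So five tuples means five matches found in one pass, not five separate searches.

```python
import re

line = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10"

# The question's alternation, written as a raw string
pattern = re.compile(r"(Key:\s*[0-9A-Fa-f]+\s*)|(Index:\s*[0-9]+\s*)|"
                     r"(Field 1:\s*[0-9]+\s*)|(Field 2:\s*[0-9]+\s*)|"
                     r"(Field 3:\s*[-+]?[0-9]+\s*)")

matched = pattern.findall(line)
for group in matched:
    # exactly one slot per tuple is non-empty: the alternative that matched
    print(group)
```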

Thank you.

Answer 1

Firstly, your regex seems not to work properly. The Key field should have values which could include f, right? So its group should not be ([0-9A-Ea-e]+) but instead ([0-9A-Fa-f]+). Also, it is a good (actually, a wonderful) practice to prefix the regex string with r when dealing with regexes, because it avoids problems with \ escaping characters. (If you do not understand why to do it, look at raw strings.)
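A small illustration of why the r prefix matters (the sample text is made up): in a normal string literal "\b" is the backspace character, so the regex engine never sees a word-boundary escape, while the raw string passes the backslash through unchanged.

```python
import re

text = "Key: af12d9"

plain = "\bKey\b"   # "\b" here is the backspace character, chr(8)
raw = r"\bKey\b"    # backslash + "b", which the regex engine reads as a word boundary

print(len("\b"), len(r"\b"))    # 1 2
print(re.findall(plain, text))  # []
print(re.findall(raw, text))    # ['Key']
```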

Now, my approach to the problem. First, I would create a regex without pipes:

>>> import re
>>> line = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10"
>>> regex = r"(Key):[\s]*([0-9A-Fa-f]+)[\s]*" \
...     r"(Index):[\s]*([0-9]+)[\s]*" \
...     r"(Field 1):[\s]*([0-9]+)[\s]*" \
...     r"(Field 2):[\s]*([0-9 A-Za-z]+)[\s]*" \
...     r"(Field 3):[\s]*([-+]?[0-9]+)[\s]*"

With this change, findall() returns only one tuple of found groups for the entire line. In this tuple, each key is followed by its value:

>>> re.findall(regex, line)
[('Key', 'af12d9', 'Index', '0', 'Field 1', '1234', 'Field 2', '1234 Ring ', 'Field 3', '-10')]

So I get the tuple...

>>> found = re.findall(regex, line)[0]
>>> found
('Key', 'af12d9', 'Index', '0', 'Field 1', '1234', 'Field 2', '1234 Ring ', 'Field 3', '-10')

...and using slices I get only the keys...

>>> found[::2]
('Key', 'Index', 'Field 1', 'Field 2', 'Field 3')

...and also only the values:

>>> found[1::2]
('af12d9', '0', '1234', '1234 Ring ', '-10')

Then I create a list of tuples, each containing a key and its corresponding value, with the zip() function (list() makes this work the same on Python 3, where zip() returns an iterator):

>>> list(zip(found[::2], found[1::2]))
[('Key', 'af12d9'), ('Index', '0'), ('Field 1', '1234'), ('Field 2', '1234 Ring '), ('Field 3', '-10')]

The grand finale is to pass the list of tuples to the dict() constructor:

>>> dict(zip(found[::2], found[1::2]))
{'Field 3': '-10', 'Index': '0', 'Field 1': '1234', 'Key': 'af12d9', 'Field 2': '1234 Ring '}

I find this solution the best, but it is indeed a subjective question in some sense. HTH anyway :)
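Applied to the whole list of strings from the question, these steps might look like the sketch below. It assumes the question wants lowercase keys such as 'key' and 'field 1' (as in the dictionary it sketches) and values without surrounding spaces; parse_lines is a hypothetical helper name and the sample list is made up for illustration.

```python
import re

# Answer 1's pattern: each label is captured right before its value
regex = re.compile(r"(Key):\s*([0-9A-Fa-f]+)\s*"
                   r"(Index):\s*([0-9]+)\s*"
                   r"(Field 1):\s*([0-9]+)\s*"
                   r"(Field 2):\s*([0-9 A-Za-z]+)\s*"
                   r"(Field 3):\s*([-+]?[0-9]+)\s*")


def parse_lines(lines):
    """Hypothetical helper: one dict per matching line, with lowercase keys."""
    listconfig = []
    for line in lines:
        for found in regex.findall(line):
            # even positions hold labels, odd positions hold values
            pairs = zip(found[::2], found[1::2])
            listconfig.append({k.lower(): v.strip() for k, v in pairs})
    return listconfig


list_of_strings = [
    "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10",
    "no fields on this line",
]
print(parse_lines(list_of_strings))
```

Lines that do not match simply produce no tuples from findall(), so they add nothing to the result.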

Answer 2

OK, with help from brandizzi, I have found THE answer to this question.

Solution:

import re

listconfig = []
for line in list_of_strings:
    # adjacent raw strings inside the call's parentheses are concatenated,
    # so no trailing backslashes are needed
    matched = re.search(r"Key:[\s]*(?P<key>[0-9A-Fa-f]+)[\s]*"
                        r"(Index:[\s]*(?P<index>[0-9]+)[\s]*)?"
                        r"(Field 1:[\s]*(?P<field_1>[0-9]+)[\s]*)?"
                        r"(Field 2:[\s]*(?P<field_2>[0-9 A-Za-z]+)[\s]*)?"
                        r"(Field 3:[\s]*(?P<field_3>[-+]?[0-9]+)[\s]*)?", line)
    if matched:
        print(matched.groupdict())
        listconfig.append(matched.groupdict())
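One thing worth checking before relying on this pattern: Field 2's character class accepts spaces and letters, and the Field 3 group is optional, so the greedy Field 2 value can run on into the next label. The regex engine has no reason to backtrack, because the optional Field 3 group is allowed to match nothing. The sketch below compares the pattern as written with a variant whose Field 2 value stops before the word "Field"; that lookahead guard is an illustrative fix, not part of the original answer, and build is a hypothetical helper.

```python
import re


def build(field_2_value):
    # Same shape as the accepted answer's pattern; only the Field 2 value differs
    return re.compile(r"Key:\s*(?P<key>[0-9A-Fa-f]+)\s*"
                      r"(Index:\s*(?P<index>[0-9]+)\s*)?"
                      r"(Field 1:\s*(?P<field_1>[0-9]+)\s*)?"
                      r"(Field 2:\s*(?P<field_2>" + field_2_value + r")\s*)?"
                      r"(Field 3:\s*(?P<field_3>[-+]?[0-9]+)\s*)?")


accepted = build(r"[0-9 A-Za-z]+")
# Hypothetical fix: consume a character only if "Field" does not start there
guarded = build(r"(?:(?!Field\b)[0-9 A-Za-z])+")

line = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10"
print(accepted.search(line).groupdict())
print(guarded.search(line).groupdict())
```

With the pattern as written, field_2 comes back as '1234 Field 3' and field_3 as None; with the guard, field_3 is '-10' and field_2 keeps a trailing space that .strip() removes.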

Answer 3

import re

str_list = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10"
results = {}
for match in re.findall(r"(.*?):\ (.*?)\ ", str_list + ' '):
    results[match[0]] = match[1]
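Extending this to a whole list of lines is a small step; the sketch below (parse_line is a hypothetical name, the sample list is made up) does so. It also shows the limit of splitting on single spaces: with the "1234 Ring" sample value, Field 2 keeps only "1234" and the leftover word becomes part of the next label.

```python
import re

# Answer 3's pattern: a lazy label up to ": ", then a lazy value up to the next space
pair = re.compile(r"(.*?): (.*?) ")


def parse_line(line):
    """Hypothetical helper: the appended space lets the last value match too."""
    return dict(pair.findall(line + " "))


list_of_strings = [
    "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10",
    "hey joe!",
]
# lines without any "label: value" pair give an empty dict, so drop them
listconfig = [d for d in map(parse_line, list_of_strings) if d]
print(listconfig)
```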

Answer 4

The pattern in your example probably doesn't match your example data because of the "Ring". Here is some code that might help:

import re
# the keys to look for
keys = ['Key','Index','Field 1','Field 2','Field 3']
# a pattern for those keys in exact order
pattern = ''.join(["(%s):(.*)" % key for key in keys])
# sample data
data = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10"
# look for the pattern
hit = re.match(pattern,data)
if hit:
    # get the matched elements
    groups = hit.groups()
    # group them in pairs and create a dict
    d = dict(zip(groups[::2], groups[1::2]))
    # print result
    print(d)
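Because each (.*) is greedy and the pattern has no \s*, the values come back with their surrounding spaces (for example ' af12d9 '). A sketch that strips them and applies the same pattern to a list of lines might look like this; to_dict is a hypothetical helper, and re.escape is added only as a precaution in case a key ever contains regex metacharacters.

```python
import re

keys = ['Key', 'Index', 'Field 1', 'Field 2', 'Field 3']
# re.escape guards against keys that contain regex metacharacters
pattern = re.compile(''.join("(%s):(.*)" % re.escape(key) for key in keys))


def to_dict(line):
    """Hypothetical helper: returns None when the line lacks any of the keys."""
    hit = pattern.match(line)
    if hit is None:
        return None
    groups = hit.groups()
    # the greedy (.*) groups keep the spaces around each value, so strip them
    return dict((k, v.strip()) for k, v in zip(groups[::2], groups[1::2]))


list_of_strings = [
    "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10",
    "hey joe!",
]
listconfig = [d for d in map(to_dict, list_of_strings) if d is not None]
print(listconfig)
```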

Answer 5

You could use a parser library. I know Lepl, so I will use that, but because it is implemented in Python it will not be as efficient. However, the solution is fairly short and, I hope, very easy to understand:

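from lepl import *

def parser():
    # build every matcher inside DroppedSpaces so that & also skips the spaces between parts
    with DroppedSpaces():
        key = (Drop("Key:") & Regexp("[0-9a-fA-F]+")) > 'key'
        index = (Drop("Index:") & Integer()) > 'index'
        def Field(n):
            # the labels in the data are "Field 1:", "Field 2:", ... (space and colon included)
            return (Drop("Field %d:" % n) & Integer()) > 'field' + str(n)
        # > passes the whole list of (name, value) pairs to make_dict
        line = (key & index & Field(1) & Field(2) & Field(3)) > make_dict
        return line[:]

p = parser()
print(p.parse_file(...))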

It should also be relatively simple to handle a variable number of fields.

Note that the above is not tested (I need to get to work), but it should be about right. In particular, it should return a list of dictionaries, as required.

Answer 6

Your solution would perform better if you did this[*]:

import re

# This answer originally used itertools.imap (Python 2 only);
# on Python 3 the built-in map is already lazy and does the same job.
regex = re.compile(flags=re.VERBOSE, pattern=r"""
    Key:\s*(?P<key>[0-9A-Fa-f]+)\s*
    Index:\s*(?P<index>[0-9]+)\s*
    Field\s+1:\s*(?P<field_1>[0-9]+)\s*
    Field\s+2:\s*(?P<field_2>[0-9A-Za-z]+)\s*
    Field\s+3:\s*(?P<field_3>[-+]?[0-9]+)\s*
""")

list_of_strings = [
    'Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10',
    'hey joe!',
    ''
]

listconfig = [
    match.groupdict() for match in map(regex.search, list_of_strings) if match
]

It is also more succinct, and I fixed your broken regex pattern.

BTW, the result of the above would be:

[{'index': '0', 'field_2': '1234', 'field_3': '-10', 'key': 'af12d9', 'field_1': '1234'}]

[*] Actually, no, it wouldn't. I timed both with timeit and neither was faster than the other. Still, I like mine better.
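To repeat the footnote's comparison, a minimal timeit harness might look like this. It is a sketch, not the answerer's benchmark: with_loop and with_comprehension are hypothetical names for the loop-and-append style of the question's solution and this answer's comprehension, both using the VERBOSE pattern above, and the input is the sample list repeated 1000 times. Timings depend on the machine and Python version, so no numbers are claimed here.

```python
import re
import timeit

regex = re.compile(flags=re.VERBOSE, pattern=r"""
    Key:\s*(?P<key>[0-9A-Fa-f]+)\s*
    Index:\s*(?P<index>[0-9]+)\s*
    Field\s+1:\s*(?P<field_1>[0-9]+)\s*
    Field\s+2:\s*(?P<field_2>[0-9A-Za-z]+)\s*
    Field\s+3:\s*(?P<field_3>[-+]?[0-9]+)\s*
""")

list_of_strings = [
    'Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10',
    'hey joe!',
    '',
] * 1000


def with_loop():
    # the question's style: search each line and append on success
    listconfig = []
    for line in list_of_strings:
        matched = regex.search(line)
        if matched:
            listconfig.append(matched.groupdict())
    return listconfig


def with_comprehension():
    # this answer's style: a lazy map inside a list comprehension
    return [m.groupdict() for m in map(regex.search, list_of_strings) if m]


# both styles must agree before their timings mean anything
assert with_loop() == with_comprehension()
for fn in (with_loop, with_comprehension):
    print(fn.__name__, timeit.timeit(fn, number=50))
```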

Answer summary (6 answers)

Answer 1: score 5
Answer 2: score 1, accepted, 2011-04-14 20:42:09
Answer 3: score 0, 2011-04-13 17:35:01
Answer 4: score 0, 2011-04-13 17:41:28
Answer 5: score 0, 2011-04-14 12:11:21
Answer 6: score 0, 2011-05-03 19:45:58