Regular expression matching in Python

Question

I have a regex like this:

r"^(.*?),(.*?)(,.*?=.*)"

And a string like this:

name1,value1,tag11=value11,tag12=value12,tag13=value13

I am trying to use a regex to check whether the string follows this format: name,value, followed by name and value pairs separated by commas.

I then need to extract the comma-separated data using a regex.

The first group is extracted as name1 and the second group as value1, but the third group matches everything from tag11 to value13 (because of the greedy match).

But I want to match each name and value pair separately. I am new to Python and not sure how to achieve this.
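To make the behaviour described above concrete, here is a minimal sketch that runs the pattern from the question against the sample string and prints the three groups. The variable names are only for illustration.

```python
import re

# Pattern and sample string taken from the question.
pattern = r"^(.*?),(.*?)(,.*?=.*)"
s = "name1,value1,tag11=value11,tag12=value12,tag13=value13"

m = re.match(pattern, s)
groups = m.groups()

# The trailing greedy ".*" in the third group swallows every remaining pair.
for number, text in enumerate(groups, start=1):
    print(number, text)
# 1 name1
# 2 value1
# 3 ,tag11=value11,tag12=value12,tag13=value13
```

A single capture group that is asked to cover "the rest" can only return one string, which is why the tag pairs arrive as one block instead of separately.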

Answer 1

Why not just split on the commas:

s = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'
print(s.split(','))

If you want to use a regex, it is just as simple with the pattern:

[^,]+

Example:

https://regex101.com/r/jS6fgW/1
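As a rough sketch of how the [^,]+ pattern could be used from Python, re.findall returns every comma-free run, and the tag fields can then be split on the first = sign. Treating the first two fields as the name and value is an assumption based on the sample string in the question.

```python
import re

s = "name1,value1,tag11=value11,tag12=value12,tag13=value13"

# Every run of characters that contains no comma.
fields = re.findall(r"[^,]+", s)

# Assumption: the first two fields are the name and value,
# and every remaining field has the form tag=value.
name, value, *tags = fields
pairs = [(name, value)] + [tuple(tag.split("=", 1)) for tag in tags]

print(fields)
print(pairs)
```

One difference worth knowing: s.split(',') keeps empty fields (for example from two adjacent commas), whereas [^,]+ silently skips them.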

Answer 2

Turns out that Python, unlike .NET, doesn't support repeated named capture groups, which is a bit of a shame (it means my solution is a little longer than I thought it would need to be). Does this meet your requirements?

import re

def is_valid(s):
    pattern = r'^name\d+,value\d+(,tag\d+=value\d+)*$'
    return re.match(pattern, s)

def get_name_value_pairs(s):
    if not is_valid(s):
        raise ValueError('Invalid input: {}'.format(s))

    pattern = r'((?P<name1>\w+),(?P<value1>\w+))|(?P<name2>\w+)=(?P<value2>\w+)'
    for match in re.finditer(pattern, s):
        name1 = match.group('name1')
        name2 = match.group('name2')
        value1 = match.group('value1')
        value2 = match.group('value2')

        if name1 and value1:
            yield name1, value1
        elif name2 and value2:
            yield name2, value2

if __name__ == '__main__':
    testString = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'
    assert not is_valid('')
    assert not is_valid('foo')
    assert is_valid(testString)

    print(list(get_name_value_pairs(testString)))

Output

[('name1', 'value1'), ('tag11', 'value11'), ('tag12', 'value12'), ('tag13', 'value13')]

Edit 1

Added input validation logic. Assumptions made:

Must have an initial name/value pair in the form name<x>,value<x>
All following pairs must be in the form tag<x>=value<x>
Names and values consist only of alphanumeric characters
Whitespace is not allowed

Note that I'm not currently validating that x is the same value within a name/value pair, which I assume is a requirement. I'm ~~not sure how to do this~~ leaving this as an exercise for the reader.
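One possible way to check that x is the same on both sides of a pair is a backreference. The sketch below is not part of the original answer; the function name indices_match is hypothetical, and it assumes x is a run of digits, as in the validation pattern above.

```python
import re

def indices_match(s):
    """Return True if every pair uses the same number on both sides.

    \\1 must repeat the digits captured after "name"; \\2 must repeat the
    digits captured after "tag" in the current repetition of the group.
    """
    pattern = r'name(\d+),value\1(?:,tag(\d+)=value\2)*'
    return re.fullmatch(pattern, s) is not None

print(indices_match('name1,value1,tag11=value11,tag12=value12'))  # True
print(indices_match('name1,value1,tag11=value12'))                # False
```

re.fullmatch is used here in place of explicit ^ and $ anchors; it requires the whole string to match the pattern.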

Answer 3

First, validate the format according to your pattern, then split with the [,=] regex (which matches , and =) and convert the result to a dictionary like this:

import itertools, re
s = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'
if re.match(r'[^,=]+,[^,=]+(?:,[^,=]+=[^,=]+)+$', s):
    l = re.split("[=,]", s)
    # Python 3: itertools.izip_longest was renamed to zip_longest
    d = dict(itertools.zip_longest(*[iter(l)] * 2, fillvalue=""))
    print(d)
else:
    print("Not valid!")

See the Python demo

The pattern is

^[^,=]+,[^,=]+(?:,[^,=]+=[^,=]+)+$

Details:

^ - start of string (with re.match this can be omitted, since re.match only matches at the start of the string)
[^,=]+ - 1+ chars other than = and ,
, - a comma
[^,=]+ - 1+ chars other than = and ,
(?:,[^,=]+=[^,=]+)+ - 1 or more sequences of:
- , - a comma
- [^,=]+ - 1+ chars other than = and ,
- = - an equal sign
- [^,=]+ - 1+ chars other than = and ,
$ - end of string.
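The pattern breakdown above can be tried out with a small sketch for Python 3. The function name parse_pairs and the choice to return None for invalid input are assumptions made for illustration; re.fullmatch stands in for the ^ and $ anchors.

```python
import re

# Same pattern as above, without the anchors (fullmatch covers both ends).
PAIR_FORMAT = re.compile(r"[^,=]+,[^,=]+(?:,[^,=]+=[^,=]+)+")

def parse_pairs(s):
    """Validate s, then split it on , and = and pair up the tokens."""
    if PAIR_FORMAT.fullmatch(s) is None:
        return None  # hypothetical convention for "not valid"
    tokens = re.split(r"[=,]", s)
    # After validation the token count is always even: key, value, key, value, ...
    return dict(zip(tokens[0::2], tokens[1::2]))

print(parse_pairs("name1,value1,tag11=value11,tag12=value12,tag13=value13"))
print(parse_pairs("name1,value1"))  # None: the pattern requires at least one tag=value pair
```

Because the last group is quantified with +, a string with only the initial name,value pair is rejected; changing that + to * would accept it.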

Answer 1
1 2017-01-05 10:36:23

Answer 2
1 Accepted 2017-01-05 10:37:31

Answer 3
1 2017-01-05 10:54:40