Regex match in Python
I have a regex like this:
r"^(.*?),(.*?)(,.*?=.*)"
And a string like this:
name1,value1,tag11=value11,tag12=value12,tag13=value13
I am trying to check, using a regex, whether the string follows this format: name,value followed by name and value pairs separated by commas.
I then need to extract the comma-separated data using a regex.
At the moment the first group captures name1, the second group captures value1, and the third group matches everything from tag11 to value13 (due to the greedy match).
But I want to match each name and value pair separately. I am new to Python and not sure how I can achieve this.
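For reference, the behaviour described above can be reproduced with a short sketch. The variable names are only illustrative; the pattern and the string are the ones from the question.

```python
import re

s = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'

# The two lazy groups stop at the first and second comma; the trailing
# greedy .* in the third group then swallows the rest of the string.
m = re.match(r"^(.*?),(.*?)(,.*?=.*)", s)
groups = m.groups()
print(groups)
# ('name1', 'value1', ',tag11=value11,tag12=value12,tag13=value13')
```

So the third group holds all the tag pairs as one string rather than one match per pair.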
Why not just split by the commas:
s = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'
print(s.split(','))
If you want to use a regex, it's just as simple with this pattern:
[^,]+
Example:
https://regex101.com/r/jS6fgW/1
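As a rough comparison of the two approaches, the sketch below runs both on the string from the question. The 'a,,b' input is a made-up example, added only to show where the two differ.

```python
import re

s = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'

by_split = s.split(',')
by_regex = re.findall(r'[^,]+', s)

print(by_regex)
# ['name1', 'value1', 'tag11=value11', 'tag12=value12', 'tag13=value13']
print(by_split == by_regex)  # True for this input

# Hypothetical input with an empty field: split keeps the empty string,
# while [^,]+ needs at least one character and so skips it.
print('a,,b'.split(','))             # ['a', '', 'b']
print(re.findall(r'[^,]+', 'a,,b'))  # ['a', 'b']
```

Note that neither approach checks the format or separates the tag=value pairs; each pair comes back as a single item.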
It turns out that Python, unlike .NET, doesn't support repeated named capture groups, which is a bit of a shame (it means my solution is a little longer than I thought it would need to be). Does this meet your requirements?
import re

def is_valid(s):
    # name<x>,value<x> followed by zero or more ,tag<x>=value<x> pairs
    pattern = r'^name\d+,value\d+(,tag\d+=value\d+)*$'
    return re.match(pattern, s)

def get_name_value_pairs(s):
    if not is_valid(s):
        raise ValueError('Invalid input: {}'.format(s))
    # The first alternative matches the leading "name,value" pair,
    # the second one matches each "tag=value" pair.
    pattern = r'((?P<name1>\w+),(?P<value1>\w+))|(?P<name2>\w+)=(?P<value2>\w+)'
    for match in re.finditer(pattern, s):
        name1 = match.group('name1')
        name2 = match.group('name2')
        value1 = match.group('value1')
        value2 = match.group('value2')
        if name1 and value1:
            yield name1, value1
        elif name2 and value2:
            yield name2, value2

if __name__ == '__main__':
    testString = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'
    assert not is_valid('')
    assert not is_valid('foo')
    assert is_valid(testString)
    print(list(get_name_value_pairs(testString)))
Output:
[('name1', 'value1'), ('tag11', 'value11'), ('tag12', 'value12'), ('tag13', 'value13')]
Edit 1
Added input validation logic. Assumptions made:
name<x>,value<x> is the initial name/value pair
tag<x>=value<x> is the form of each subsequent pair
Note that I'm not currently validating that x is the same value within a name/value pair, which I assume is a requirement. I'm not sure how to do this, so I'm leaving it as an exercise for the reader.
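One possible way to approach that exercise is with backreferences. This is only a sketch, assuming x is a run of digits and that the same digits must appear on both sides of each pair; the names `CONSISTENT` and `has_consistent_indexes` are hypothetical and not part of the answer above.

```python
import re

# \1 must repeat the digits captured after "name"; \2 must repeat the digits
# captured after "tag" within the same repetition of the group.
CONSISTENT = re.compile(r'^name(\d+),value\1(?:,tag(\d+)=value\2)*$')

def has_consistent_indexes(s):
    """Return True if every pair in s uses the same x on both sides."""
    return CONSISTENT.match(s) is not None

print(has_consistent_indexes('name1,value1,tag11=value11,tag12=value12'))  # True
print(has_consistent_indexes('name1,value2,tag11=value11'))                # False
print(has_consistent_indexes('name1,value1,tag11=value12'))                # False
```

This only checks consistency inside each pair; it does not require the tag numbers to relate to the name number in any way.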
First, validate the format according to your pattern, then split with the [,=] regex (which matches , and =) and convert the result to a dictionary like this:
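import itertools, re

s = 'name1,value1,tag11=value11,tag12=value12,tag13=value13'
if re.match(r'[^,=]+,[^,=]+(?:,[^,=]+=[^,=]+)+$', s):
    l = re.split("[=,]", s)
    # Pair up consecutive items: (name, value), (tag, value), ...
    # zip_longest is the Python 3 name; in Python 2 it is izip_longest.
    d = dict(itertools.zip_longest(*[iter(l)] * 2, fillvalue=""))
    print(d)
else:
    print("Not valid!")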
See the Python demo.
^[^,=]+,[^,=]+(?:,[^,=]+=[^,=]+)+$
Details:
^ - start of string (with re.match this can be omitted, since re.match already anchors at the start of the string)
[^,=]+ - 1+ chars other than = and ,
, - a comma
[^,=]+ - 1+ chars other than = and ,
(?:,[^,=]+=[^,=]+)+ - 1 or more sequences of:
    , - a comma
    [^,=]+ - 1+ chars other than = and ,
    = - an equal sign
    [^,=]+ - 1+ chars other than = and ,
$ - end of string.
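To see the pieces of the pattern in action, here is a small sketch that checks a few strings against it. The helper name `parse_pairs` and the two invalid inputs are made up for illustration, and pairing by slices is just one alternative to the itertools approach.

```python
import re

PATTERN = re.compile(r'^[^,=]+,[^,=]+(?:,[^,=]+=[^,=]+)+$')

def parse_pairs(s):
    """Return a dict of name/value pairs, or None if s does not match PATTERN."""
    if not PATTERN.match(s):
        return None
    parts = re.split(r'[=,]', s)
    # Items at even indexes are names, items at odd indexes are their values.
    return dict(zip(parts[::2], parts[1::2]))

print(parse_pairs('name1,value1,tag11=value11,tag12=value12'))
# {'name1': 'value1', 'tag11': 'value11', 'tag12': 'value12'}
print(parse_pairs('name1,value1'))        # None: the (?:...)+ part needs at least one tag pair
print(parse_pairs('name1,value1,tag11'))  # None: the tag has no "=value" part
```

If a string with only the initial name,value pair should be accepted, the final + quantifier would need to become *.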