Match a word, followed by two optional groups in any order
I'm writing a kind of parser for a small library.
My string has the following format:
text = "Louis,Edward,John|85.56!26,Billy,Don!18|78.0,Dean"
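To be clearer: this is a list of people separated by commas, each optionally followed by two separators (| and !). After the | there is a weight with 0 to 2 decimal places, and after the ! there is an integer representing the age. The separators and their values can appear in either order, as you can see with John and Don.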
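To make the format concrete, here is a small sketch that checks a single comma-separated entry against it: a name, then at most one |weight and at most one !age, in either order. It assumes names are ASCII letters only and that entries contain no whitespace. NAME, WEIGHT, AGE, ENTRY_RE and is_valid_entry are illustrative names, not part of the question's code.

```python
import re

# pieces of the format described above (assumption: names are ASCII letters only)
NAME = r"[A-Za-z]+"
WEIGHT = r"\|\d+(?:\.\d{1,2})?"  # '|' followed by a number with 0-2 decimals
AGE = r"!\d+"                     # '!' followed by an integer

# one entry: a name, then at most one weight and at most one age, in either order
ENTRY_RE = re.compile(rf"{NAME}(?:(?:{WEIGHT})?(?:{AGE})?|{AGE}{WEIGHT})")


def is_valid_entry(entry):
    """Return True if the whole entry fits the format (hypothetical helper)."""
    return ENTRY_RE.fullmatch(entry) is not None


text = "Louis,Edward,John|85.56!26,Billy,Don!18|78.0,Dean"
print(all(is_valid_entry(e) for e in text.split(",")))  # True
print(is_valid_entry("John!26!27"))   # False: two ages
print(is_valid_entry("John|85.567"))  # False: three decimals
```

Spelling out both orders as two alternatives is what enforces "at most one of each"; a repeated alternation would also accept an entry such as John!26!27.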
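I need to use a regex (I know I could do it in many other ways) to extract every name whose length is between 2 and 4, together with the two separators and the values that follow them, if they exist.
This is my expected result: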
[('John', '|85.56', '!26'), ('Don', '|78.00', '!18'), ('Dean', '', '')]
I'm trying this code:
import re
text = "Louis,Edward,John|85.56!26,Billy,Don!18|78.0,Dean"
pattern = re.compile(r'(\b\w{2,4}\b)(\!\d+)?(\|\d+(?:\.\d{1,2})?)?')
search_result = pattern.findall(text)
print(search_result)
But this is the actual result:
[('John', '', '|85.56'), ('26', '', ''), ('Don', '!18', '|78.0'), ('Dean', '', '')]
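Listing the full text of each match with re.finditer shows where this pattern goes wrong. The ! group is tried before the | group, so for John the match ends after |85.56 and leaves !26 behind. Because \w also matches digits, the leftover 26 is then picked up as a "name". A small diagnostic sketch using the question's own pattern:

```python
import re

text = "Louis,Edward,John|85.56!26,Billy,Don!18|78.0,Dean"
pattern = re.compile(r'(\b\w{2,4}\b)(\!\d+)?(\|\d+(?:\.\d{1,2})?)?')

# the full text consumed by each match: John's match stops before '!26'
matched = [m.group(0) for m in pattern.finditer(text)]
print(matched)  # ['John|85.56', '26', 'Don!18|78.0', 'Dean']

# \w includes digits, so '26' satisfies the name part on its own
print(re.fullmatch(r'\w{2,4}', '26') is not None)  # True
```

Both causes have to be fixed: the name needs to be restricted to letters, and the two optional groups need to be allowed in either order.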
The following regex seems to give you what you want:
re.findall(r'(\b[a-z]{2,4}\b)(?:(!\d+)|(\|\d+(?:\.\d{,2})?))*', text, re.I)
#[('John', '!26', '|85.56'), ('Don', '!18', '|78.0'), ('Dean', '', '')]
If there are names you don't want, you can easily filter them out afterwards.
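The tuples above come back as (name, age, weight) and keep the weight exactly as written (|78.0), whereas the question expects (name, weight, age) with |78.00. A possible post-processing step, assuming two-decimal display is what's wanted, reorders each tuple and pads the weight. normalize_weight and the "has at least one value" filter are hypothetical examples, not part of the answer.

```python
import re

text = "Louis,Edward,John|85.56!26,Billy,Don!18|78.0,Dean"
pattern = re.compile(r'(\b[a-z]{2,4}\b)(?:(!\d+)|(\|\d+(?:\.\d{,2})?))*', re.I)


def normalize_weight(weight):
    # hypothetical helper: '|78.0' -> '|78.00'; an absent weight stays ''
    return f"|{float(weight[1:]):.2f}" if weight else ''


# findall yields (name, age, weight); reorder to (name, weight, age) as in the question
people = [(name, normalize_weight(weight), age)
          for name, age, weight in pattern.findall(text)]
print(people)
# [('John', '|85.56', '!26'), ('Don', '|78.00', '!18'), ('Dean', '', '')]

# hypothetical filter: keep only people who have at least one value attached
with_values = [p for p in people if p[1] or p[2]]
print(with_values)
# [('John', '|85.56', '!26'), ('Don', '|78.00', '!18')]
```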
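Pyparsing is good at composing complex expressions from simpler ones, and it has many built-ins for optional, unordered, and comma-delimited values. See the comments in the following code: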
import pyparsing as pp
real = pp.pyparsing_common.real
integer = pp.pyparsing_common.integer
name = pp.Word(pp.alphas, min=2, max=4)
# a valid person entry starts with a name followed by an optional !integer for age
# and an optional |real for weight; the '&' operator allows these to occur in either
# order, but at most only one of each will be allowed
expr = pp.Group(name("name")
+ (pp.Optional(pp.Suppress('!') + integer("age"), default='')
& pp.Optional(pp.Suppress('|') + real("weight"), default='')))
# other entries that we don't care about
other = pp.Word(pp.alphas, min=5)
# an expression for the complete input line - delimitedList defaults to using
# commas as delimiters; and we don't really care about the other entries, just
# suppress them from the results; whitespace is also skipped implicitly, but that
# is not an issue in your given sample text
input_expr = pp.delimitedList(expr | pp.Suppress(other))
# try it against your test data
text = "Louis,Edward,John|85.56!26,Billy,Don!18|78.0,Dean"
input_expr.runTests(text)
Prints:
Louis,Edward,John|85.56!26,Billy,Don!18|78.0,Dean
[['John', 85.56, 26], ['Don', 18, 78.0], ['Dean', '', '']]
[0]:
['John', 85.56, 26]
- age: 26
- name: 'John'
- weight: 85.56
[1]:
['Don', 18, 78.0]
- age: 18
- name: 'Don'
- weight: 78.0
[2]:
['Dean', '', '']
- name: 'Dean'
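In this case, using the predefined real and integer expressions not only parses the values but also converts them to float and int. The named results can be accessed like object attributes: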
for person in input_expr.parseString(text):
    print("({!r}, {}, {})".format(person.name, person.age, person.weight))
Gives:
('John', 26, 85.56)
('Don', 18, 78.0)
('Dean', , )
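If adding pyparsing as a dependency isn't an option, a rough standard-library analog of these typed, named results uses named groups with re.finditer and converts the captured text with int and float. This is a sketch, not a port of the pyparsing answer: Person, PERSON_RE and parse_people are hypothetical names, and missing values become None instead of ''.

```python
import re
from typing import List, NamedTuple, Optional


class Person(NamedTuple):
    name: str
    age: Optional[int]
    weight: Optional[float]


# a 2-4 letter name, then any mix of !age and |weight (last occurrence wins)
PERSON_RE = re.compile(
    r"\b(?P<name>[A-Za-z]{2,4})\b"
    r"(?:!(?P<age>\d+)|\|(?P<weight>\d+(?:\.\d{1,2})?))*"
)


def parse_people(text: str) -> List[Person]:
    people = []
    for m in PERSON_RE.finditer(text):
        age, weight = m.group("age"), m.group("weight")
        people.append(Person(
            m.group("name"),
            int(age) if age is not None else None,         # '26' -> 26
            float(weight) if weight is not None else None,  # '78.0' -> 78.0
        ))
    return people


text = "Louis,Edward,John|85.56!26,Billy,Don!18|78.0,Dean"
for person in parse_people(text):
    print(person)
```

Unlike pyparsing's & operator, this pattern doesn't reject a repeated separator. For an entry such as John!1!2, the regex simply keeps the last captured age.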