
How to repeat a pattern in a Python regular expression?

I'm working on a Python regex and have a working expression:

\n(?P<curve>\w+)(?:.+)(?P<unit>\.\S*)(?:\s+.\s+)(?P<desc>:.+)|\n(?P<curve2>\w+)(?:.+)(?P<unit2>\.\S*)|\n(?P<curve3>\w+)

I would like to know whether I can reuse the pattern from the first alternative, because I would rather not define a separate "curve" or "unit" group for each case.

My test data is as follows:

#-------------
MD              
BMK_STA            .Mpsi                                   : Modulus
FANG        .                                   : Friction Angle
PR             .unitless                               :  
RHO           .g/cm3                                  

The idea is to have MD and RHO in the "curve" group as well.

I am not entirely sure what you mean, but the following may help:

If you want to find every match for a pattern, you can use re.findall(pattern, string).

It returns a list of the matches. When the pattern contains capturing groups, the list holds the groups rather than the whole match.

See the re module docs.
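As a rough illustration of re.findall, here is a minimal sketch. The sample text is modelled on the question's test data with simplified whitespace, and both patterns are illustrative assumptions rather than the asker's expression.

```python
import re

# Sample text modelled on the question's test data (whitespace simplified).
text = (
    "#-------------\n"
    "MD\n"
    "BMK_STA   .Mpsi      : Modulus\n"
    "FANG      .          : Friction Angle\n"
    "PR        .unitless  :\n"
    "RHO       .g/cm3\n"
)

# With exactly one capturing group, findall returns that group for each match.
curves = re.findall(r"\n(\w+)", text)
print(curves)  # ['MD', 'BMK_STA', 'FANG', 'PR', 'RHO']

# With several groups, each match becomes a tuple; a group that did not
# take part in the match comes back as an empty string.
pairs = re.findall(r"\n(\w+)(?:[ ]+(\.\S*))?", text)
for curve, unit in pairs:
    print(curve, repr(unit))
```

Note that findall discards group names, so it is mostly useful when positional tuples are enough.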

There is no special syntax to avoid that kind of repetition in regexes, so in the general case you can't avoid a certain amount of repetition. However, in your specific case you should be able to solve your problem using optional groups:

\n(?P<curve>\w+)((?:.+)(?P<unit>\.\S*)((?:\s+.\s+)(?P<desc>:.+))?)?

This is probably better written in verbose mode as:

\n(?P<curve>\w+)
(
    .+
    (?P<unit>\.\S*)
    (
        \s+.\s+
        (?P<desc>:.+)
    )?
)?

to make the group nesting easier to read. I've also removed the ?: groups, since in this case they are useless.
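To check how the optional groups behave, the verbose pattern can be compiled with re.VERBOSE and run over some sample text. This is only a sketch: the text below is an assumption modelled on the question's data with simplified whitespace, and real files with different spacing may match differently.

```python
import re

# Sample text modelled on the question's test data (whitespace simplified).
text = (
    "#-------------\n"
    "MD\n"
    "BMK_STA   .Mpsi      : Modulus\n"
    "FANG      .          : Friction Angle\n"
    "PR        .unitless  :\n"
    "RHO       .g/cm3\n"
)

# The verbose pattern from the answer; re.VERBOSE ignores the layout whitespace.
pattern = re.compile(r"""
    \n(?P<curve>\w+)
    (
        .+
        (?P<unit>\.\S*)
        (
            \s+.\s+
            (?P<desc>:.+)
        )?
    )?
""", re.VERBOSE)

# An optional group that did not take part in a match yields None.
rows = [(m.group("curve"), m.group("unit"), m.group("desc"))
        for m in pattern.finditer(text)]
for row in rows:
    print(row)
```

Every line now lands in the single "curve" group, and callers only need to handle None for a missing "unit" or "desc".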

Assuming your regex is correct, use the finditer() method to iterate over all the matches.

Example:

import re

# text is the string to search; REGEX_GOES_HERE stands for your pattern
for m in re.finditer(r'REGEX_GOES_HERE', text):
    print(m.group('curve'))
    print(m.group('unit'))

This way you get all the matches, and their named groups stay intact, as you wanted.
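For a fuller picture, here is a hedged sketch that combines finditer() with the one-line optional-group pattern quoted earlier on this page and collects the named groups with Match.groupdict(). The sample text and the name PATTERN are assumptions made for illustration.

```python
import re

# Sample text modelled on the question's test data (whitespace simplified).
text = (
    "#-------------\n"
    "MD\n"
    "BMK_STA   .Mpsi      : Modulus\n"
    "FANG      .          : Friction Angle\n"
    "PR        .unitless  :\n"
    "RHO       .g/cm3\n"
)

# One-line pattern with optional groups, as given earlier on this page.
PATTERN = r"\n(?P<curve>\w+)((?:.+)(?P<unit>\.\S*)((?:\s+.\s+)(?P<desc>:.+))?)?"

# groupdict() returns only the named groups; unmatched ones map to None.
records = [m.groupdict() for m in re.finditer(PATTERN, text)]
for record in records:
    print(record)
```

Passing a default, for example m.groupdict(default=""), replaces None with a value of your choice for groups that did not match.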
