
Use of findall and parentheses in Python

I need to extract the first letter after each + sign, as well as the letter at the beginning of a string like this:

formula = "X+BC+DAF"

I tried the following, but I don't want the + sign to appear in the result. I want to see only ['X', 'B', 'D'].

>>> import re
>>> re.findall("^[A-Z]|[+][A-Z]", formula)
['X', '+B', '+D']

When I grouped with parentheses, I got this strange result:

>>> re.findall("^([A-Z])|[+]([A-Z])", formula)
[('X', ''), ('', 'B'), ('', 'D')]

Why does it create tuples when I use groups? How can I write the regexp directly so that it returns ['X', 'B', 'D']?

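If there are any capturing groups in the regular expression, re.findall returns only the values captured by the groups. If there are no groups, the entire matched string is returned. With more than one group, each match becomes a tuple with one entry per group, and a group that did not take part in that particular match contributes an empty string, which is why you see ('X', '') and ('', 'B').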

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
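A minimal sketch of the three cases that documentation paragraph describes, run against the formula from the question (the variable names are illustrative only). The last step shows one possible workaround for the two-group pattern: because exactly one element of each tuple is non-empty here, joining the pair recovers the letter.

```python
import re

formula = "X+BC+DAF"

# No capturing groups: each item is the whole matched text.
no_groups = re.findall(r"^[A-Z]|\+[A-Z]", formula)

# Exactly one capturing group: each item is that group's text.
one_group = re.findall(r"\+([A-Z])", formula)

# Two capturing groups: each item is a tuple with one entry per group;
# a group that did not take part in a match contributes ''.
two_groups = re.findall(r"^([A-Z])|\+([A-Z])", formula)

# Only one entry per tuple is non-empty here, so joining recovers the letter.
flattened = ["".join(pair) for pair in two_groups]

print(no_groups)   # ['X', '+B', '+D']
print(one_group)   # ['B', 'D']
print(two_groups)  # [('X', ''), ('', 'B'), ('', 'D')]
print(flattened)   # ['X', 'B', 'D']
```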


How can I write the regexp directly so that it returns ['X', 'B', 'D']?

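Instead of giving each alternative its own capturing group, you can put the alternation inside a non-capturing group (?:...) and capture only the letter. With a single capturing group, findall returns a plain list of strings: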

>>> re.findall(r"(?:^|\+)([A-Z])", formula)
['X', 'B', 'D']
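One way to confirm why the two patterns behave differently is Pattern.groups on a compiled pattern, which counts capturing groups only, so (?:...) is not included. A small sketch under that reading; reading m.group(1) from re.finditer is shown merely as an alternative to findall, not as part of the original answer:

```python
import re

capturing = re.compile(r"^([A-Z])|[+]([A-Z])")
non_capturing = re.compile(r"(?:^|\+)([A-Z])")

# Pattern.groups counts capturing groups; the (?:...) group is not counted.
print(capturing.groups)      # 2
print(non_capturing.groups)  # 1

formula = "X+BC+DAF"
# finditer yields Match objects, so the captured letter is read explicitly.
letters = [m.group(1) for m in non_capturing.finditer(formula)]
print(letters)  # ['X', 'B', 'D']
```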

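Or, for this specific case, you could try a simpler solution using a word boundary. Because + is not a word character, every letter at the start of the string or directly after a + sits on a word boundary, while a letter inside a run such as BC does not: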

>>> re.findall(r"\b[A-Z]", formula)
['X', 'B', 'D']

Or, without regular expressions at all, a solution using str.split:

>>> [s[0] for s in formula.split('+')]
['X', 'B', 'D']
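The three approaches agree on the example formula but are not interchangeable. A small comparison sketch; the helper names by_group, by_boundary and by_split and the extra test strings are hypothetical, chosen only to show where the results diverge:

```python
import re

def by_group(s):
    # A letter preceded by the start of the string or a literal '+'.
    return re.findall(r"(?:^|\+)([A-Z])", s)

def by_boundary(s):
    # Any uppercase letter that begins a run of word characters,
    # so any non-word separator (not just '+') counts.
    return re.findall(r"\b[A-Z]", s)

def by_split(s):
    # First character of each '+'-separated piece, whatever it is.
    return [part[0] for part in s.split("+")]

for s in ["X+BC+DAF", "X-BC", "X+2B"]:
    print(s, by_group(s), by_boundary(s), by_split(s))
# X+BC+DAF ['X', 'B', 'D'] ['X', 'B', 'D'] ['X', 'B', 'D']
# X-BC ['X'] ['X', 'B'] ['X']
# X+2B ['X'] ['X'] ['X', '2']
```

Note also that by_split raises IndexError when a piece is empty, for example on "X+", because "X+".split("+") gives ['X', ''].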
