Python regex: how do I match the start of the string in a selection?

Question

I want to match a run of digits that is either preceded by a non-digit or sits at the start of the string.

Since the caret does not act as an anchor inside brackets, I can't use it there, so I checked the reference and discovered the alternative form \A.

However, when I try to use it I get an error:

>>> import re
>>> s = '123'
>>> re.findall('[\D\A]\d+', s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: internal: unsupported set operator

What am I doing wrong?

Answer 1

You can use a negative lookbehind:

(?<!\d)\d+

Your problem is that you are using \A (a zero-width assertion) inside a character class, and a character class can only match a single character. You could write it as (?:\D|\A) instead, but a lookbehind is nicer.
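
A minimal sketch comparing the two forms. The `sample` string is made up for illustration and is not from the question. Because the \D branch of the alternation consumes the preceding character, this sketch adds a capture group around the digits so that findall returns only them:

```python
import re

sample = '123 abc456 x7'  # hypothetical input, not from the question

# Negative lookbehind: asserts "no digit right before here" without
# consuming any characters.
lookbehind = re.findall(r'(?<!\d)\d+', sample)

# Alternation: \A and \D are alternatives in a non-capturing group.
# The \D branch consumes one character, so the digits are captured
# separately and findall returns only the group.
alternation = re.findall(r'(?:\A|\D)(\d+)', sample)

print(lookbehind)   # ['123', '456', '7']
print(alternation)  # ['123', '456', '7']
```

The lookbehind version needs no group because the match itself contains only digits.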

Answer 2

Repetition in regular expressions is greedy by default, so \d+ always consumes a whole run of digits, and each match therefore already starts either at the beginning of the string or right after a non-digit. Using re.findall() with the regex \d+ will get you exactly what you want:

re.findall(r'\d+', s)
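
As a quick check of that claim, the sketch below compares plain \d+ with the guarded pattern (?<!\d)\d+ on a few inputs. The strings in `samples` are invented for illustration only:

```python
import re

# Hypothetical test inputs, chosen only for illustration.
samples = ['123', 'abc456', '12 ab34cd 5', 'no digits']

for text in samples:
    greedy = re.findall(r'\d+', text)
    guarded = re.findall(r'(?<!\d)\d+', text)
    # Greedy \d+ takes each whole run of digits, so every match already
    # begins at the start of the string or right after a non-digit.
    assert greedy == guarded
    print(repr(text), greedy)
```

The two only diverge once the quantifier is no longer an unbounded greedy one, which is outside what the question asks for.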

As a side note, you should use raw strings when writing regular expressions to make sure the backslashes are interpreted properly.
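
A small sketch of why this matters, using \b as the example escape. The input `text` is hypothetical. Python itself interprets '\b' in a normal string literal before the regex engine ever sees it:

```python
import re

text = 'cat concat'  # hypothetical input

# In a normal string literal, '\b' is the backspace character (0x08),
# so this pattern searches for backspace + "cat" and finds nothing.
plain = re.findall('\bcat', text)

# In a raw string the backslash reaches the regex engine intact,
# where \b means "word boundary".
raw = re.findall(r'\bcat', text)

print(plain)  # []
print(raw)    # ['cat']
```

Escapes such as \d happen to survive in a normal string because Python does not recognize them, but recent Python 3 releases warn about such invalid escape sequences, so the raw-string form is the safer habit.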


Answer 1 (2, accepted): 2012-03-22 16:20:39
Answer 2 (0): 2012-03-22 16:23:27
