Python regular expression not matching end of line
I'm trying to match a C/C++ function definition using a fairly complex regular expression. I've found a case where it doesn't work, and I'm trying to understand why. Here is the input string that fails to match:
void Dump(const char * itemName, ofstream & os)
which is clearly a valid C++ method declaration. Here is the RE:
^[^=+-|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^;=+-|]*$
This basically tries to distinguish method declarations from other C syntax that looks similar, i.e. words followed by parentheses.
Using the very useful Python regular expression debugger (http://www.pythonregex.com/), I've narrowed it down to the trailing "$": if I remove the trailing $ from the regular expression, it matches the method signature above; if I leave the $ in, it doesn't. There must be some idiosyncrasy of Python REs that is eluding me here. Thanks.
The +-| in your character class [^;=+-|] is a range specification: it covers every character from + (0x2B) to | (0x7C), which includes the digits and all ASCII letters. As a result, the character class contains (actually excludes, since you're using ^) far more than you intend. To specify a literal - in a character class, list it first, as in [^-;=+|].
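To make the range behaviour concrete, here is a minimal sketch using Python's standard re module. The variable names are illustrative only and are not taken from the question.

```python
import re

# '+-|' inside a character class is a range from '+' (0x2B) to '|' (0x7C).
accidental_range = re.compile(r"[+-|]")
# Listing the hyphen first makes it a literal: the class is just '-', '+', '|'.
literal_hyphen = re.compile(r"[-+|]")

sample = "c"  # the first character after '(' in the failing input

print(bool(accidental_range.match(sample)))  # True: 'c' lies inside the range
print(bool(literal_hyphen.match(sample)))    # False: 'c' is not one of - + |

# Every character swallowed by the accidental range.
covered = [chr(code) for code in range(ord("+"), ord("|") + 1)]
print(len(covered))  # 82
```

Because the negated class excludes all 82 of those characters, it cannot consume the letters in the parameter list, which is why the pattern stalls right after the opening parenthesis.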
The output of PythonRegex is somewhat misleading. The results of r.groups() and r.findall() are both the same: u'void Dump', which is the content of the first capturing group. If it showed the whole match, you'd see that when you remove the $ you're only matching
void Dump(
...not the whole function definition as you intended. The reason (as Greg explained) is the unintended range in your last character class: +-| is parsed as a range rather than as three literal characters. You need to make the hyphen literal by listing it first ([^-;=+|]) or last ([^;=+|-]), or by escaping it with a backslash ([^;=+\-|]). The first character class, [^=+-|#], has the same problem.
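The behaviour described above can be reproduced with a short sketch. It assumes the regex from the question, split into pieces only for readability. The "fixed" variant moves the hyphen to the front of both character classes, which is one of the three options listed, and the variable names are illustrative.

```python
import re

line = "void Dump(const char * itemName, ofstream & os)"

keywords = r"(?!if|for|switch|while|catch|return)"
# The pattern from the question, without the trailing '$'.
original = r"^[^=+-|#]*?([\w<>]+\s+" + keywords + r"\w+)\s*\([^;=+-|]*"
# Same pattern with the hyphen listed first, so it is a literal.
fixed = r"^[^-=+|#]*?([\w<>]+\s+" + keywords + r"\w+)\s*\([^-;=+|]*"

m_no_anchor = re.search(original, line)
m_original = re.search(original + "$", line)
m_fixed = re.search(fixed + "$", line)

print(m_no_anchor.group(0))  # void Dump(
print(m_original)            # None
print(m_fixed.group(0))      # void Dump(const char * itemName, ofstream & os)
print(m_fixed.group(1))      # void Dump
```

Calling group(0) on the match object returns the whole match, which sidesteps the display limitation of the online tool.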
The only way I can see to get PythonRegex to show the whole match is to remove all capturing groups (or convert them to non-capturing ones).
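The same effect shows up in plain Python, since re.findall returns the group contents whenever the pattern contains a capturing group. The sketch below uses a deliberately simplified stand-in pattern (an assumption for illustration, not the regex from the question) to isolate that behaviour.

```python
import re

line = "void Dump(const char * itemName, ofstream & os)"

# Simplified, hypothetical patterns that differ only in the kind of group.
capturing = re.compile(r"(\w+\s+\w+)\s*\([^;]*$")
non_capturing = re.compile(r"(?:\w+\s+\w+)\s*\([^;]*$")

with_group = capturing.findall(line)         # contents of group 1 only
without_group = non_capturing.findall(line)  # the whole match
whole_match = capturing.search(line).group(0)

print(with_group)     # ['void Dump']
print(without_group)  # ['void Dump(const char * itemName, ofstream & os)']
print(whole_match)    # void Dump(const char * itemName, ofstream & os)
```

When running the regex locally rather than in the online tool, group(0) is usually the simpler choice, because the capturing groups can stay in place.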