Python regular expression not matching end of line

I'm trying to match a C/C++ function definition using a fairly complex regular expression. I've found a case where it's not working and I'm trying to understand why. Here is the input string which does not match:

   void Dump(const char * itemName, ofstream & os)

which is clearly a valid C++ method declaration. Here is the RE:

   ^[^=+-|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^;=+-|]*$

This basically tries to distinguish a method declaration from other C syntax that looks like one, i.e. that has words followed by parentheses.

Using the very useful Python regular expression debugger (http://www.pythonregex.com/) I've narrowed it down to the trailing "$": if I remove the trailing $ from the regular expression, it matches the method signature above; if I leave the $ in, it doesn't. There must be some idiosyncrasy of Python REs that is eluding me here. Thanks.

The +-| in your character class [^;=+-|] is a range specification: it covers every character from + (0x2B) to | (0x7C), which includes the digits and all ASCII letters. As a result the character class contains (or rather excludes, since you're using ^ ) much more than you intend. To specify a literal - in a character class, list it first, as in [^-;=+|] .
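To see the effect of the accidental range, the following sketch checks which characters of the parameter list each negated class accepts. The variable names and the sample string (the text after the opening parenthesis in the question's input) are chosen for illustration only.

```python
import re

# Inside a character class, "+-|" is the range from "+" (0x2B) to "|" (0x7C).
broken = re.compile(r"[^;=+-|]")
# Listing the hyphen first makes it a literal character.
fixed = re.compile(r"[^-;=+|]")

# Everything after "(" in the question's input line.
sample = "const char * itemName, ofstream & os)"

# Distinct characters of the sample that each negated class accepts.
accepted_by_broken = sorted({c for c in sample if broken.fullmatch(c)})
accepted_by_fixed = sorted({c for c in sample if fixed.fullmatch(c)})

print(accepted_by_broken)  # only characters outside the + to | range
print(accepted_by_fixed)   # every distinct character in the sample
```

With the broken class, no letter is accepted, so `[^;=+-|]*` stops immediately after the opening parenthesis and the trailing `$` can never be reached.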

The output of PythonRegex is somewhat misleading. The results of r.groups() and r.findall() are the same: u'void Dump' , which is the content of the first capturing group. If it showed the whole match, you'd see that when you remove the $ you're only matching

void Dump(

...not the whole function definition as you intended. The reason for that (as Greg explained) is the unintended range in your last character class; the first class, [^=+-|#] , has the same problem. You need to make the hyphen literal by listing it first ( [^-;=+|] ) or last ( [^;=+|-] ), or by escaping it with a backslash ( [^;=+\-|] ).
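As a quick check, here is a minimal sketch comparing the pattern from the question with a version in which the hyphen is moved to the front of both character classes. The variable names are illustrative, and this only verifies the one input line from the question, not the pattern's behaviour on C++ code in general.

```python
import re

line = "void Dump(const char * itemName, ofstream & os)"

# Pattern from the question: "+-|" is parsed as a range in both classes.
original = re.compile(
    r"^[^=+-|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^;=+-|]*$"
)
# Same pattern with the hyphen listed first in each class, so it is literal.
corrected = re.compile(
    r"^[^-=+|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^-;=+|]*$"
)

no_match = original.search(line)   # the "$" cannot be reached
m = corrected.search(line)

print(no_match)
print(m.group(0))  # the whole match
print(m.group(1))  # the first capturing group
```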

The only way I can see to get PythonRegex to show the whole match is by removing all capturing groups (or converting them to non-capturing).
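When running the pattern in Python directly rather than through the web tool, the same distinction can be observed with `re.findall` and `group(0)`. The sketch below uses the question's pattern with the trailing `$` removed and the character classes left unchanged; the variable names are illustrative.

```python
import re

line = "void Dump(const char * itemName, ofstream & os)"

# The question's pattern without the trailing "$" (character classes unchanged).
capturing = r"^[^=+-|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^;=+-|]*"
# Same pattern with the group converted to non-capturing via (?:...).
non_capturing = r"^[^=+-|#]*?(?:[\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^;=+-|]*"

groups_only = re.findall(capturing, line)          # contents of group 1 only
whole_match = re.findall(non_capturing, line)      # whole match, no groups remain
via_group0 = re.search(capturing, line).group(0)   # group(0) is always the whole match

print(groups_only)
print(whole_match)
print(via_group0)
```

Calling `group(0)` on the match object avoids having to rewrite the groups at all.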
