Regular Expression To Find C Style Comments

I am trying to write a regular expression to find C style comment headers in Java source files. At the moment I am experimenting with this in Python.

Here is my source code:

import re

text = """/*
       * Copyright blah blah blha blah 
       * blah blah blah blah 
       * 2008 blah blah blah @ org
       */"""

print
print "I guess the program printed the correct thing."

pattern = re.compile("^/.+/$")

print "-----------"
print pattern 

pos = 0
while True:
    match = pattern.search(text, pos)
    if not match:
        break
    s = match.start()
    e = match.end()
    print '   %2d : %2d = "%s"' % (s, e-1, text[s:e])
    pos = e 

I am trying to write a simple expression that just looks for anything between a forward slash and another forward slash. I can make the regular expression more complicated later.

Does anyone know where I am going wrong? I am using a forward slash, the dot meta-character, the plus symbol for one or more characters, another forward slash, and the dollar symbol for the end.

For starters, you need to specify the DOTALL flag because, by default, the "." character does not match newlines.

Try:

pattern = re.compile("^/.+/$", re.DOTALL)
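A minimal sketch of the difference the flag makes, written in Python 3 syntax (the question's code is Python 2). The shortened sample text and the variable names are only illustrative:

```python
import re

# A shortened stand-in for the comment in the question.
text = "/*\n * Copyright\n */"

# Without DOTALL, "." stops at the first newline, so the pattern cannot
# reach the closing slash and nothing matches.
plain = re.compile(r"^/.+/$")

# With DOTALL, "." also matches "\n", so the whole comment matches.
dotall = re.compile(r"^/.+/$", re.DOTALL)

no_match = plain.search(text)
full_match = dotall.search(text)

print(no_match)
print(full_match.group(0) == text)
```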

I don't think you should anchor the match (using '^' and '$').
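To illustrate why the anchors get in the way, here is a small sketch in Python 3 syntax. The sample text is made up for the example; it places code before and after the comment, as a real source file would:

```python
import re

# Hypothetical file contents: the comment is not at the start or end.
text = "foo bar baz\n/* a comment */\nint x;"

# Without re.MULTILINE, "^" matches only at the very start of the string
# and "$" only at the very end, so the anchored pattern finds nothing.
anchored = re.compile(r"^/.+/$", re.DOTALL)

# Dropping the anchors lets search() find the comment wherever it is.
unanchored = re.compile(r"/.+/", re.DOTALL)

anchored_match = anchored.search(text)
unanchored_match = unanchored.search(text)

print(anchored_match)
print(unanchored_match.group(0))
```

Note that ".+" is greedy, so with several comments in the text the unanchored version would run from the first slash to the last one.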

Secondly, I think the regex should be r"/[^/]*/", which matches a (portion of a) string that starts with a slash, followed by zero or more non-slash characters, and then terminates with a slash.

To wit:

>>> import re                                                                                                                           
>>> text = """foo bar baz                                                                                                     
... /*                                                                                  
...        * Copyright blah blah blha blah                                                                                              
...        * blah blah blah blah                                                                                                        
...        * 2008 blah blah blah @ org                                                                                                  
...        */"""                                                                                                                          
>>> rx = re.compile(r"/[^/]*/", re.DOTALL)                                                                                              
>>> mo = rx.search(text)                                                                                                                
>>> text[mo.start(): mo.end()]                                                                                                          
'/*\n       * Copyright blah blah blha blah \n       * blah blah blah blah \n       * 2008 blah blah blah @ org\n       */'

Note that the comment does not start at the start of the string, but the regex still finds it.
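One caveat with the slash-to-slash pattern is that it stops at the first slash inside the comment, for example in a URL. As a possible next step, and not something proposed in the answers above, a non-greedy pattern keyed on the "/*" and "*/" delimiters could look like the sketch below (Python 3 syntax; the sample text and names are illustrative). It still ignores "//" line comments and delimiters that appear inside string literals:

```python
import re

# Hypothetical input with a slash inside the first comment.
text = """int a; /* see http://example.org/licence
 * for details */ int b; /* second */"""

# The slash-to-slash pattern ends at the first "/" of the URL.
simple = re.compile(r"/[^/]*/")

# Match from "/*" to the nearest "*/"; DOTALL lets "." span newlines,
# and the non-greedy ".*?" keeps separate comments separate.
block_comment = re.compile(r"/\*.*?\*/", re.DOTALL)

simple_first = simple.search(text).group(0)
comments = block_comment.findall(text)

print(repr(simple_first))
for comment in comments:
    print(repr(comment))
```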
