Regular expression to find C-style comments

Question

I am trying to write a regular expression to find C-style headers in Java source files. At the moment I am experimenting with this in Python.

Here is my source code:

import re

text = """/*
       * Copyright blah blah blha blah 
       * blah blah blah blah 
       * 2008 blah blah blah @ org
       */"""

print
print "I guess the program printed the correct thing."

pattern = re.compile("^/.+/$")

print "-----------"
print pattern 

pos = 0
while True:
    match = pattern.search(text, pos)
    if not match:
        break
    s = match.start()
    e = match.end()
    print '   %2d : %2d = "%s"' % (s, e-1, text[s:e])
    pos = e

I am trying to write a simple expression that just looks for anything between a forward slash and another forward slash. I can make the regular expression more complicated later.

Does anyone know where I am going wrong? I am using a forward slash, the dot meta-character, the plus symbol for one or more things, and the dollar symbol for the end.

Answer 1

For starters, you need to specify the DOTALL flag because, by default, the . character does not match newlines.

Try:

pattern = re.compile("^/.+/$", re.DOTALL)
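To see the difference the flag makes, here is a minimal sketch that runs the question's pattern against the same sample text with and without re.DOTALL. It is written in Python 3 syntax (an assumption; the question uses Python 2 print statements), and the names without_flag and with_flag are only illustrative.

```python
import re

text = """/*
       * Copyright blah blah blha blah
       * blah blah blah blah
       * 2008 blah blah blah @ org
       */"""

# Without DOTALL, "." stops at the first newline, so the pattern cannot
# reach the closing slash on the last line.
without_flag = re.search("^/.+/$", text)

# With DOTALL, "." also matches newlines, so ".+" can span the whole comment.
with_flag = re.search("^/.+/$", text, re.DOTALL)

print(without_flag)                # None
print(with_flag.group() == text)   # True
```

Because the pattern is still anchored with ^ and $, this only works when the comment is the entire string.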

Answer 2

I don't think you should anchor the match (using '^' and '$').

Secondly, I think the regex should be r"/[^/]*/", which matches a portion of a string that starts with a slash, is followed by zero or more non-slash characters, and then terminates with a slash.

To wit:

>>> import re
>>> text = """foo bar baz
... /*
...        * Copyright blah blah blha blah 
...        * blah blah blah blah 
...        * 2008 blah blah blah @ org
...        */"""
>>> rx = re.compile(r"/[^/]*/", re.DOTALL)
>>> mo = rx.search(text)
>>> text[mo.start(): mo.end()]
'/*\n       * Copyright blah blah blha blah \n       * blah blah blah blah \n       * 2008 blah blah blah @ org\n       */'

Note that the comment does not start at the start of the string, but the regex still finds it.
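One caveat with r"/[^/]*/" is that it stops at the first slash it meets, so a comment that contains a slash (a URL, for instance) gets cut short, and ordinary division operators can match as well. A possible alternative, offered here only as a sketch and not as either answerer's method, is to match from "/*" to the nearest "*/" with a non-greedy quantifier. The sample source and the names C_COMMENT and comments are made up for illustration, and the code uses Python 3 syntax.

```python
import re

# Non-greedy match from "/*" to the nearest "*/".
# DOTALL lets "." cross newlines inside the comment.
C_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

# Hypothetical Java-like input: a header comment containing slashes,
# an inline comment, and some division operators.
source = """/*
 * Copyright blah blah
 * see http://example.org/license
 */
public class Foo { /* inline */ int x = 6 / 2 / 1; }
"""

comments = [m.group() for m in C_COMMENT.finditer(source)]

for comment in comments:
    print(repr(comment))
```

This sketch still treats "/*" inside a string literal as the start of a comment, so it is a rough filter rather than a parser.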


Answer 1
2 2015-10-02 21:52:30

Answer 2
2 Accepted 2015-10-02 22:06:11
